Comparing Vultr vs Hetzner for Llama 2 70B inference
I'm planning to deploy Llama 2 70B for a small SaaS product and need advice on which provider offers better bang-for-buck.
Current options:
- Vultr: RTX 6000 Ada ($4.50/hr), ~260 tokens/sec on fp16
- Hetzner: RTX 4090 ($2.80/hr), ~180 tokens/sec
Hetzner is ~40% cheaper but Vultr's newer GPU might handle batching better for concurrent requests. Has anyone deployed on both? What's your latency experience at scale?
Edited at 25 Mar 2026, 23:52
Run the per-token math and Hetzner actually edges out Vultr there too: $4.50/hr ÷ 260 tokens/sec ≈ $0.0000048/token for Vultr vs $2.80/hr ÷ 180 tokens/sec ≈ $0.0000043/token for Hetzner. And don't sleep on Hetzner's stability and network either; I've had better cache behavior with their CPUs on batched requests even with the slower GPU. If you're serving <5 concurrent users, Hetzner wins. Above that, the Ada's larger VRAM (48 GB vs the 4090's 24 GB) helps batched throughput more than raw clock speed.
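To make the per-token comparison easy to redo with your own measured throughput, here's a quick sketch of the arithmetic. The function name and the quoted rates/throughputs are just the figures from this thread, not benchmarks I've run:

```python
def cost_per_token(hourly_rate_usd: float, tokens_per_sec: float) -> float:
    """Dollars per generated token at sustained single-stream throughput."""
    return hourly_rate_usd / (tokens_per_sec * 3600)

# Figures quoted upthread (fp16, single stream)
vultr = cost_per_token(4.50, 260)    # RTX 6000 Ada
hetzner = cost_per_token(2.80, 180)  # RTX 4090

print(f"Vultr:   ${vultr:.7f}/token")   # ≈ $0.0000048/token
print(f"Hetzner: ${hetzner:.7f}/token") # ≈ $0.0000043/token
```

Swap in the tokens/sec you actually observe under load; batching can shift effective throughput a lot, so the single-stream numbers are only a floor.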
Good point on the per-token math; I hadn't actually calculated it out like that. The stability angle is interesting though: did you end up sticking with Hetzner long-term, or did the throughput difference matter more in practice? I'm leaning toward testing both with our actual workload first.