Comparing Vultr vs Hetzner for Llama 2 70B inference
I'm planning to deploy Llama 2 70B for a small SaaS product and need advice on which provider offers better bang-for-buck.
Current options:
- Vultr: RTX 6000 Ada ($4.50/hr), ~260 tokens/sec on fp16
- Hetzner: RTX 4090 ($2.80/hr), ~180 tokens/sec
Hetzner is ~40% cheaper but Vultr's newer GPU might handle batching better for concurrent requests. Has anyone deployed on both? What's your latency experience at scale?
Edited at 25 Mar 2026, 23:52
Run the per-token math and Hetzner actually edges out Vultr there too: $4.50/hr ÷ 260 tokens/sec ≈ $0.0000048/token for Vultr vs $2.80/hr ÷ 180 tokens/sec ≈ $0.0000043/token for Hetzner. And don't sleep on Hetzner's stability and network either; I've had better cache behavior with their CPUs on batched requests even with the slower GPU. If you're serving <5 concurrent users, Hetzner wins. Above that, the Ada's larger VRAM (48 GB vs the 4090's 24 GB) helps batched throughput more than raw clock speed.
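To make the per-token comparison easy to redo with your own measured throughput, here's a quick sketch of the arithmetic. The function name and the quoted rates/throughputs are just the figures from this thread, not benchmarks I've run:

```python
def cost_per_token(hourly_rate_usd: float, tokens_per_sec: float) -> float:
    """Dollars per generated token at sustained single-stream throughput."""
    return hourly_rate_usd / (tokens_per_sec * 3600)

# Figures quoted upthread (fp16, single stream)
vultr = cost_per_token(4.50, 260)    # RTX 6000 Ada
hetzner = cost_per_token(2.80, 180)  # RTX 4090

print(f"Vultr:   ${vultr:.7f}/token")   # ≈ $0.0000048/token
print(f"Hetzner: ${hetzner:.7f}/token") # ≈ $0.0000043/token
```

Swap in the tokens/sec you actually observe under load; batching can shift effective throughput a lot, so the single-stream numbers are only a floor.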
Good point on the per-token math; I hadn't actually calculated it out like that. The stability angle is interesting though: did you end up sticking with Hetzner long-term, or did the throughput difference matter more in practice? I'm leaning toward testing both with our actual workload first.