HostingArtisan Community for Web Artisans
LLM Deployment & Model Hosting

Comparing Vultr vs Hetzner for Llama 2 70B inference

2 replies · 2 views
#1 — Original Post
25 Mar 2026, 22:35
tensor_host

I'm planning to deploy Llama 2 70B for a small SaaS product and need advice on which provider offers better bang-for-buck.

Current options:

  • Vultr: RTX 6000 Ada ($4.50/hr), ~260 tokens/sec on fp16
  • Hetzner: RTX 4090 ($2.80/hr), ~180 tokens/sec

Hetzner is ~40% cheaper but Vultr's newer GPU might handle batching better for concurrent requests. Has anyone deployed on both? What's your latency experience at scale?
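Quick back-of-envelope on monthly cost at full utilization, using the hourly rates listed above (assuming a 730-hour average month; illustrative only):

```python
HOURS_PER_MONTH = 730  # 24 * 365 / 12, average month length

# Hourly rates from the comparison above
for name, price_per_hour in [("Vultr RTX 6000 Ada", 4.50),
                             ("Hetzner RTX 4090", 2.80)]:
    monthly = price_per_hour * HOURS_PER_MONTH
    print(f"{name}: ${monthly:,.0f}/mo")
# Vultr RTX 6000 Ada: $3,285/mo
# Hetzner RTX 4090: $2,044/mo
```

That's the raw compute gap before factoring in throughput or concurrency.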

Edited at 25 Mar 2026, 23:52

#2
25 Mar 2026, 23:05
finops_pro

Do the per-token math first: $4.50/hr ÷ (260 tok/s × 3600 s/hr) ≈ $0.0000048/token for Vultr, vs $2.80/hr ÷ (180 tok/s × 3600 s/hr) ≈ $0.0000043/token for Hetzner. So Hetzner is actually slightly cheaper per token, not just per hour. And don't sleep on Hetzner's stability and network: I've had better cache behavior with their CPUs on batched requests even with the slower GPU. If you're doing <5 concurrent users, Hetzner wins. Above that, the Ada's extra VRAM (48 GB vs the 4090's 24 GB) helps batched throughput more than raw clock speed.
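For reference, here's the per-token arithmetic with the rates quoted in this thread (a quick sketch, nothing provider-specific):

```python
def cost_per_token(price_per_hour: float, tokens_per_sec: float) -> float:
    """Dollars per generated token at sustained throughput."""
    return price_per_hour / (tokens_per_sec * 3600)

vultr = cost_per_token(4.50, 260)    # RTX 6000 Ada, fp16
hetzner = cost_per_token(2.80, 180)  # RTX 4090

print(f"Vultr:   ${vultr:.7f}/token")    # ≈ $0.0000048/token
print(f"Hetzner: ${hetzner:.7f}/token")  # ≈ $0.0000043/token
```

Sustained throughput is the big assumption here; real per-token cost rises fast if the GPU sits idle between requests.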

#3
25 Mar 2026, 23:25
tensor_host

Good point on the per-token math, I hadn't actually calculated it out like that. The stability angle is interesting though—did you end up sticking with Hetzner long-term, or did the throughput difference matter more in practice? I'm leaning toward testing both with our actual workload first.
