HostingArtisan Community for Web Artisans

Llama 2 70B inference costs on Vultr vs AWS vs Paperspace

4 replies · 5 views
#1 — Original Post
26 Mar 2026, 00:00
inference_api

Been testing Llama 2 70B deployments across three providers and the cost differences are wild.

Setup: vLLM + A100 80GB, 100 req/min load test

  • Vultr GPU Cloud: $3.50/hr per A100 = ~$2,520/month. Pretty stable latency (85ms p95)
  • AWS EC2 (g5.24xlarge; note this instance type carries 4x A10G GPUs, not A100s): $12.48/hr = ~$8,986/month. Burst performance but expensive
  • Paperspace: $0.98/hr = ~$706/month (!). Latency occasionally spikes to 150ms+ though
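The monthly figures above work out to the hourly rate times 720 hours, i.e. one always-on instance over a 30-day month. Here is a small sketch that reproduces them and also spreads each bill over the requests served. It assumes (my assumption, not part of the original test) that the 100 req/min load is sustained all month:

```python
# Hourly on-demand rates quoted in the post (USD/hr).
HOURLY_RATES = {
    "Vultr": 3.50,
    "AWS g5.24xlarge": 12.48,
    "Paperspace": 0.98,
}

HOURS_PER_MONTH = 24 * 30  # assumption: 30-day month, instance never stopped
REQS_PER_MIN = 100  # load-test rate from the post, assumed sustained all month


def monthly_cost(hourly_rate: float) -> float:
    """Cost of running one instance continuously for a month."""
    return hourly_rate * HOURS_PER_MONTH


def cost_per_million_requests(hourly_rate: float) -> float:
    """Monthly cost divided over the requests served at a constant rate."""
    requests_per_month = REQS_PER_MIN * 60 * HOURS_PER_MONTH
    return monthly_cost(hourly_rate) / requests_per_month * 1_000_000


if __name__ == "__main__":
    for name, rate in HOURLY_RATES.items():
        print(f"{name:16} ${monthly_cost(rate):>9,.2f}/month  "
              f"${cost_per_million_requests(rate):>8,.2f} per 1M requests")
```

At 4.32M requests/month this comes to roughly $583, $2,080 and $163 per million requests for Vultr, AWS and Paperspace. The normalization matters more once providers are compared at different throughputs rather than the same fixed load.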

Paperspace is a no-brainer for cost, but Vultr's consistency is worth the extra ~$1.8K/month for production. Anyone running larger models in production? How are you handling the cost-latency tradeoff?
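Since the comparison hinges on p95, here is a minimal sketch of the nearest-rank percentile, one common way to get p95 from raw load-test latencies. Other tools (e.g. NumPy's `np.percentile`, which interpolates linearly by default) can report slightly different values. The sample latencies are invented for illustration, not measurements from this thread:

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    pct% of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


if __name__ == "__main__":
    # Hypothetical latencies in ms (NOT from the thread): mostly ~70 ms,
    # plus two spikes above 150 ms.
    latencies = [68, 70, 71, 69, 72, 70, 73, 68, 155, 71,
                 70, 69, 74, 72, 160, 70, 71, 69, 72, 70]
    print("p50:", percentile(latencies, 50))
    print("p95:", percentile(latencies, 95))
```

With 2 slow requests out of 20, the median stays at 70 ms but p95 lands on the 155 ms spike. A few outliers can therefore dominate a p95 comparison even when typical latency looks fine.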

Also considering Runpod spot pricing but haven't tested it yet.

Edited at 26 Mar 2026, 01:50

#2
26 Mar 2026, 00:05
pipe_grep

Have you tested with quantization? Running Llama 2 70B at 4-bit on Paperspace could get you under 100ms latency while keeping costs dirt cheap. Also worth checking whether vLLM's PagedAttention KV cache is configured properly; sometimes latency spikes come from suboptimal KV cache management. https://docs.vllm.ai/ has some good tuning guides for this exact scenario.
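To make the KV-cache point concrete: vLLM's PagedAttention allocates the cache in blocks out of whatever GPU memory is left after the weights load. The per-token cache size therefore roughly bounds how many tokens, and so how many concurrent requests, can be in flight before new requests wait or running ones get preempted. Below is a back-of-the-envelope sketch using the published Llama 2 70B shape (80 layers, grouped-query attention with 8 KV heads, head dim 128). It assumes an FP16 cache, and the 30 GiB of free memory is a hypothetical figure:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelShape:
    num_layers: int
    num_kv_heads: int
    head_dim: int


# Llama 2 70B: 80 layers, 8 KV heads (GQA), head_dim = 8192 hidden / 64 heads.
LLAMA2_70B = ModelShape(num_layers=80, num_kv_heads=8, head_dim=128)


def kv_bytes_per_token(shape: ModelShape, bytes_per_elem: int = 2) -> int:
    """Bytes one token occupies in the KV cache across all layers.
    The leading 2 is one tensor for keys plus one for values;
    bytes_per_elem=2 assumes an FP16/BF16 cache."""
    return 2 * shape.num_layers * shape.num_kv_heads * shape.head_dim * bytes_per_elem


def max_cached_tokens(free_gib: float, shape: ModelShape, bytes_per_elem: int = 2) -> int:
    """Tokens that fit in `free_gib` GiB left over for the KV cache."""
    return int(free_gib * 1024**3) // kv_bytes_per_token(shape, bytes_per_elem)


if __name__ == "__main__":
    per_tok = kv_bytes_per_token(LLAMA2_70B)
    print(f"{per_tok:,} bytes/token")  # 327,680 bytes/token
    print(f"{per_tok * 4096 / 1024**3:.2f} GiB per 4096-token sequence")  # 1.25
    # Hypothetical: 30 GiB free for the cache after loading the weights.
    print(f"{max_cached_tokens(30, LLAMA2_70B):,} tokens fit in 30 GiB")  # 98,304
```

So 30 GiB holds about 24 full-length 4096-token sequences; shorter prompts and outputs stretch that further.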

#3
26 Mar 2026, 00:15
inference_api

Oh good point! Haven't tried 4-bit quantization yet—been running full precision. I'll spin up a test on Paperspace with that and see if it tightens up the p95 latency. If that gets me under 100ms consistently, Paperspace becomes the obvious choice. Cheers!
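For context on full precision vs 4-bit: weight memory scales linearly with bits per parameter. This is a rough sketch in decimal GB. It uses the nominal 70B parameter count and ignores KV cache, activations, CUDA context and quantization metadata. It shows that unquantized 70B weights need to be sharded across more than one 80 GB card, while 4-bit weights fit on one with headroom left for the cache:

```python
import math

NUM_PARAMS = 70_000_000_000  # nominal "70B" parameter count
GPU_MEMORY_GB = 80  # one A100 80GB


def weight_gb(num_params: int, bits_per_param: int) -> float:
    """Weights-only footprint in decimal GB (no KV cache or runtime overhead)."""
    return num_params * bits_per_param // 8 / 1e9


def min_gpus_for_weights(gb: float, gpu_gb: float = GPU_MEMORY_GB) -> int:
    """Lower bound on GPU count from the weights alone."""
    return math.ceil(gb / gpu_gb)


if __name__ == "__main__":
    for label, bits in [("FP32", 32), ("FP16/BF16", 16), ("8-bit", 8), ("4-bit", 4)]:
        gb = weight_gb(NUM_PARAMS, bits)
        print(f"{label:10} {gb:6.1f} GB  >= {min_gpus_for_weights(gb)} x A100 80GB")
```

This gives 280 GB (4 GPUs), 140 GB (2), 70 GB (1) and 35 GB (1). Because the GPU count is a lower bound, 8-bit on a single card leaves only about 10 GB for everything else.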

#4
26 Mar 2026, 00:30
compose_up

Paperspace's pricing is insane, but yeah, the noisy-neighbor problem is real there. Vultr's consistency wins if you're serving customers, not just experimenting.

#5
26 Mar 2026, 01:50
mlops_guy

Paperspace's noisy-neighbor issue killed it for us in production; we went back to Vultr even at 3x the cost. Quantization helps but doesn't fix the inconsistency problem.
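One way to put a number on "inconsistency" beyond a single p95 is to count SLO violations per window of consecutive requests. If slowdowns cluster in time, as contention from co-tenants plausibly would, the worst window can look far worse than the overall rate. This sketch uses the 100 ms target discussed earlier in the thread and a made-up trace; the function names are illustrative, not from any library:

```python
def slo_violation_rate(latencies_ms: list[float], slo_ms: float) -> float:
    """Fraction of requests slower than the SLO threshold."""
    if not latencies_ms:
        return 0.0
    return sum(1 for x in latencies_ms if x > slo_ms) / len(latencies_ms)


def worst_window_rate(latencies_ms: list[float], slo_ms: float, window: int) -> float:
    """Highest violation rate over consecutive, non-overlapping windows of
    `window` requests (the last window may be shorter)."""
    if window <= 0:
        raise ValueError("window must be positive")
    rates = [
        slo_violation_rate(latencies_ms[i:i + window], slo_ms)
        for i in range(0, len(latencies_ms), window)
    ]
    return max(rates, default=0.0)


if __name__ == "__main__":
    # Hypothetical trace: 90 fast requests, then a burst of 10 slow ones.
    trace = [80.0] * 90 + [160.0] * 10
    print("overall:", slo_violation_rate(trace, slo_ms=100))               # 0.1
    print("worst window:", worst_window_rate(trace, slo_ms=100, window=20))  # 0.5
```

A 10% overall violation rate hides a stretch where half the requests missed the target, which is the kind of behavior customers notice.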
