Llama 2 70B inference costs on Vultr vs AWS vs Paperspace
Been testing Llama 2 70B deployments across three providers and the cost differences are wild.
Setup: vLLM + A100 80GB, 100 req/min load test
- Vultr GPU Cloud: $3.50/hr per A100 = ~$2,520/month. Pretty stable latency (85ms p95)
- AWS EC2 (g5.24xlarge): $12.48/hr = ~$8,985/month. Handles bursts well, but by far the most expensive
- Paperspace: $0.98/hr = ~$705/month (!). Latency occasionally spikes to 150ms+ though
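For anyone checking the math, the monthly figures above work out if you assume ~720 billable hours/month (30 days × 24 hrs — my assumption, actual billing varies by provider):

```python
# Monthly cost sanity check for the hourly rates quoted above.
# Assumes 720 billable hours/month (30 days x 24 hrs).
HOURS_PER_MONTH = 720

rates = {
    "Vultr A100": 3.50,
    "AWS g5.24xlarge": 12.48,
    "Paperspace A100": 0.98,
}

for provider, hourly in rates.items():
    monthly = hourly * HOURS_PER_MONTH
    print(f"{provider}: ${hourly:.2f}/hr -> ${monthly:,.0f}/month")
```

That gives roughly $2,520, $8,986, and $706 respectively, matching the numbers in the post.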
Paperspace is a no-brainer on cost, but Vultr's consistency is worth the extra ~$1.8K/month for production. Anyone running larger models in production? How are you handling the cost-latency tradeoff?
Also considering Runpod spot pricing but haven't tested it yet.
Edited at 26 Mar 2026, 01:50
Have you tested with quantization? Running Llama 2 70B at 4-bit on Paperspace could get you under 100ms latency while keeping costs dirt cheap. Also worth checking if vLLM's paged attention is properly configured—sometimes latency spikes come from suboptimal KV cache management. https://docs.vllm.ai/ has some good tuning guides for this exact scenario.
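If you want to try the 4-bit route, here's a rough launch-command sketch for serving a pre-quantized AWQ build via vLLM's OpenAI-compatible server — the model repo name and flag values are assumptions, so check the vLLM docs for your version before copying:

```shell
# Sketch: serve a pre-quantized 4-bit AWQ build of Llama 2 70B with vLLM.
# Repo name and tuning values are illustrative, not tested numbers.
python -m vllm.entrypoints.openai.api_server \
  --model TheBloke/Llama-2-70B-Chat-AWQ \
  --quantization awq \
  --dtype half \
  --gpu-memory-utilization 0.90 \
  --max-model-len 4096 \
  --port 8000
```

Leaving a little headroom on `--gpu-memory-utilization` and capping `--max-model-len` both affect how much KV cache vLLM can keep resident, which is exactly the knob that tends to drive those latency spikes.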
Oh good point! Haven't tried 4-bit quantization yet—been running full precision. I'll spin up a test on Paperspace with that and see if it tightens up the p95 latency. If that gets me under 100ms consistently, Paperspace becomes the obvious choice. Cheers!
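If it helps with the comparison, a minimal way to get p95 from raw per-request latencies (assumes you're logging latency in ms per request — nothing vLLM-specific):

```python
# Compute p95 latency from a list of per-request latencies (ms)
# using a simple nearest-rank percentile on the sorted samples.
def p95(latencies_ms):
    s = sorted(latencies_ms)
    idx = max(0, int(round(0.95 * len(s))) - 1)  # nearest-rank index
    return s[idx]

# Hypothetical samples: mostly ~80ms with one 150ms spike
samples = [80, 82, 85, 90, 79, 83, 150, 84, 81, 88]
print(f"p95 = {p95(samples)} ms")
```

One outlier in ten samples is enough to drag p95 to the spike value, which is why the occasional Paperspace spikes show up so clearly in that metric.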
Paperspace's pricing is insane but yeah, the noisy neighbor problem is real there. Vultr's consistency wins if you're serving customers, not just experimenting.
Paperspace's noisy neighbor issue killed it for us in production—went back to Vultr even at 3x the cost. Quantization helps but doesn't fix the inconsistency problem.