Best GPU hosting for fine-tuning Llama 2 70B?
Looking to fine-tune Llama 2 70B on our internal dataset (~50GB). Currently evaluating:
- Lambda Labs: ~$3.50/GPU/hr for their 8x H100 instances, but slow provisioning
- Vultr Cloud GPU: Better pricing (~$2.80/GPU/hr) but limited A100 availability
- OVH Bare Metal: Considering their GPU-equipped dedicated boxes for predictable pricing
We'll need ~200-300 hours of training. Anyone have recent experience with these? Concerned about:
- GPU memory bandwidth for distributed training
- Network latency between nodes
- Data ingestion bottlenecks
Also open to other providers. Cost vs performance is the main trade-off here.
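For context, we're planning QLoRA rather than full fine-tuning (full fine-tuning of a 70B model with Adam needs well over a terabyte of optimizer state, which is why the 8x 80GB nodes). Here's a rough sketch of the setup; the LoRA hyperparameters are just our starting point, not a recommendation:

```python
# Rough QLoRA plan -- a sketch of what we intend to run, not tested at this scale.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit NF4 quantization keeps the 70B base weights around ~35GB,
# which fits across 8x 80GB cards with headroom for activations.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-70b-hf",
    quantization_config=bnb_config,
    device_map="auto",  # shard layers across available GPUs
)

# Only train low-rank adapters; the quantized base weights stay frozen.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```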
H100s are overkill for 70B fine-tuning tbh; A100s give you better throughput per dollar. That said, Lambda's provisioning lag is brutal; I'd skip them.
Here's the thing: OVH bare metal sounds good on paper, but you're stuck with whatever GPU config they stock. Vultr's cheaper, but check their inter-node interconnect first: NVLink only helps within a node, and for multi-node training you want InfiniBand (or at least fast RoCE), which not every cloud provider sets up well. Have you looked at CoreWeave? Similar pricing to Vultr, but their GPU clustering is more optimized for this exact use case.
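If you want to sanity-check inter-node bandwidth yourself before committing anywhere, a quick torch.distributed probe like this works (assumes NCCL and torchrun; the node count, rendezvous endpoint, and payload size are all placeholders):

```python
import os
import time
import torch
import torch.distributed as dist

# Launch on each node with torchrun, e.g. (node0 IP is a placeholder):
#   torchrun --nnodes=2 --nproc_per_node=8 \
#     --rdzv_backend=c10d --rdzv_endpoint=<node0-ip>:29500 bench_allreduce.py
dist.init_process_group(backend="nccl")
torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

# 1 GiB fp32 payload: same order of magnitude as gradient buckets in a 70B run.
tensor = torch.ones(256 * 1024 * 1024, dtype=torch.float32, device="cuda")

# Warm up so NCCL builds its communicators before we start timing.
for _ in range(5):
    dist.all_reduce(tensor)
torch.cuda.synchronize()

iters = 20
start = time.time()
for _ in range(iters):
    dist.all_reduce(tensor)
torch.cuda.synchronize()
elapsed = time.time() - start

if dist.get_rank() == 0:
    size_gb = tensor.numel() * 4 / 1e9
    n = dist.get_world_size()
    # Ring all-reduce moves 2*(n-1)/n of the payload per rank ("bus bandwidth").
    busbw = 2 * (n - 1) / n * size_gb * iters / elapsed
    print(f"approx bus bandwidth: {busbw:.1f} GB/s")
dist.destroy_process_group()
```

If that number comes back way below the NIC's line rate, multi-node runs on that provider will crawl no matter how fast the GPUs are.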
At 200-300 hours, interconnect bandwidth matters more than pure FLOPS. Run a short test job first; don't commit the full 300 hours cold.
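For the back-of-envelope version, the standard compute approximation (~6 FLOPs per parameter per training token) gives you a first guess. Every input below is an assumption to replace with your own numbers:

```python
# Back-of-envelope wall-clock estimator; every value here is a placeholder.
params = 70e9        # Llama 2 70B
tokens = 2e9         # placeholder: effective training tokens, set from your dataset
flops = 6 * params * tokens  # ~6 FLOPs/param/token for full fine-tuning;
                             # LoRA-style runs come in somewhat under this

gpus = 8
peak = 312e12        # A100 bf16 dense peak FLOP/s (use 989e12 for H100 SXM)
mfu = 0.35           # assumed utilization; multi-node setups often land lower

hours = flops / (gpus * peak * mfu) / 3600
print(f"~{hours:.0f} wall-clock hours on one {gpus}-GPU node")
```

Then calibrate against the measured tokens/sec from your test job and extrapolate; that's far more trustworthy than any FLOPS math.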
Good point about the H100 overkill; I was thinking the same thing. OVH bare metal does seem like the better long-term play on cost. Have you actually run 70B fine-tuning on A100s before? Curious about wall-clock time vs the on-demand options.