AWS bill jumped 40% after switching to Graviton2 instances—what am I missing?
So we made the switch from m5.xlarge to m6g.xlarge (Graviton2) expecting a solid cost reduction. The per-hour compute price is definitely lower, but our total bill went UP by ~40% last month.
I've been digging through Cost Explorer and here's what I think I found:
- Data transfer costs spiked. Looks like our inter-AZ traffic changed?
- EBS throughput is being charged differently?
- Some CloudWatch metrics I wasn't tracking before are now showing up
Has anyone else hit this? The Graviton marketing says around 20% cheaper, but we're headed in the wrong direction. Am I missing something obvious about how Graviton instances interact with billing, or is this just AWS doing AWS things?
Also curious if anyone's doing a proper apples-to-apples comparison between m5/m6g/m7g. The per-hour delta is clear but the "real cost" seems messier.
Check your NAT Gateway charges—m6g instances can push more throughput per vCPU, so if your app is chattier than before, NAT costs can balloon fast. Also verify if your instance type change triggered any auto-scaling or placement group reshuffling. I've seen the EBS thing too; Graviton2 handles throughput differently and sometimes AWS recalculates your baseline IOPS allocation. Pull a detailed Cost Explorer breakdown by service and post it—odds are it's NAT or data transfer, not the instance itself. https://docs.aws.amazon.com/ec2/
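If it helps, this is roughly how I pull that breakdown with boto3. It's a minimal sketch, assuming Cost Explorer is enabled on the account and your credentials are already configured; the date range is just an example month.

```python
# Sketch: monthly cost broken down by service via the Cost Explorer API.
# Assumes Cost Explorer is enabled and AWS credentials are configured.
import boto3

ce = boto3.client("ce")

resp = ce.get_cost_and_usage(
    TimePeriod={"Start": "2026-02-01", "End": "2026-03-01"},  # example month
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}],
)

for period in resp["ResultsByTime"]:
    groups = sorted(
        period["Groups"],
        key=lambda g: float(g["Metrics"]["UnblendedCost"]["Amount"]),
        reverse=True,
    )
    for g in groups[:10]:  # top 10 services by spend
        amount = float(g["Metrics"]["UnblendedCost"]["Amount"])
        print(f'{g["Keys"][0]:<40} ${amount:,.2f}')
```

Sorting by spend usually makes the culprit obvious; data transfer and NAT Gateway charges land under "EC2 - Other" rather than the instance line item, which is why the instance price looks fine while the total climbs.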
Oh damn, the NAT Gateway angle—I didn't even check that! Let me pull up those charges. Good point about the higher throughput per vCPU pushing more traffic through NAT. Gonna dig into that today and report back, thanks!
NAT Gateway charges were exactly it! Higher throughput per vCPU pushed way more traffic through NAT than expected. Thanks for pointing me in the right direction!
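For anyone who lands on this later, here's roughly the filter I used to isolate it. It's a sketch that assumes NAT Gateway data processing shows up under the "EC2 - Other" service with usage types like NatGateway-Bytes; that held for our account, but the exact usage-type names carry a region prefix, so verify against your own bill.

```python
# Sketch: isolate NAT Gateway charges by grouping "EC2 - Other" spend by usage type.
# Usage-type names (e.g. NatGateway-Bytes) may vary by region prefix; treat them as examples.
import boto3

ce = boto3.client("ce")

resp = ce.get_cost_and_usage(
    TimePeriod={"Start": "2026-02-01", "End": "2026-03-01"},  # example month
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    GroupBy=[{"Type": "DIMENSION", "Key": "USAGE_TYPE"}],
    Filter={"Dimensions": {"Key": "SERVICE", "Values": ["EC2 - Other"]}},
)

for period in resp["ResultsByTime"]:
    for g in period["Groups"]:
        usage_type = g["Keys"][0]
        if "NatGateway" in usage_type:
            amount = float(g["Metrics"]["UnblendedCost"]["Amount"])
            print(f"{usage_type:<40} ${amount:,.2f}")
```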
Glad you found it! One thing worth checking going forward—Graviton2 can sometimes have better per-core performance, which means your app might be doing more work per instance than before. That compounds the NAT issue. Consider running some benchmarks with the same workload on both instance types to see if there's an efficiency difference. Might help you right-size properly next time.
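Something like this is what I mean by an apples-to-apples check: run the same CPU-bound job on each instance type and divide the hourly price by measured throughput. Just a sketch; the hashing loop stands in for your real workload, and the hourly rates are example on-demand prices you should replace with your region's actual numbers.

```python
# Sketch: compare "real cost" as hourly price divided by measured throughput.
# Run the same script on an m5.xlarge and an m6g.xlarge and compare the output.
# The workload and HOURLY_RATES below are placeholders, not real benchmarks or quotes.
import hashlib
import time

HOURLY_RATES = {"m5.xlarge": 0.192, "m6g.xlarge": 0.154}  # example on-demand USD/hour

def representative_work(iterations: int = 200_000) -> None:
    """Stand-in for one unit of your real workload (CPU-bound hashing here)."""
    payload = b"x" * 1024
    for _ in range(iterations):
        payload = hashlib.sha256(payload).digest()

def measure_throughput(units: int = 50) -> float:
    """Return units of work completed per hour on this machine."""
    start = time.perf_counter()
    for _ in range(units):
        representative_work()
    elapsed = time.perf_counter() - start
    return units / elapsed * 3600

if __name__ == "__main__":
    instance_type = "m6g.xlarge"  # set to whatever you're running on
    throughput = measure_throughput()
    cost_per_unit = HOURLY_RATES[instance_type] / throughput
    print(f"{instance_type}: {throughput:,.0f} units/hour, ${cost_per_unit:.6f} per unit")
```

Comparing cost per unit of work (rather than cost per hour) is what actually answers the m5 vs m6g vs m7g question, and it also tells you whether you can drop an instance or two after the migration.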