Cold starts killing our API response times, need advice
So we migrated a bunch of microservices to AWS Lambda last month and honestly it's been... rough. Cold starts are absolutely brutal. We're seeing like 5-8s latency on the first invocation after idle periods, which is completely unacceptable for our API.
We're using Node.js 20 with some heavier dependencies. Already tried:
- Lambda Layers to reduce package size
- Provisioned Concurrency (way too expensive at scale)
- Having a CloudWatch Events rule ping them every 5 mins (hacky as hell)
Anyone else dealing with this? Wondering if we should just bite the bullet and go back to containers on ECS or Fargate. Or is there something obvious I'm missing with Lambda optimization?
Thanks!
Edited at 26 Mar 2026, 13:46
Have you considered switching to Google Cloud Run instead? In our experience it handles cold starts noticeably better than Lambda, and the container model means you can optimize the image itself. We ditched Lambda for exactly this reason and haven't looked back.
If you're locked into AWS though, look into SnapStart for Java (not Node sadly), or bite the bullet on Provisioned Concurrency but architect it smarter—maybe only keep 1-2 instances warm for your critical endpoints and let less-used services cold start. The CloudWatch ping is just masking the problem. https://docs.aws.amazon.com/lambda/
Yeah, I've heard good things about Cloud Run's cold start handling. We're pretty locked into AWS right now (whole infrastructure is there), but might be worth revisiting that conversation with the team. The container model does sound appealing though.
Have you tried bundling with esbuild instead of webpack? We cut our cold starts from ~4s to ~1.2s just by switching bundlers and tree-shaking aggressively. Also, split heavy deps (like the AWS SDK) out into their own Layers and load them lazily: require them inside the handler code path that needs them, not at module load time. Node.js 20 is solid, but make sure you're not loading dependencies during init that you never actually call.
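To make the lazy-loading part concrete, here's a rough sketch of the pattern. The `lazy` helper, the `getPdfRenderer` name, the renderer object and the `RouteEvent` shape are all made up for illustration; in a real handler the factory body would be a `require(...)` of whatever heavy library you have.

```typescript
// Memoize an expensive initializer so it runs on first use, not at module load.
function lazy<T>(factory: () => T): () => T {
  let cached: { value: T } | undefined;
  return () => {
    if (cached === undefined) {
      cached = { value: factory() };
    }
    return cached.value;
  };
}

// Hypothetical stand-in for a heavy dependency. In a real handler the factory
// body would be something like `require("some-heavy-lib")`.
let loadCount = 0;
const getPdfRenderer = lazy(() => {
  loadCount += 1; // counts how many times the "heavy" load actually happens
  return { render: (text: string) => `rendered:${text}` };
});

// Hypothetical event shape; only the non-health routes pay the load cost.
interface RouteEvent {
  route: string;
  body?: string;
}

function handler(event: RouteEvent): string {
  if (event.route === "health") {
    return "ok"; // cheap path: the heavy dependency is never loaded
  }
  return getPdfRenderer().render(event.body ?? "");
}
```

Keep in mind this only moves the cost: the first request that hits the heavy path still pays for the load. It helps most when only some routes need the dependency.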
Provisioned Concurrency at scale is brutal, yeah. Have you priced out just moving these to ECS on Fargate instead? Might actually be cheaper than the concurrency costs.
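If it helps with the pricing exercise, here's a back-of-the-envelope sketch. The function names and the 730 hours/month figure are my own assumptions, and the rates are parameters on purpose: plug in the current numbers for your region from the AWS pricing pages. It only models the always-on part (Provisioned Concurrency is billed per GB-second of configured capacity, Fargate per vCPU-hour and GB-hour), not per-request or duration charges.

```typescript
const HOURS_PER_MONTH = 730; // assumed average month length
const SECONDS_PER_HOUR = 3600;

// Provisioned Concurrency: billed per GB-second of configured capacity,
// whether or not requests arrive. Request and duration charges are not modelled.
function provisionedConcurrencyMonthly(
  instances: number,
  memoryGb: number,
  ratePerGbSecond: number,
): number {
  return instances * memoryGb * ratePerGbSecond * SECONDS_PER_HOUR * HOURS_PER_MONTH;
}

// Fargate: billed per vCPU-hour and per GB-hour for each running task.
function fargateMonthly(
  tasks: number,
  vcpu: number,
  memoryGb: number,
  ratePerVcpuHour: number,
  ratePerGbHour: number,
): number {
  return tasks * (vcpu * ratePerVcpuHour + memoryGb * ratePerGbHour) * HOURS_PER_MONTH;
}
```

Run both with your real instance counts and compare. The crossover depends heavily on how many functions you'd need to keep warm versus how many tasks could serve the same traffic.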
Have you looked at Lambda SnapStart? It's designed for exactly this: it snapshots the initialized execution environment (originally the JVM) so you skip most of the initialization phase. Fair warning though, it only works with specific runtimes (Java, and more recently Python and .NET), and Node.js isn't one of them, so it would only help if you ported a service. There are also some gotchas around stateful resources, so test in a non-prod env first: https://docs.aws.amazon.com/lambda/