Retry storms occur when a function fails and is automatically retried repeatedly due to default retry behavior for asynchronous events (e.g., SQS, EventBridge). If the error is persistent and unhandled, retries can accumulate rapidly — often invisibly — creating a large volume of billable executions with no successful outcome. This is especially costly when functions run for extended durations or have high memory allocation.
* Each retry counts as a separate request * Billed per millisecond of execution time for each retry * Retries from failed invocations can exponentially increase cost if not controlled
https://docs.aws.amazon.com/lambda/latest/dg/invocation-retries.html
https://docs.aws.amazon.com/lambda/latest/dg/invocation-async.html\#invocation-dlq
https://docs.aws.amazon.com/lambda/latest/dg/invocation-eventsourcemapping.html\#invocation-eventsourcemapping-retries
https://aws.amazon.com/lambda/pricing/