Submit feedback on
Excessive Lambda Retries (Retry Storms)
We've received your feedback.
Thanks for reaching out!
Oops! Something went wrong while submitting the form.
Close
Excessive Lambda Retries (Retry Storms)
Liam Greenamyre
Service Category
Compute
Cloud Provider
AWS
Service Name
AWS Lambda
Inefficiency Type
Retry Misconfiguration
Explanation

Retry storms occur when a function fails and is automatically retried repeatedly due to default retry behavior for asynchronous events (e.g., SQS, EventBridge). If the error is persistent and unhandled, retries can accumulate rapidly — often invisibly — creating a large volume of billable executions with no successful outcome. This is especially costly when functions run for extended durations or have high memory allocation.

Relevant Billing Model

* Each retry counts as a separate request * Billed per millisecond of execution time for each retry * Retries from failed invocations can exponentially increase cost if not controlled

Detection
  • Check for high error-to-success ratios in CloudWatch Metrics
  • Investigate spikes in invocation count after initial failure events
  • Correlate retries with common payloads and error messages
  • Review configurations for MaximumRetryAttempts and DLQs
  • Assess the presence of throttling, downstream timeouts, or missing exception handling
Remediation
  • Configure DLQs to isolate and inspect failed invocations
  • Implement exponential backoff or circuit breaker patterns in retry logic
  • Set appropriate retry limits on event source mappings
  • Improve exception handling and idempotency in function code
  • Utilize anomaly detection software to alert on abnormal retry patterns
  • Monitor CloudWatch for retry-specific metrics and alarms
Submit Feedback