Stale Completed or Failed Fargate Pods Causing Direct Billing and Capacity Waste
Tai Nguyen
CER:
AWS-Compute-9638
Service Category
Compute
Cloud Provider
AWS
Service Name
Amazon EKS
Inefficiency Type
Unnecessary compute and networking charges
Explanation

This inefficiency occurs when Kubernetes Jobs or CronJobs running on EKS Fargate leave completed or failed pod objects in the cluster indefinitely. Although the workload execution has finished, AWS keeps the underlying Fargate microVM running to allow log inspection and final status checks. As a result, vCPU, memory, and networking resources remain allocated and billable until the pod object is explicitly deleted.

Over time, large numbers of stale Job pods can generate direct compute charges as well as consume ENIs and IP addresses, leading to both unnecessary spend and capacity pressure. This pattern is common in batch-processing and scheduled workloads that lack automated cleanup.

Relevant Billing Model

On EKS Fargate, billing for vCPU and memory continues as long as the pod object exists, even after a Job pod reaches a terminal phase (Succeeded or Failed). Fargate infrastructure is released, and billing stops, only when the pod object is deleted from the Kubernetes API server.

Detection
  • Review whether completed or failed Fargate pods persist long after Jobs finish
  • Assess whether batch or CronJob workloads accumulate large numbers of historical pods
  • Identify environments where Job execution completes but pod cleanup is not automated
Remediation
  • Enable automatic cleanup of finished Jobs using TTL-after-finished policies
  • Configure CronJobs to retain minimal successful and failed job history
  • Treat pod lifecycle cleanup as a required design consideration for Fargate-based batch workloads
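The first two remediation steps map to standard Kubernetes fields: `ttlSecondsAfterFinished` on the Job spec and the `successfulJobsHistoryLimit` / `failedJobsHistoryLimit` fields on CronJobs. A minimal sketch (names, images, schedule, and TTL values are illustrative):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: batch-report                # illustrative name
spec:
  ttlSecondsAfterFinished: 300      # delete the Job and its pods 5 minutes after it finishes
  template:
    spec:
      containers:
        - name: report
          image: my-registry/report:latest   # illustrative image
      restartPolicy: Never
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-batch               # illustrative name
spec:
  schedule: "0 2 * * *"
  successfulJobsHistoryLimit: 1     # keep only the most recent successful Job
  failedJobsHistoryLimit: 1         # keep one failed Job for debugging
  jobTemplate:
    spec:
      ttlSecondsAfterFinished: 300  # TTL also applies to Jobs the CronJob creates
      template:
        spec:
          containers:
            - name: batch
              image: my-registry/batch:latest  # illustrative image
          restartPolicy: Never
```

With the TTL set, the Kubernetes TTL-after-finished controller deletes the pod object once the TTL expires, which is the event that actually releases the Fargate infrastructure and stops billing.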