Athena writes the results of every query to S3 as new objects, regardless of whether the output is needed long term. Over time, the output bucket grows without bound, especially in environments that run the same queries repeatedly, such as cost and usage reporting. Many of these files are transient and provide little value once the results have been consumed. Without lifecycle rules, organizations pay for storage they do not need and accumulate clutter in S3.
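One way to address this is an S3 lifecycle rule that expires objects under the query-results prefix. The following is a minimal sketch, not a definitive implementation: the `athena-results/` prefix, the 30-day retention period, the bucket name in the usage note, and the helper names `build_results_expiration_policy` and `apply_policy` are all assumptions chosen for illustration. The dictionary follows the shape that boto3's `put_bucket_lifecycle_configuration` accepts.

```python
import json


def build_results_expiration_policy(prefix: str, expire_after_days: int) -> dict:
    """Build an S3 lifecycle configuration that expires objects under `prefix`.

    `prefix` and `expire_after_days` are illustrative; pick values that match
    your own Athena output location and retention needs.
    """
    if expire_after_days < 1:
        raise ValueError("expire_after_days must be at least 1")
    return {
        "Rules": [
            {
                "ID": f"expire-athena-results-{expire_after_days}d",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                # Delete result objects once they are older than the retention window.
                "Expiration": {"Days": expire_after_days},
                # Also clean up multipart uploads that never completed.
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 1},
            }
        ]
    }


def apply_policy(s3_client, bucket: str, policy: dict) -> None:
    """Apply the policy using a boto3 S3 client supplied by the caller.

    Note: this call replaces the bucket's existing lifecycle configuration,
    so merge in any rules you want to keep before calling it.
    """
    s3_client.put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=policy
    )


if __name__ == "__main__":
    # Hypothetical prefix and retention period, printed for review only.
    policy = build_results_expiration_policy("athena-results/", 30)
    print(json.dumps(policy, indent=2))
```

To apply it, pass a real client and bucket, for example `apply_policy(boto3.client("s3"), "my-athena-output-bucket", policy)`, where the bucket name is a placeholder. Keeping the policy builder separate from the API call makes the rule easy to review before it touches a bucket.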
Storing raw JSON or CSV files in S3, especially when they are written frequently in small batches, leads to excessive scan costs in Athena. Athena bills by the amount of data scanned, and these formats are row-oriented and verbose, so Athena must read and parse each file in full even when a query touches only a few fields. Without columnar formats, partitioning, or metadata-aware table formats, queries become slow and expensive, especially in high-volume environments.
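A common remedy is to rewrite the raw data as partitioned Parquet with an Athena CTAS (`CREATE TABLE AS SELECT`) statement. The sketch below only builds the SQL text; it does not run it. The helper name `build_parquet_ctas`, the table names, the S3 location, and the column names are hypothetical, and Snappy compression is an assumed default rather than a recommendation from the source. Athena expects partition columns to come last in the `SELECT` list, which the helper enforces by appending them.

```python
from typing import Sequence


def build_parquet_ctas(
    source_table: str,
    target_table: str,
    target_location: str,
    columns: Sequence[str],
    partition_columns: Sequence[str],
) -> str:
    """Return an Athena CTAS statement that rewrites a table as partitioned Parquet.

    All identifiers are supplied by the caller and inserted as-is, so only
    pass trusted values (this sketch does no quoting or escaping).
    """
    if not columns:
        raise ValueError("at least one data column is required")
    if not partition_columns:
        raise ValueError("at least one partition column is required")

    # Partition columns must be the last columns in the SELECT list.
    select_list = ", ".join([*columns, *partition_columns])
    partitions = ", ".join(f"'{name}'" for name in partition_columns)

    return (
        f"CREATE TABLE {target_table}\n"
        "WITH (\n"
        "  format = 'PARQUET',\n"
        "  write_compression = 'SNAPPY',\n"
        f"  external_location = '{target_location}',\n"
        f"  partitioned_by = ARRAY[{partitions}]\n"
        ")\n"
        f"AS SELECT {select_list}\n"
        f"FROM {source_table}"
    )


if __name__ == "__main__":
    # Hypothetical table, location, and column names for illustration.
    print(
        build_parquet_ctas(
            source_table="raw_events_json",
            target_table="events_parquet",
            target_location="s3://example-bucket/curated/events/",
            columns=["event_id", "user_id", "event_type"],
            partition_columns=["event_date"],
        )
    )
```

Queries against the resulting table that select a few columns and filter on `event_date` read only the matching partitions and column chunks instead of every raw file.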