Storing raw JSON or CSV files in S3, especially when they are written frequently in small batches, leads to excessive scan costs in Athena. These formats are row-based and verbose, so Athena must scan and parse each file in full even when a query touches only a few fields. Without columnar formats (such as Parquet or ORC), partitioning, or metadata-aware table formats (such as Apache Iceberg), queries become inefficient and expensive, particularly in high-volume environments.
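To illustrate why row-based layouts force a full scan, here is a minimal toy sketch using only the Python standard library. It is not how Parquet or ORC are actually encoded; the record fields (`event_id`, `user`, `status`, `payload`) and the JSON-array "column" layout are hypothetical stand-ins chosen for illustration. It only compares how many bytes must be read to answer a query on a single field.

```python
import json

# Hypothetical event records; field names and sizes are illustrative only.
records = [
    {"event_id": i, "user": f"user-{i % 50}", "status": "ok", "payload": "x" * 200}
    for i in range(1000)
]

# Row-based layout (JSON Lines): all fields of a record are stored together,
# so reading one field still means reading and parsing every line in full.
row_bytes = sum(len(json.dumps(r).encode("utf-8")) + 1 for r in records)

# Column-oriented layout (toy stand-in for a columnar format): each field is
# stored contiguously, so a query on one field reads only that field's bytes.
columns = {name: [r[name] for r in records] for name in records[0]}
column_bytes = {
    name: len(json.dumps(values).encode("utf-8"))
    for name, values in columns.items()
}

# Bytes that must be read to answer a query touching only "status".
status_only = column_bytes["status"]

print(f"row layout, bytes read:    {row_bytes}")
print(f"column layout, bytes read: {status_only}")
```

In this sketch the row layout reads every byte of every record, while the column layout reads only the bytes of the one field the query needs. Real columnar formats add compression and per-column statistics on top of this idea.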
Pay-per-scan — Athena charges based on the amount of data scanned per query, not the size of the result set. This makes storage format, partitioning, and file layout critical cost drivers.
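Because the charge depends on bytes scanned, a rough estimate of the effect of format and layout can be sketched as simple arithmetic. The figures below are assumptions, not measurements: the price of 5 USD per TB is a commonly cited list price that varies by region, a TB is treated as 1024^4 bytes, and the partition fraction, column fraction, and compression ratio are hypothetical inputs. Per-query minimums and rounding are not modelled.

```python
# Assumed list price in USD per TB scanned; check current pricing for your region.
PRICE_PER_TB_USD = 5.0
# Assumption: one TB is treated as 1024**4 bytes.
BYTES_PER_TB = 1024 ** 4


def scanned_bytes(dataset_bytes, partition_fraction=1.0,
                  column_fraction=1.0, compression_ratio=1.0):
    """Estimate bytes scanned after partition pruning, column pruning,
    and compression. All three factors are hypothetical inputs."""
    return int(dataset_bytes * partition_fraction * column_fraction
               / compression_ratio)


def query_cost_usd(bytes_scanned, price_per_tb=PRICE_PER_TB_USD):
    """Convert bytes scanned into an estimated per-query charge."""
    return bytes_scanned / BYTES_PER_TB * price_per_tb


dataset = 1 * BYTES_PER_TB  # 1 TB of raw JSON (illustrative)

# Raw row-based files: every query scans the whole dataset.
raw_cost = query_cost_usd(scanned_bytes(dataset))

# Hypothetical optimized layout: one day out of 30 partitions is read,
# 10% of the columns are selected, and compression shrinks data 4x.
optimized_cost = query_cost_usd(
    scanned_bytes(dataset, partition_fraction=1 / 30,
                  column_fraction=0.1, compression_ratio=4.0)
)

print(f"raw scan:       ${raw_cost:.4f} per query")
print(f"optimized scan: ${optimized_cost:.4f} per query")
```

Under these assumed factors the same query scans roughly 1/1200 of the data. The size of the result set does not appear anywhere in the calculation, which is the point of the pay-per-scan model.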