Infrequently Accessed Data Stored in Azure Cosmos DB
Chibueze Eke
Service Category: Database
Cloud Provider: Azure
Service Name: Azure Cosmos DB
Inefficiency Type: Inefficient Storage Tiering
Explanation

Azure Cosmos DB is optimized for low-latency, globally distributed workloads, not for long-term storage of infrequently accessed data. Yet in many environments, cold data such as logs, telemetry, or historical records stays in Cosmos DB simply because no lifecycle management is in place.

Relevant Billing Model

Cosmos DB charges include:

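  • Data storage, billed per GB per month in every region where the data is replicated
  • Provisioned throughput (RU/s), billed hourly for the capacity provisioned whether or not it is used; with autoscale, billing follows the highest RU/s reached in each hour, subject to a minimum
  • Backup storage, which may be billed separately depending on the backup configuration
  • Serverless mode, which bills per RU consumed instead of provisioned capacity but is subject to limits on scaling and total RU consumption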

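Cold data incurs storage charges every month in each replicated region, and it can inflate RU requirements because the minimum throughput a container must provision grows with the amount of data it stores.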
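
To make the storage effect concrete, the sketch below compares the monthly storage charge for the same cold dataset kept in Cosmos DB across several regions with the charge in a single-region, lower-cost store. The function name and both per-GB prices are illustrative assumptions, not published Azure rates; substitute the rates from your own price sheet. Throughput, backup, transaction, and retrieval charges are deliberately left out.

```python
def monthly_storage_cost(stored_gb: float, regions: int, price_per_gb_month: float) -> float:
    """Estimate a monthly storage charge.

    Storage is billed per GB per month in every region that holds a replica,
    so the charge scales with the region count.
    """
    if stored_gb < 0 or regions < 1 or price_per_gb_month < 0:
        raise ValueError("stored_gb and price must be >= 0 and regions >= 1")
    return stored_gb * regions * price_per_gb_month


# Placeholder prices (assumptions for illustration only, not real Azure rates).
ASSUMED_COSMOS_PRICE = 0.25      # currency units per GB per month, per region
ASSUMED_COOL_BLOB_PRICE = 0.01   # currency units per GB per month

cold_gb = 2000  # hypothetical volume of cold data

in_cosmos = monthly_storage_cost(cold_gb, regions=3, price_per_gb_month=ASSUMED_COSMOS_PRICE)
in_blob = monthly_storage_cost(cold_gb, regions=1, price_per_gb_month=ASSUMED_COOL_BLOB_PRICE)

print(f"Cosmos DB, 3 regions: {in_cosmos:.2f} per month")
print(f"Cool blob, 1 region:  {in_blob:.2f} per month")
```

With these placeholder inputs the gap comes from two multipliers at once: the higher per-GB rate and the replication factor.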

Detection

  • Identify Cosmos DB resources (e.g., containers, collections, or tables) with high storage usage but low request volume
  • Use Azure Monitor, diagnostic logs, or Workload Insights to assess read/write activity over time
  • Evaluate whether the data is actively queried or needed for operational workloads
  • Check whether data retention or lifecycle policies are defined for the resource
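
The checks above can be sketched as a simple filter over per-container metrics. This is a minimal illustration that assumes storage size and 30-day request counts have already been exported (for example from Azure Monitor) into plain records; the `ContainerStats` type, the field names, and both thresholds are hypothetical and should be tuned to the workload.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class ContainerStats:
    name: str
    storage_gb: float
    requests_30d: int       # total requests over the last 30 days
    has_ttl: bool           # whether a retention (TTL) policy is defined


def find_cold_candidates(
    stats: Iterable[ContainerStats],
    min_storage_gb: float = 100.0,        # assumed threshold for "high storage"
    max_requests_per_gb: float = 10.0,    # assumed threshold for "low request volume"
) -> List[ContainerStats]:
    """Return containers with high storage but low request volume, largest first."""
    candidates = [
        s for s in stats
        if s.storage_gb >= min_storage_gb
        and s.requests_30d <= max_requests_per_gb * s.storage_gb
    ]
    return sorted(candidates, key=lambda s: s.storage_gb, reverse=True)


# Hypothetical sample data.
sample = [
    ContainerStats("orders", 50, 9_000_000, False),
    ContainerStats("telemetry-2021", 1800, 1200, False),
    ContainerStats("audit-logs", 400, 300, True),
    ContainerStats("sessions", 250, 40_000_000, True),
]

for c in find_cold_candidates(sample):
    policy = "TTL defined" if c.has_ttl else "no retention policy"
    print(f"{c.name}: {c.storage_gb:.0f} GB, {c.requests_30d} requests in 30 days, {policy}")
```

A flagged container is only a candidate: whether the data is still needed for operational workloads remains a judgment call for its owner.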

Remediation

  • Export infrequently accessed data to lower-cost storage services:
      • Use Blob Storage Cool for rarely accessed but readily retrievable data
      • Use Blob Storage Archive for long-term retention with delayed retrieval
      • Use Azure Table Storage for simple key/value access when global distribution is unnecessary
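  • Delete cold data from Cosmos DB only after verifying that the archived copy is complete
  • Automate the data lifecycle, for example with a scheduled export job or per-container time-to-live (TTL) settings, so that stale data is routinely moved out of, or expired from, Cosmos DB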
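
The archive-then-delete ordering can be sketched as follows. This is an illustrative outline, not a production job: `archive` and `delete` are hypothetical callables standing in for an upload to lower-cost storage and a Cosmos DB item delete, and the in-memory dictionaries stand in for both stores. The sketch relies on the `_ts` system property, which Cosmos DB sets to the item's last-modified time in epoch seconds.

```python
from typing import Callable, Iterable, Tuple

SECONDS_PER_DAY = 86_400


def archive_stale_items(
    items: Iterable[dict],
    cutoff_ts: int,
    archive: Callable[[dict], None],   # hypothetical: writes the item to cheaper storage
    delete: Callable[[dict], None],    # hypothetical: removes the item from the source store
) -> Tuple[int, int]:
    """Archive items last modified before cutoff_ts, deleting each one only
    after its archive call succeeded. Returns (archived, failed)."""
    archived = failed = 0
    for item in items:
        if item["_ts"] >= cutoff_ts:
            continue  # still warm, keep it in the source store
        try:
            archive(item)
        except Exception:
            failed += 1
            continue  # archival failed, so the source copy is left untouched
        delete(item)
        archived += 1
    return archived, failed


# In-memory stand-ins for the two stores (illustration only).
now = 1_700_000_000
source = {
    "a": {"id": "a", "_ts": now - 200 * SECONDS_PER_DAY},
    "b": {"id": "b", "_ts": now - 5 * SECONDS_PER_DAY},
    "c": {"id": "c", "_ts": now - 400 * SECONDS_PER_DAY},
}
archive_store = {}


def fake_archive(item: dict) -> None:
    if item["id"] == "c":
        raise IOError("simulated upload failure")
    archive_store[item["id"]] = item


def fake_delete(item: dict) -> None:
    del source[item["id"]]


result = archive_stale_items(
    list(source.values()),                 # snapshot, since the loop deletes from source
    cutoff_ts=now - 90 * SECONDS_PER_DAY,  # assumed 90-day staleness window
    archive=fake_archive,
    delete=fake_delete,
)
print(result, sorted(source), sorted(archive_store))
```

Deleting per item after a confirmed archive write means a failed upload never costs data; the item simply stays in the source store for the next run.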