Many organizations mistakenly believe that all AWS Marketplace spend automatically contributes to their EDP commitment. In reality, only certain Marketplace transactions (those involving EDP-eligible vendors and transactable SKUs) count, and often only a portion of that spend draws down the commitment. This misunderstanding can lead to double counting: forecasting as if both native AWS usage and Marketplace purchases will fully draw down the commitment. If those assumptions are wrong, the organization risks failing to meet its EDP threshold, incurring penalties, or losing expected discounts.
Organizations frequently inherit continuous recording by default (e.g., through landing zones) without validating the business need for per-change granularity across all resource types and environments. In change-heavy accounts (ephemeral resources, CI/CD churn, autoscaling), continuous mode drives very high CIR volumes with limited additional operational value. Selecting periodic recording for lower-risk resource types and/or non-production environments can maintain necessary visibility while reducing CIR volume and cost. Recorder settings are account/region scoped, so you can apply continuous in production where required and periodic elsewhere.
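As a sketch of how this scoping is applied, the AWS CLI lets you set the recorder's frequency per account/region, with per-resource-type overrides. The recorder name, role, and account ID below are placeholders; this assumes a recorder already exists in the target account:

```shell
# Switch the recorder to periodic (daily) recording, but keep
# per-change (continuous) history for a sensitive resource type.
aws configservice put-configuration-recorder \
  --configuration-recorder name=default,roleARN=arn:aws:iam::123456789012:role/config-role \
  --recording-mode '{
    "recordingFrequency": "DAILY",
    "recordingModeOverrides": [
      {
        "description": "Keep per-change history for IAM roles",
        "resourceTypes": ["AWS::IAM::Role"],
        "recordingFrequency": "CONTINUOUS"
      }
    ]
  }'
```

Because the setting is account/region scoped, the same command can be run with `CONTINUOUS` in production accounts and `DAILY` in non-production ones.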
By default, AWS Config can be set to record changes across all supported resource types, including those that change frequently, such as security group rules, IAM role policies, route tables, and network interfaces (resources that are often ephemeral in containerized or auto-scaling setups). These high-churn resources can generate an outsized number of configuration items and inflate costs — especially in dynamic or large-scale environments.
This inefficiency arises when recording is enabled indiscriminately across all resources without evaluating whether the data is necessary. Without targeted scoping, teams may incur large charges for configuration data that provides minimal value, especially in non-production environments. The added noise can also obscure meaningful compliance signals.
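One way to scope the recorder is an exclusion list for high-churn resource types. A minimal AWS CLI example (recorder name, role, and account ID are placeholders; the excluded types are illustrative):

```shell
# Record all supported types EXCEPT the listed high-churn ones.
aws configservice put-configuration-recorder \
  --configuration-recorder name=default,roleARN=arn:aws:iam::123456789012:role/config-role \
  --recording-group '{
    "allSupported": false,
    "exclusionByResourceTypes": {
      "resourceTypes": ["AWS::EC2::NetworkInterface", "AWS::EC2::RouteTable"]
    },
    "recordingStrategy": {"useOnly": "EXCLUSION_BY_RESOURCE_TYPES"}
  }'
```

The alternative is an inclusion list (`"resourceTypes": [...]` with a default recording strategy), which is a better fit when only a handful of types matter for compliance.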
VPC Flow Logs configured with the ALL filter and delivered to CloudWatch Logs often result in unnecessarily high log ingestion volumes — especially in high-traffic environments. This setup is rarely required for day-to-day monitoring or security use cases but is commonly enabled by default or for temporary debugging and then left in place. As a result, teams incur excessive CloudWatch charges without realizing the logging configuration is misaligned with actual needs.
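Where full traffic capture is not needed, a narrower filter and a cheaper destination address both problems at once. A minimal sketch, assuming the VPC ID and bucket name below (both placeholders):

```shell
# Capture only rejected traffic (often enough for security review)
# and deliver to S3 instead of CloudWatch Logs to avoid ingestion charges.
aws ec2 create-flow-logs \
  --resource-type VPC \
  --resource-ids vpc-0123456789abcdef0 \
  --traffic-type REJECT \
  --log-destination-type s3 \
  --log-destination arn:aws:s3:::my-flow-logs-bucket/vpc/
```

Existing flow logs cannot be modified in place; the usual pattern is to create the scoped-down flow log and then delete the `ALL`-filter one.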
Teams often overuse Microsoft-hosted agents by running redundant or low-value jobs, failing to configure pipelines efficiently, or neglecting to use self-hosted agents for steady workloads. These inefficiencies result in unnecessary cost and delivery friction, especially when pipelines create queues due to limited agent availability.
By default, all Log Analytics tables are created under the Analytics plan, which is optimized for high-performance querying and interactive analysis. However, not all telemetry requires real-time access or frequent querying. Some tables may serve audit, archival, or compliance use cases where querying is rare or unnecessary. Leaving such tables on the Analytics plan results in unnecessary spend—especially when ingestion volumes are high or the table receives data from verbose sources (e.g., diagnostic logs, platform metrics).
Azure now allows users to assign different pricing plans at the table level, including the Basic plan, which offers significantly lower ingestion costs at the expense of reduced query functionality. This provides a valuable opportunity to align cost with access patterns by assigning less expensive plans to tables that are retained for record-keeping or compliance, rather than analysis.
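As a sketch, the plan can be switched per table with the Azure CLI (resource group, workspace, and table names below are placeholders; note that only certain tables, such as `ContainerLogV2`, support the Basic plan):

```shell
# Move a verbose, rarely queried table to the Basic plan
# to reduce per-GB ingestion cost.
az monitor log-analytics workspace table update \
  --resource-group my-rg \
  --workspace-name my-workspace \
  --name ContainerLogV2 \
  --plan Basic
```

Before switching, confirm that no alert rules or scheduled queries depend on the table, since Basic-plan tables have restricted query capabilities.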
While high-frequency alerting is sometimes justified for production SLAs, it's often overused across non-critical alerts or replicated blindly across environments. Projects with multiple environments (e.g., dev, QA, staging, prod) often duplicate alert rules without adjusting for business impact, which can lead to alert sprawl and inflated monitoring costs.
In large-scale environments, reducing the frequency of non-critical alerts—especially in lower environments—can yield significant savings. Teams often overlook this lever because alert configuration is considered part of operational hygiene rather than cost control. Tuning alert frequencies based on SLA requirements and actual urgency is a low-friction optimization opportunity that does not compromise observability when implemented thoughtfully.
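For Azure Monitor metric alerts, the evaluation cadence is a rule-level setting that can be relaxed in lower environments. A hedged sketch (resource group and rule name are placeholders, and flag support on `update` should be verified against your CLI version):

```shell
# Evaluate a non-critical dev alert every 15 minutes instead of every minute.
az monitor metrics alert update \
  --resource-group my-dev-rg \
  --name cpu-high-dev \
  --evaluation-frequency 15m \
  --window-size 15m
```

Less frequent evaluation is billed at a lower rate per rule, so applying this across dozens of duplicated dev/QA rules compounds the savings.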
By default, EventBridge includes retry mechanisms for delivery failures, particularly when targets like Lambda functions or Step Functions fail to process an event. However, if these retry policies are disabled or misconfigured, EventBridge may treat failed deliveries as successful, prompting upstream services to republish the same event multiple times in response to undelivered outcomes. This leads to:

* Duplicate event publishing and delivery
* Unnecessary compute triggered by repeated events
* Increased EventBridge, downstream service, and data transfer costs

This behavior is especially problematic in systems where idempotency is not strictly enforced and retries are managed externally by upstream services.
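An explicit retry policy plus a dead-letter queue keeps failure handling inside EventBridge rather than pushing it to upstream republishers. A minimal sketch (rule, bus, function, queue, and account identifiers are all placeholders):

```shell
# Attach a bounded retry policy and a DLQ to a rule target so failed
# deliveries are retried by EventBridge and then parked, not republished.
aws events put-targets \
  --rule my-rule \
  --event-bus-name my-bus \
  --targets '[{
    "Id": "lambda-target",
    "Arn": "arn:aws:lambda:us-east-1:123456789012:function:process-event",
    "RetryPolicy": {"MaximumRetryAttempts": 4, "MaximumEventAgeInSeconds": 3600},
    "DeadLetterConfig": {"Arn": "arn:aws:sqs:us-east-1:123456789012:my-dlq"}
  }]'
```

Events that exhaust the retry budget land in the SQS dead-letter queue for inspection, which makes duplicate-publishing workarounds unnecessary.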
By default, CloudWatch Log Groups use the Standard log class, which applies higher rates for both ingestion and storage. AWS also offers an Infrequent Access (IA) log class designed for logs that are rarely queried — such as audit trails, debugging output, or compliance records. Many teams assume storage is the dominant cost driver in CloudWatch, but in high-volume environments, ingestion costs can account for the majority of spend. When logs that are infrequently accessed are ingested into the Standard class, it leads to unnecessary costs without impacting observability. The IA log class offers significantly reduced rates for ingestion and storage, making it a better fit for logs used primarily for post-incident review, compliance retention, or ad hoc forensic analysis.
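The log class is chosen at log-group creation time. A minimal sketch (the log group name is a placeholder):

```shell
# Create a log group in the Infrequent Access class for audit-style logs.
aws logs create-log-group \
  --log-group-name /app/audit-trail \
  --log-group-class INFREQUENT_ACCESS
```

An existing log group's class cannot be changed, so adopting IA typically means creating a new group and repointing producers to it. IA also drops some Standard-class features (e.g., subscription filters and metric filters), which is why it fits logs that are written often but read rarely.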
By default, Cloud Logging retains logs for 30 days. However, many organizations increase retention to 90 days, 365 days, or longer — even for non-critical logs such as debug-level messages, transient system logs, or audit logs in dev environments. This extended retention can lead to unnecessary costs, especially when:

* Logs are never queried after the first few days
* Observability tooling duplicates logs elsewhere (e.g., SIEM platforms)
* Retention settings are applied globally without considering log type or project criticality
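Retention is set per log bucket, so it can be dialed back project by project. A minimal sketch using the default bucket (the project ID is a placeholder):

```shell
# Reset a dev project's default log bucket to 30-day retention.
gcloud logging buckets update _Default \
  --location=global \
  --retention-days=30 \
  --project=my-dev-project
```

Logs that genuinely need long-term retention can instead be routed via a sink to a dedicated bucket (or to Cloud Storage) with its own retention, leaving the default bucket short-lived.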