Amazon Bedrock Provisioned Throughput allows teams to reserve dedicated inference capacity for foundation models by purchasing model units with hourly billing under a commitment term. This capacity is billed continuously — whether or not any tokens are actually processed — making it a fixed cost that only pays off when sustained, high-volume token consumption justifies the premium over on-demand pricing. In practice, teams frequently purchase Provisioned Throughput to avoid on-demand throttling limits, but actual usage often falls well below the committed capacity, resulting in significant overspend compared to what on-demand pricing would have cost for the same workload.
The waste is compounded by the fact that Provisioned Throughput commitments cannot be canceled before the term expires — billing continues hourly until the commitment period ends. This means a team that overestimates its inference needs at the time of purchase is locked into paying for unused capacity for the full duration. The problem is especially common in early-stage AI deployments where usage patterns are not yet well understood, or in workloads with variable or unpredictable token volumes that are poorly suited to fixed-capacity reservations.
The cost impact can be substantial. A single model unit for even a moderately priced model can cost tens of thousands of dollars per month, and if actual token consumption would have cost only a fraction of that amount under on-demand pricing, the difference represents pure waste. Organizations running multiple Provisioned Throughput reservations across different models or environments can multiply this inefficiency significantly.
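To make the tradeoff concrete, here is a minimal sketch comparing a Provisioned Throughput commitment against the on-demand cost of the same token volume. The hourly rate and per-1K-token prices are illustrative placeholders, not actual Bedrock prices.

```python
# Sketch: fixed Provisioned Throughput cost vs. variable on-demand cost
# for the tokens actually processed. All rates are illustrative.

HOURS_PER_MONTH = 730

def provisioned_monthly_cost(model_units: int, hourly_rate_per_unit: float) -> float:
    """Fixed cost: billed every hour whether or not tokens are processed."""
    return model_units * hourly_rate_per_unit * HOURS_PER_MONTH

def on_demand_monthly_cost(input_tokens: int, output_tokens: int,
                           in_price_per_1k: float, out_price_per_1k: float) -> float:
    """Variable cost: billed only for tokens actually processed."""
    return (input_tokens / 1000) * in_price_per_1k + (output_tokens / 1000) * out_price_per_1k

# Example: one model unit at a hypothetical $39.60/hour, vs. a workload that
# actually processed 500M input and 100M output tokens over the month.
pt_cost = provisioned_monthly_cost(1, 39.60)
od_cost = on_demand_monthly_cost(500_000_000, 100_000_000, 0.003, 0.015)
overspend = pt_cost - od_cost   # positive value is pure waste
```

Under these assumed numbers the commitment costs roughly $28,900 per month while the same token volume would have cost about $3,000 on demand, illustrating how far below breakeven a lightly used reservation can sit.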
When organizations purchase third-party software through AWS Marketplace using annual subscriptions, they typically receive meaningful discounts compared to hourly pay-as-you-go (PAYG) pricing. However, when these annual subscriptions expire without active renewal, billing automatically reverts to the default hourly PAYG rate — which can be substantially higher. This is not a renewal at a higher rate; it is the absence of a renewal action that causes the subscription to lapse and the costlier pricing tier to take effect. Because the subscription simply expires silently, many teams do not realize they have lost their discounted rate until the cost increase appears in the next billing cycle.
This inefficiency is especially difficult to manage in enterprise environments where multiple Marketplace subscriptions are purchased at irregular intervals throughout the year, each with its own expiration date. Private offers — which provide custom-negotiated pricing — add further complexity because they cannot auto-renew by design; when a private offer expires, the customer either moves to the product's higher public pricing or loses the subscription entirely. The financial impact can be severe: in some cases, the licensing cost at PAYG rates can exceed the cost of the underlying compute infrastructure itself, as commonly seen with enterprise software such as SUSE Linux for SAP workloads.
Additionally, for AMI-based products, annual subscriptions are tied to specific instance types. Changing instance types during the subscription period causes billing to revert to hourly rates for the new type, creating another avenue for unintended cost increases even before the subscription formally expires.
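A lightweight way to catch the silent-expiry problem is to track expiration dates centrally and flag subscriptions before they lapse. A sketch, where the subscription records, field names, and 45-day lead time are all hypothetical; real data would come from AWS Marketplace or license-management reporting:

```python
# Sketch: flag Marketplace annual subscriptions whose expiration is near,
# before billing silently reverts to PAYG rates.
from datetime import date, timedelta

def expiring_soon(subscriptions, today: date, lead_days: int = 45):
    """Return subscriptions expiring within lead_days, soonest first."""
    cutoff = today + timedelta(days=lead_days)
    at_risk = [s for s in subscriptions if today <= s["expires"] <= cutoff]
    return sorted(at_risk, key=lambda s: s["expires"])

subs = [
    {"product": "suse-sap", "expires": date(2025, 7, 1), "annual_cost": 120_000},
    {"product": "vendor-b", "expires": date(2025, 12, 15), "annual_cost": 40_000},
]
flagged = expiring_soon(subs, today=date(2025, 6, 1))
# only "suse-sap" is flagged: 30 days out, inside the 45-day window
```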
Azure Function apps can persist long after the applications or workflows they supported have been retired — particularly in development, testing, and experimentation environments where cleanup is often overlooked. Even when no functions are deployed or no triggers are active, the underlying infrastructure dependencies continue to generate charges. The nature and severity of this waste depends heavily on the hosting plan type: function apps on Premium or Dedicated (App Service) plans incur continuous compute charges for allocated instances regardless of activity, while even Consumption plan function apps still require an associated storage account that accrues transaction and capacity costs from internal runtime operations.
Each function app is provisioned with a required Azure Storage account used for storing function code, managing triggers, and maintaining execution state. This storage account generates costs through read/write transactions and capacity usage even when the function app is completely idle — driven by the Functions runtime's internal health checks and state management. Additionally, if Application Insights was enabled for monitoring, telemetry data ingestion charges can accumulate silently in the background. Across an organization with dozens of abandoned function apps spanning multiple subscriptions, these individually modest charges compound into meaningful and entirely avoidable waste.
Azure Virtual Machine Scale Sets can operate in two modes: manual scaling with a fixed instance count, or autoscaling with dynamic instance counts that respond to demand. When a scale set is configured with manual scaling, it maintains the same number of VM instances at all times — regardless of whether those instances are actively processing workload. Every provisioned instance continues to incur per-second compute charges, meaning the organization pays for full capacity even during off-peak hours, weekends, or seasonal lulls when only a fraction of that capacity is needed.
This pattern is especially wasteful for workloads with variable demand — web applications with daily traffic cycles, batch processing jobs that run at specific intervals, or services with clear seasonal peaks. If a scale set is sized for peak demand but runs at that capacity around the clock, the gap between provisioned resources and actual utilization translates directly into unnecessary spend. Microsoft explicitly identifies autoscaling as a mechanism to reduce scale set costs by running only the number of instances required to meet current demand.
There are legitimate reasons to maintain fixed capacity — stateful applications that cannot tolerate dynamic instance changes, workloads with licensing constraints tied to specific instance counts, or scenarios where consistent performance without scale-up latency is critical. However, many scale sets running at fixed capacity do so simply because autoscaling was never configured, not because it was deliberately excluded. Identifying and addressing these cases represents a significant cost optimization opportunity.
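The gap described above can be quantified directly from a demand profile. A sketch with an illustrative per-instance rate and a hypothetical daily demand curve:

```python
# Sketch: cost of instance-hours provisioned above what demand required
# in a fixed-count scale set. Rate and demand profile are illustrative.

def fixed_capacity_waste(fixed_count: int, hourly_demand: list[int],
                         rate_per_instance_hour: float) -> float:
    """Sum the idle instance-hours and price them."""
    idle_hours = sum(max(fixed_count - d, 0) for d in hourly_demand)
    return idle_hours * rate_per_instance_hour

# A scale set pinned at 10 instances, where demand needs all 10 for
# 8 business hours but only 2 for the remaining 16 hours of the day.
demand = [10] * 8 + [2] * 16
daily_waste = fixed_capacity_waste(10, demand, rate_per_instance_hour=0.20)
# 16 hours x 8 idle instances x $0.20 = $25.60 per day
```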
Azure Firewall is available in three SKUs — Basic, Standard, and Premium — each designed for different security requirements and priced accordingly. The Premium SKU includes advanced threat protection capabilities such as TLS inspection, signature-based intrusion detection and prevention (IDPS), URL filtering, and web categories. These features are designed for highly sensitive and regulated environments, such as those processing payment card data or requiring PCI DSS compliance. However, many organizations deploy the Premium SKU by default — often during initial provisioning or as a precautionary measure — without actively configuring or requiring any of these Premium-exclusive features.
The cost impact is significant because the Premium SKU carries a substantially higher fixed hourly deployment charge compared to the Standard SKU — approximately 40% more — while the per-gigabyte data processing rate remains the same across both tiers. Since this hourly charge accrues continuously regardless of whether Premium features are enabled or traffic is flowing, every firewall instance running on the Premium SKU without leveraging its advanced capabilities represents a persistent and avoidable cost premium. In organizations with multiple firewall deployments across subscriptions and environments, this waste compounds quickly.
This pattern is especially common in non-production environments such as development and staging, where advanced threat protection features like TLS inspection and IDPS provide little practical value. Microsoft has recognized this as a frequent optimization opportunity and introduced a zero-downtime SKU change feature specifically to simplify the downgrade process from Premium to Standard.
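The downgrade case can be sized with simple arithmetic. A sketch applying the roughly 40% uplift mentioned above to a hypothetical Standard hourly rate:

```python
# Sketch: annualized cost of running Premium SKU firewalls where Standard
# would suffice. The Standard hourly rate is an illustrative placeholder.

HOURS_PER_YEAR = 8760

def sku_premium_waste(standard_hourly: float, uplift: float,
                      firewall_count: int) -> float:
    """Extra annual spend from the Premium uplift across deployments."""
    return standard_hourly * uplift * HOURS_PER_YEAR * firewall_count

# Hypothetical $1.25/hr Standard rate, 40% uplift, 3 non-production firewalls.
waste = sku_premium_waste(1.25, 0.40, 3)   # $13,140/year
```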
Azure Bastion incurs continuous hourly charges from the moment it is deployed until the resource is deleted — regardless of whether any connections are actively being made. This means a Bastion host sitting idle in a development or test environment generates the same cost as one actively serving remote sessions. Because there is no ability to pause or stop a Bastion deployment, the only way to eliminate charges is to delete the resource entirely.
This inefficiency is especially common in non-production environments where Bastion may have been provisioned for occasional troubleshooting or administrative access but then left running indefinitely. Teams often deploy Bastion during initial environment setup and forget about it, or assume it only costs money when sessions are active. Over time, these idle deployments quietly accumulate significant charges — particularly when deployed at the Basic, Standard, or Premium SKU tiers, which use dedicated infrastructure and carry meaningful hourly rates.
The cost impact compounds across an organization with multiple subscriptions or environments. A single idle Bastion host may seem modest in isolation, but dozens of forgotten deployments across dev, test, staging, and sandbox environments can represent a substantial and entirely avoidable expense.
Azure NetApp Files bills based on provisioned capacity pool size — not on the actual data stored within volumes. This means that when a capacity pool is provisioned at a size significantly larger than the sum of volume quotas allocated within it, the organization pays for stranded, unallocated capacity every hour. For example, a 10 TiB capacity pool with only 6 TiB of volume quotas allocated has 4 TiB of capacity that generates cost but serves no purpose.
This overprovisioning commonly occurs for several reasons. Capacity pools do not automatically shrink — since April 2021, pool sizing is entirely a manual customer responsibility. When volumes are deleted, the freed capacity remains in the pool unless an administrator explicitly resizes it downward. Additionally, with auto QoS pools, volume quotas directly determine throughput performance, which incentivizes teams to set larger quotas than their data requires, further inflating pool sizes. Over time, these dynamics create a growing gap between provisioned pool capacity and what is actually needed, resulting in persistent, avoidable charges that compound across multiple pools and regions.
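The stranded-capacity calculation is straightforward. A sketch using the 10 TiB example above and an illustrative per-TiB monthly rate:

```python
# Sketch: stranded capacity in an Azure NetApp Files capacity pool, i.e.
# provisioned pool size minus the volume quotas allocated within it.
# The per-TiB rate is an illustrative placeholder.

def stranded_capacity_cost(pool_tib: float, volume_quotas_tib: list[float],
                           rate_per_tib_month: float) -> float:
    """Monthly cost of pool capacity not allocated to any volume quota."""
    stranded = max(pool_tib - sum(volume_quotas_tib), 0.0)
    return stranded * rate_per_tib_month

# The example from the text: a 10 TiB pool with 6 TiB of quotas allocated.
monthly_waste = stranded_capacity_cost(10, [4, 2], rate_per_tib_month=300.0)
# 4 TiB stranded x $300/TiB-month = $1,200/month
```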
Azure Cache for Redis is billed at a fixed rate determined entirely by the provisioned tier and cache size — not by actual utilization. A cache instance that consumes only a fraction of its available memory and throughput incurs the same cost as one running at full capacity. This means that when a cache is sized larger than the workload demands, the unused memory and throughput headroom represent pure waste with no corresponding benefit.
Overprovisioning commonly occurs when teams size caches for anticipated peak loads that never materialize, or when workload patterns shift over time — such as after a migration, application refactor, or traffic decline — without a corresponding review of cache sizing. Because there is no option to stop or pause billing on a cache instance, and charges accrue continuously from the moment the cache is created until it is deleted, oversized caches quietly accumulate unnecessary costs around the clock.
An important constraint compounds this issue: scaling down between tiers is not supported. An organization that initially provisions a Premium-tier cache but later determines that a Standard tier would suffice cannot simply downgrade in place — it must create a new cache at the appropriate tier and migrate data. This friction often delays right-sizing efforts and prolongs overspend.
Azure Logic Apps can quietly accumulate costs even when no workflows are actively executing, but the mechanism differs significantly depending on the deployment model. In the Consumption (multitenant) plan, Logic Apps with polling triggers continue to generate billable trigger executions every time the trigger checks for events — even when no events are found and no workflow runs are initiated. A polling trigger configured to check every 30 seconds produces nearly 3,000 billable executions per day, all charged at the per-execution rate, regardless of whether any useful work is performed. Webhook or push-based triggers avoid this particular waste, but retained run history and storage operations can still accrue minor costs over time.
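The polling arithmetic can be sketched directly; the per-execution price below is an illustrative placeholder, not the actual Consumption-plan rate:

```python
# Sketch: billable trigger executions from a Consumption-plan polling
# trigger firing on a fixed interval, even when no events are found.

SECONDS_PER_DAY = 86_400

def polling_executions_per_day(interval_seconds: int) -> int:
    return SECONDS_PER_DAY // interval_seconds

def monthly_polling_cost(interval_seconds: int, price_per_execution: float,
                         days: int = 30) -> float:
    return polling_executions_per_day(interval_seconds) * days * price_per_execution

daily = polling_executions_per_day(30)      # 2,880 checks/day at 30 s
cost = monthly_polling_cost(30, 0.000025)   # 86,400 checks/month billed
```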
In the Standard (single-tenant) plan, the cost driver is fundamentally different. Customers pay for reserved compute capacity — vCPU and memory — on an hourly basis, whether or not any workflows execute. An idle Standard Logic App incurs the full hosting plan charges around the clock. Disabling a Standard Logic App prevents triggers from firing but does not stop the hosting plan billing; only deletion or consolidation of the underlying plan reduces costs.
These idle Logic Apps commonly arise after application decommissioning, migration projects, or proof-of-concept work that was never cleaned up. At enterprise scale, where dozens or hundreds of Logic Apps may exist across multiple environments, the cumulative waste from untriggered workflows and unused hosting plans can become substantial — particularly when the resources are spread across teams and subscriptions with no centralized review process.
In November 2025, AWS introduced an Archive storage class for private ECR repositories, marketed as a way to reduce storage costs for large volumes of rarely used container images. However, Archive storage pricing is identical to Standard storage pricing for the first 150 TB per month. Below this threshold, Archive provides no storage savings yet introduces a per-gigabyte retrieval charge, a retrieval delay of up to 20 minutes, and a 90-day minimum storage duration. Adopting the Archive storage class before meeting the 150 TB threshold means paying the same storage price but taking on additional fees and operational overhead.
This inefficiency is easy to miss because the AWS announcement emphasized cost savings for "large volumes" without quantifying "large" or prominently disclosing the retrieval charge or the minimum storage duration. In other AWS services, optional storage classes typically offer a storage price reduction from the first byte, in exchange for access penalties. With ECR, however, access penalties apply as described, but the storage price is unchanged for the first 150 TB, a container storage volume that few organizations achieve.
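A sketch of the below-threshold comparison, with illustrative placeholder rates: since storage pricing is identical under 150 TB, any retrieval activity makes Archive strictly more expensive.

```python
# Sketch: total monthly ECR cost on Archive vs. Standard below the 150 TB
# threshold, where the storage rate is the same but Archive adds a
# per-GB retrieval fee. All rates are illustrative.

def standard_cost(stored_gb: float, storage_rate: float) -> float:
    return stored_gb * storage_rate

def archive_cost(stored_gb: float, retrieved_gb: float,
                 storage_rate: float, retrieval_rate: float) -> float:
    # Below 150 TB the Archive storage rate equals the Standard rate,
    # so retrieval fees are a pure add-on.
    return stored_gb * storage_rate + retrieved_gb * retrieval_rate

std = standard_cost(50_000, 0.10)               # 50 TB stored
arc = archive_cost(50_000, 2_000, 0.10, 0.03)   # same storage + retrievals
extra = arc - std                               # Archive costs strictly more
```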
Organizations often use the Standard - Infrequent Access (Standard-IA) storage class based on documentation and code that predate 2021 updates to the Intelligent Tiering storage class. Intelligent Tiering became suitable as an initial S3 storage class even for objects that are small and/or will be deleted early. It also gained a heavily-discounted access tier. Older internal runbooks, lifecycle policies (including ones specified in infrastructure-as-code templates), scripts, programs, and public examples may still default to Standard-IA, inflating storage costs.
This inefficiency report compares Standard-IA with Intelligent Tiering; it is not intended to cover other storage classes. Note that S3 storage is billed per gibibyte (GiB, powers of 2) rather than per gigabyte (GB, powers of 10); because a GiB is about 7% larger than a GB, the distinction matters when estimating charges for small objects and for large volumes of storage alike.
Relative to the Standard storage class, the Standard-IA storage class offers a moderate, constant storage price discount but imposes a minimum billable object size of 128 KiB, a minimum storage duration of 30 days, and a per-GiB retrieval charge.
In contrast, AWS updated the Intelligent Tiering storage class in September 2021, eliminating the minimum storage duration and exempting small objects from a monthly per-object monitoring and automation charge. Intelligent Tiering has never had retrieval charges. In November 2021, AWS added the heavily-discounted Archive Instant Access tier.
For objects stored beyond a few months, Intelligent Tiering's progressive storage price discounts surpass Standard-IA's constant discount. Storage savings accumulate each month. Objects in the Intelligent Tiering storage class automatically move through progressively cheaper access tiers unless the objects are accessed. Intelligent Tiering also avoids Standard-IA's minimum billable object size and minimum storage duration penalties.
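The small-object penalty is easy to quantify. A sketch with illustrative per-GiB rates (not actual S3 prices), showing how the 128 KiB minimum inflates Standard-IA's billed footprint:

```python
# Sketch: Standard-IA's 128 KiB minimum billable object size vs.
# Intelligent Tiering, which bills actual size. Rates are illustrative.

KIB = 1024
GIB = 1024 ** 3

def ia_billed_bytes(object_bytes: int) -> int:
    """Standard-IA rounds every object up to at least 128 KiB."""
    return max(object_bytes, 128 * KIB)

def monthly_storage_cost(total_billed_bytes: int, rate_per_gib: float) -> float:
    return total_billed_bytes / GIB * rate_per_gib

# One million 16 KiB objects: IA bills 8x the actual data volume.
objects = [16 * KIB] * 1_000_000
ia_cost = monthly_storage_cost(sum(map(ia_billed_bytes, objects)), 0.0125)
it_cost = monthly_storage_cost(sum(objects), 0.023)  # frequent-access tier
# IA bills ~122 GiB despite the bucket holding only ~15 GiB of data,
# so it costs more here even though its per-GiB rate is lower.
```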
Azure App Service Plans define the compute resources allocated to web applications and are billed continuously based on their pricing tier — regardless of whether the hosted apps are actively serving traffic. In non-production environments such as development, testing, or staging, workloads typically follow predictable usage patterns aligned with business hours. When these plans remain provisioned at higher-cost tiers around the clock, organizations pay premium rates for compute capacity that sits idle during evenings, weekends, and holidays.
A common misconception is that stopping the apps within a plan will halt charges. In reality, the App Service Plan itself is the billing container, and charges accrue as long as the plan exists at a dedicated tier — even with all apps stopped or deleted. Simply stopping apps provides no cost relief. Instead, the plan's tier must be actively changed to a lower-cost option during periods of inactivity to realize savings. This temporal tier-switching pattern is distinct from scaling out (adjusting instance count) or right-sizing (choosing a permanently smaller tier), and is particularly effective for non-production workloads where brief interruptions during tier transitions are acceptable.
Because higher tiers such as Premium or Standard carry substantially higher per-hour rates than the Basic tier, leaving these plans unchanged during extended idle periods represents a significant and avoidable expense. Organizations with multiple non-production App Service Plans can accumulate substantial waste if this pattern is not addressed.
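A sketch of the tier-switching savings, assuming roughly 174 business hours per month and illustrative Premium and Basic hourly rates:

```python
# Sketch: monthly cost of a plan held at a Premium tier around the clock
# vs. switching to a Basic tier outside business hours. Rates illustrative.

def always_premium_cost(premium_rate: float, hours: int = 730) -> float:
    return premium_rate * hours

def tier_switched_cost(premium_rate: float, basic_rate: float,
                       business_hours: int, total_hours: int = 730) -> float:
    idle_hours = total_hours - business_hours
    return premium_rate * business_hours + basic_rate * idle_hours

# ~40 business hours/week, about 174 hours/month at Premium, rest at Basic.
flat = always_premium_cost(0.40)
switched = tier_switched_cost(0.40, 0.075, business_hours=174)
savings = flat - switched   # per plan, per month
```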
Amazon SQS does not charge for queue existence, message storage, or the number of queues — cost is driven entirely by API requests and data transfer. When consumers continue polling a queue that no longer receives messages, every ReceiveMessage call that returns empty is billed at the same rate as a call that returns data. These "empty receives" are the most common source of unexpected SQS charges and represent pure waste when the queue serves no active purpose.
This pattern is especially prevalent in serverless architectures where Lambda functions are configured as SQS event sources. In this setup, AWS automatically manages a fleet of pollers that continuously make ReceiveMessage calls to the queue — starting with multiple concurrent pollers and scaling based on message volume. Even on a completely idle queue, this automated polling generates a steady stream of empty receives around the clock. Because the polling is managed by the platform rather than application code, teams often overlook it entirely.
While the cost per individual idle queue may appear modest, the waste compounds quickly across organizations with many queues spanning development, staging, and production environments. The SQS free tier can mask the issue in small deployments, but organizations with dozens or hundreds of forgotten queues — each with active consumers or Lambda triggers — can accumulate meaningful unnecessary spend.
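The empty-receive volume can be estimated from the polling mechanics. A sketch assuming long polling at the 20-second maximum wait, a hypothetical fleet of five managed pollers, and an illustrative per-million-request price:

```python
# Sketch: empty ReceiveMessage volume from Lambda's managed pollers on an
# idle queue. Poller count and pricing are illustrative assumptions.

def empty_receives_per_month(pollers: int, poll_wait_seconds: int = 20,
                             days: int = 30) -> int:
    polls_per_poller = (86_400 // poll_wait_seconds) * days
    return pollers * polls_per_poller

def monthly_request_cost(requests: int, price_per_million: float = 0.40) -> float:
    return requests / 1_000_000 * price_per_million

receives = empty_receives_per_month(pollers=5)   # 648,000 empty receives
cost = monthly_request_cost(receives)            # per idle queue per month
```

The per-queue figure is small, consistent with the point above that the waste only becomes meaningful when multiplied across dozens or hundreds of forgotten queues.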
When organizations purchase AWS Savings Plans during periods of elevated AI inference demand — such as experimentation phases, feature launches, or early adoption surges — the committed hourly spend may significantly exceed what is needed once workloads stabilize. GPU-backed inference clusters running on high-cost instance families can drive substantial compute consumption during these peaks, and if that peak usage is used as the baseline for commitment sizing, the resulting Savings Plan will be oversized relative to steady-state demand. Because Savings Plans are billed as a fixed hourly dollar commitment for the entire term, any unused portion in a given hour is forfeited — it cannot be carried over, recouped, or applied to future hours.
This pattern is especially costly for AI inference workloads because GPU-accelerated instances carry significantly higher hourly rates than general-purpose compute, amplifying the financial impact of each underutilized hour. The problem compounds when inference workloads shift between instance families, regions, or deployment architectures over time — a common occurrence as teams optimize models, adopt newer hardware generations, or consolidate serving infrastructure. EC2 Instance Savings Plans, which are scoped to a specific instance family and region, are particularly vulnerable to these shifts. Critically, Savings Plans cannot be canceled, modified, or sold on any marketplace once purchased, making the commitment irrevocable for the full term with only a narrow return window available under limited conditions.
The net result is a sustained gap between committed spend and actual covered usage. Under sustained underutilization, the effective discount can shrink to nothing or even turn negative, undermining the financial benefit that justified the commitment in the first place.
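The erosion can be made precise. A sketch with illustrative dollar figures, showing how consuming only part of the commitment can wipe out, and even invert, the nominal discount:

```python
# Sketch: effective savings rate of a Savings Plan after forfeiting
# unused commitment. All dollar figures are illustrative.

def effective_savings_rate(commitment_per_hour: float,
                           covered_usage_per_hour: float,
                           discount: float) -> float:
    """Net savings vs. on-demand, where covered_usage_per_hour is the
    commitment actually consumed and discount is the nominal SP rate
    (e.g. 0.30 for 30%)."""
    on_demand_equivalent = covered_usage_per_hour / (1 - discount)
    net_savings = on_demand_equivalent - commitment_per_hour
    return net_savings / on_demand_equivalent

# A $100/hr commitment at a nominal 30% discount, but only $60/hr consumed:
rate = effective_savings_rate(100.0, 60.0, 0.30)
# the on-demand equivalent is ~$85.71/hr, so the plan now costs ~16.7%
# MORE than on demand would have; the nominal discount is fully erased
```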
When external Delta tables are dropped from Databricks Unity Catalog or the legacy Hive metastore, only the table metadata is removed — the underlying data files in cloud object storage (such as S3, ADLS, or GCS) remain untouched and continue to incur per-GB-month storage charges. This behavior is by design: external tables decouple metadata from data lifecycle management, meaning Databricks explicitly does not delete the underlying storage when an external table is dropped. The result is orphaned storage — files that no longer have any catalog reference, are not consumed by any downstream pipeline, and deliver no business value, yet continue to accumulate charges indefinitely.
This pattern is particularly prevalent in environments using medallion architecture (bronze/silver/gold layers), where tables are frequently recreated during pipeline evolution, schema experimentation, or migration between environments. Development and test workloads compound the problem, as teams routinely create and abandon external table references without cleaning up the associated storage. Unlike managed tables in Unity Catalog — which have a retention period with recovery capability before automatic deletion — external tables offer no such safety net. The orphaned storage is structurally invisible to standard cost dashboards because it appears as generic object storage charges, not as Databricks-specific line items. Over time, this silent accumulation can represent a meaningful share of an organization's total storage spend.
Importantly, Databricks VACUUM operations do not address this pattern. VACUUM cleans up old file versions within active Delta tables, but it cannot act on storage paths that have been completely disconnected from catalog metadata through external table drops. The only way to reclaim this storage is to manually identify and delete the orphaned files in cloud storage.
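Identification amounts to a set difference between storage paths and catalog-referenced locations. A sketch with stand-in path lists; in practice the inputs would come from listing cloud object storage and querying the catalog's table locations:

```python
# Sketch: find storage prefixes with no remaining catalog reference.
# The path sets below are hypothetical examples.

def orphaned_prefixes(storage_prefixes: set[str],
                      catalog_locations: set[str]) -> set[str]:
    """Prefixes present in object storage but referenced by no table."""
    return storage_prefixes - catalog_locations

storage = {
    "s3://lake/bronze/events/",
    "s3://lake/silver/orders/",
    "s3://lake/silver/orders_v1_backup/",  # dropped table, files left behind
}
catalog = {"s3://lake/bronze/events/", "s3://lake/silver/orders/"}
orphans = orphaned_prefixes(storage, catalog)
# candidates for manual review, then deletion once confirmed unused
```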
Custom metrics published to CloudWatch can be configured at two resolutions: standard (60-second intervals) or high resolution (1-second intervals). While both resolutions are priced identically for metric storage, the critical cost difference lies in the volume of API calls required to publish the data. A metric published every second generates 60 times more API calls than one published every 60 seconds. At scale — across hundreds or thousands of custom metrics in a microservices architecture — this multiplier translates into substantial and avoidable API charges that accumulate month over month.
This inefficiency commonly arises when teams default to high-resolution publishing without evaluating whether sub-minute granularity is actually needed for their monitoring use cases. Many workloads — including capacity planning, cost analysis, and non-critical service monitoring — function perfectly well with standard or even lower resolution. Compounding the issue, high-resolution metric data is only retained at its full 1-second granularity for three hours before being automatically aggregated to coarser intervals. Teams may therefore be paying a premium in API costs for resolution they cannot even query historically. Additionally, if alarms are configured to evaluate high-resolution metrics at sub-minute intervals, those alarms carry a higher per-alarm charge compared to standard-resolution alarms.
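The multiplier is simple arithmetic. A sketch assuming one PutMetricData call per datapoint (batching would reduce the count) and an illustrative per-1,000-call price:

```python
# Sketch: API call volume for publishing one custom metric at 1-second
# vs. 60-second resolution. Pricing is an illustrative placeholder.

def calls_per_month(publish_interval_seconds: int, days: int = 30) -> int:
    return (86_400 // publish_interval_seconds) * days

def api_cost(calls: int, price_per_1k_calls: float = 0.01) -> float:
    return calls / 1000 * price_per_1k_calls

hi_res = calls_per_month(1)     # 2,592,000 calls/month per metric
std_res = calls_per_month(60)   # 43,200 calls/month per metric
assert hi_res == 60 * std_res   # the 60x multiplier described above

delta_cost = api_cost(hi_res - std_res)   # extra cost per metric, per month
# multiplied across hundreds of metrics, this dominates the bill
```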
This inefficiency occurs when Azure Load Balancers remain provisioned after the backend workloads they supported have been scaled down, stopped, or decommissioned. This is common in non-production environments where virtual machines are shut down outside business hours, but the associated load balancers are left in place. Even when no meaningful traffic is flowing, the load balancer continues to incur base charges, resulting in ongoing cost without delivering value.
This inefficiency occurs when BigQuery slot reservations are sized for peak or anticipated demand but are not adjusted as workloads evolve. When actual query concurrency or complexity is lower than expected, a portion of the reserved slots remains idle. Because slot reservations are billed independently of usage, underutilized capacity results in sustained waste, even as workloads outside the reservation may continue to accrue on-demand query charges.
This commonly happens when reservations are created during migrations, one-time analytical initiatives, or early scaling phases and are not revisited once usage stabilizes.
This inefficiency occurs when an RDS database instance is deleted but its manual snapshots or retained backups remain. Unlike automated backups tied to a live instance, these backups persist independently and continue generating storage costs despite no longer supporting any active database. This is distinct from excessive retention on active databases and typically arises from incomplete cleanup during decommissioning.
This inefficiency occurs when analysts use SELECT * (reading more columns than needed) and/or rely on LIMIT as a cost-control mechanism. In BigQuery, projecting excess columns increases the amount of data read and can materially raise query cost, particularly on wide tables and frequently-run queries. Separately, applying LIMIT to a query does not inherently reduce bytes processed for non-clustered tables; it mainly caps the result set returned. The "LIMIT saves cost" assumption is only sometimes true on clustered tables, where BigQuery may be able to stop scanning earlier once enough clustered blocks have been read.
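The projection effect can be modeled from per-column sizes, since on-demand cost scales with bytes scanned in the referenced columns. A sketch with hypothetical column sizes and an illustrative per-TiB rate:

```python
# Sketch: bytes-scanned cost for SELECT * vs. projecting only the columns
# needed, on a columnar table. Sizes and rate are illustrative.

TIB = 2 ** 40

def scan_cost(column_bytes: dict[str, int], selected: list[str],
              rate_per_tib: float = 6.25) -> float:
    scanned = sum(column_bytes[c] for c in selected)
    return scanned / TIB * rate_per_tib

# Hypothetical wide table dominated by one large column.
table = {"id": 8 * 10**9, "payload": 900 * 10**9,
         "ts": 8 * 10**9, "user": 40 * 10**9}
star = scan_cost(table, list(table))      # SELECT * reads every column
narrow = scan_cost(table, ["id", "ts"])   # projecting two columns
# Adding LIMIT 10 to the SELECT * query on a non-clustered table would
# still bill like `star`: the full columns are scanned either way.
```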
This inefficiency occurs when an App Service Plan is sized larger than required for the applications it hosts. Plans are often provisioned conservatively to handle anticipated peak demand and are not revisited after workloads stabilize. Because pricing is tied to the plan’s SKU rather than real-time usage, oversized plans continue to incur higher costs even when CPU and memory utilization remain consistently low.
This inefficiency occurs when an Azure Virtual WAN hub is provisioned with more capacity than required to support real network traffic. Because hub costs scale with the number of configured scale units, overprovisioned hubs continue to incur higher charges even when traffic levels remain consistently low. This commonly happens when hubs are sized for peak or anticipated demand that never materializes, or when traffic patterns change over time without corresponding capacity adjustments.
This inefficiency occurs when a function has steady, high-volume traffic (or predictable load) but continues running on default Lambda pricing, where costs scale with execution duration. Lambda Managed Instances runs Lambda on EC2 capacity managed by the service and supports multiple concurrent invocations within the same execution environment, which can materially improve utilization for suitable workloads (often IO-heavy services). For these steady-state patterns, shifting from duration-based billing to instance-based billing (and potentially leveraging EC2 pricing options such as Savings Plans or Reserved Instances) can reduce total cost while keeping the Lambda programming model. Savings are workload-dependent and not guaranteed.
This inefficiency occurs when Azure SQL Managed Instances continue running on legacy General Purpose or Business Critical tiers despite the availability of the next-gen General Purpose tier. The newer tier enables more granular scaling of vCPU, memory, and storage, allowing workloads to better match actual resource needs. In many cases, workloads running on Business Critical—or overprovisioned legacy General Purpose—do not require the premium performance or architecture of those tiers and could achieve equivalent outcomes at lower cost by moving to next-gen General Purpose.
This inefficiency occurs when backup data remains in a Recovery Services Vault after the original protected resource has been deleted. These orphaned backups continue to consume storage and generate cost despite no longer supporting an active workload. In addition, long-retained backups that are rarely accessed are often kept in higher-cost tiers, increasing storage spend without providing additional value.
This inefficiency occurs when Savings Plans are purchased within the final days of a calendar month, reducing or eliminating the ability to reverse the purchase if errors are discovered. Because the refund window is constrained to both a 7-day period and the same month, late-month purchases materially limit correction options. This increases the risk of locking in misaligned commitments (e.g., incorrect scope, amount, or term), which can lead to sustained underutilization and unnecessary long-term spend.
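The dual constraint can be expressed as simple date math: the effective deadline is the earlier of the purchase date plus 7 days and the end of the purchase month.

```python
# Sketch: effective refund deadline for a Savings Plan purchase under the
# dual 7-day / same-calendar-month constraint described above.
from datetime import date, timedelta

def refund_deadline(purchase: date) -> date:
    seven_days = purchase + timedelta(days=7)
    # last day of the purchase month
    next_month = date(purchase.year + purchase.month // 12,
                      purchase.month % 12 + 1, 1)
    month_end = next_month - timedelta(days=1)
    return min(seven_days, month_end)

# Mid-month purchase: the full 7 days are available to reverse a mistake.
assert refund_deadline(date(2025, 3, 10)) == date(2025, 3, 17)
# Purchase on the 29th: the window collapses to just 2 days.
assert refund_deadline(date(2025, 3, 29)) == date(2025, 3, 31)
```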
This inefficiency occurs when licensed Azure DevOps users remain assigned after individuals leave the organization or stop using the platform. These inactive users continue to generate recurring per-user charges despite providing no ongoing value, leading to unnecessary spend over time.
This inefficiency occurs when teams assume AWS Marketplace SaaS purchases will contribute toward EDP or PPA commitments, but the SaaS product is not eligible under AWS’s “Deployed on AWS” standard. As of May 1, 2025, AWS Marketplace allows SaaS products regardless of where they are hosted, while separately identifying products that qualify for commitment drawdown via a visible “Deployed on AWS” badge.
Eligibility is determined based on the invoice date, not the contract signing date. As a result, Marketplace SaaS contracts signed prior to the policy change may still generate invoices after May 1, 2025 that no longer qualify for commitment retirement. This can lead to Marketplace spend appearing on AWS invoices without reducing commitments, creating false confidence in commitment progress and increasing the risk of end-of-term shortfalls.
This inefficiency occurs when workloads are constrained to run only on Spot-based capacity with no viable path to standard nodes when Spot capacity is reclaimed or unavailable. While Spot reduces unit cost, rigid dependence can create hidden costs by requiring standby standard capacity elsewhere, delaying deployments, or increasing operational intervention to keep environments usable. GKE explicitly recommends mixing Spot and standard node pools for continuity when Spot is unavailable.
This inefficiency occurs when Kubernetes Jobs or CronJobs running on EKS Fargate leave completed or failed pod objects in the cluster indefinitely. Although the workload execution has finished, AWS keeps the underlying Fargate microVM running to allow log inspection and final status checks. As a result, vCPU, memory, and networking resources remain allocated and billable until the pod object is explicitly deleted.
Over time, large numbers of stale Job pods can generate direct compute charges as well as consume ENIs and IP addresses, leading to both unnecessary spend and capacity pressure. This pattern is common in batch-processing and scheduled workloads that lack automated cleanup.
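One common fix is Kubernetes' TTL-after-finished controller: setting `ttlSecondsAfterFinished` on a Job lets the cluster delete the finished Job and its pods automatically, which releases the backing Fargate microVM. A minimal sketch of such a manifest, built as a Python dict with illustrative names:

```python
# Sketch: a Kubernetes Job manifest that lets the TTL-after-finished
# controller delete the Job (and its pods) shortly after completion,
# so the Fargate microVM stops billing. Name, image, and TTL are illustrative.
def batch_job_manifest(name: str, image: str, ttl_seconds: int = 300) -> dict:
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            # Delete the finished Job object (and its pods) after this delay
            "ttlSecondsAfterFinished": ttl_seconds,
            "template": {
                "spec": {
                    "containers": [{"name": name, "image": image}],
                    "restartPolicy": "Never",
                }
            },
        },
    }

job = batch_job_manifest("nightly-report", "my-registry/report:latest")
```

The same field works for Jobs created by CronJobs, which covers the scheduled-workload case described above.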
This inefficiency occurs when ElastiCache clusters continue running engine versions that have moved into extended support. While the service remains functional, AWS charges an ongoing premium for extended support that provides no added performance or capability. These costs are typically avoidable by upgrading to a version within standard support.
This inefficiency occurs when workloads with predictable, long-running compute usage continue to run entirely on on-demand pricing instead of leveraging Committed Use Discounts. For stable environments, such as production services or continuously running batch workloads, failing to apply CUDs results in materially higher compute spend without any operational benefit. The inefficiency is driven by pricing choice, not resource overuse.
This inefficiency occurs when backup data persists longer than intended due to misaligned or outdated retention policies. It often arises when retention requirements change over time, but older recovery points are not evaluated or cleaned up accordingly. In some cases, manually configured backups or legacy policies remain in place even after operational or compliance needs have been reduced.
As a result, backup storage continues to grow and incur cost without delivering additional recovery value.
This inefficiency occurs when Amazon Aurora database clusters are intentionally stopped to avoid compute costs but are automatically restarted by the service after the maximum allowed stop period (seven days). Once restarted, the database instances begin accruing instance-hour charges even if the database is not needed.
Because Aurora does not provide native lifecycle controls to keep clusters stopped indefinitely, this behavior can result in recurring, unintended compute spend—particularly in non-production, seasonal, or infrequently accessed environments where clusters are stopped and forgotten.
This inefficiency occurs when automated Cloud SQL backups are retained longer than required by recovery objectives or governance needs. Because backups accumulate over the retention window (and can grow quickly for high-change databases), excessive retention drives ongoing backup storage charges without improving practical recoverability.
This inefficiency occurs when production and non-production applications are hosted within the same App Service Plan. Production workloads often require higher availability, performance, or scaling characteristics, driving the plan toward larger or higher-cost SKUs. When non-production workloads share that plan, they inherit the higher cost structure even though their availability and performance requirements are typically much lower, resulting in unnecessary spend.
This inefficiency occurs when pod resource requests—often inflated by sidecar containers—push total memory or CPU just over a Fargate sizing boundary. Because Fargate adds mandatory system overhead and only supports fixed resource combinations, small incremental increases can force a pod into a much larger billing tier. This results in materially higher cost for marginal additional resource needs, especially in workloads that run continuously or at scale.
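The tier-jump effect can be illustrated with a simplified rounding function. The tier table and ~256 MiB overhead below are a reduced sketch of Fargate's fixed vCPU/memory combinations, assumptions to verify against current AWS documentation rather than an exact pricing model:

```python
# Sketch: how a small request increase can push a pod into a larger Fargate
# billing tier. TIERS is a simplified subset of Fargate's fixed combinations;
# OVERHEAD_GB approximates the memory EKS Fargate reserves for its own
# components. Both are assumptions for illustration.
TIERS = [  # (vCPU, max GB available at that vCPU)
    (0.25, 2), (0.5, 4), (1, 8), (2, 16), (4, 30),
]
OVERHEAD_GB = 0.25

def billed_tier(request_vcpu: float, request_gb: float) -> tuple:
    """Return the smallest sketched tier that fits the request plus overhead."""
    need_gb = request_gb + OVERHEAD_GB
    for vcpu, max_gb in TIERS:
        if vcpu >= request_vcpu and max_gb >= need_gb:
            return (vcpu, max_gb)
    raise ValueError("request exceeds the largest tier in this sketch")

# An app container whose sidecar nudges total memory just past the boundary:
print(billed_tier(0.5, 3.5))  # still fits the 0.5 vCPU tier
print(billed_tier(0.5, 3.9))  # overhead pushes it into the 1 vCPU tier
```

A 0.4 GB increase in requests roughly doubles the billed vCPU tier here, which is the marginal-cost cliff the paragraph describes.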
This inefficiency occurs when Provisioned Concurrency is enabled for Lambda functions that do not require consistently low latency or steady traffic. In such cases, reserved capacity remains allocated and billed during idle periods, creating ongoing cost without proportional performance or business benefit. This is distinct from standard Lambda execution charges, which are purely usage-based.
This inefficiency occurs when a protected resource (such as a virtual machine, database, or file share) is decommissioned without explicitly stopping backup protection. In these cases, Azure Backup continues to retain existing recovery points in the vault until the retention policy expires. Although the source resource no longer exists, backup storage remains allocated and billable, resulting in unnecessary ongoing costs.
This pattern is common when infrastructure is deleted outside of a formal decommissioning process or when backup ownership is unclear.
This inefficiency occurs when an Azure Savings Plan is scoped too narrowly relative to where eligible compute usage actually runs. When usage is spread across multiple subscriptions or fluctuates significantly (for example, development and test workloads that are frequently stopped and started), a narrowly scoped Savings Plan may not consistently find enough eligible usage to consume the full commitment. As a result, part of the committed hourly spend goes unused while other eligible workloads outside the scope continue to incur on-demand charges.
Azure supports broader scoping options—such as Management Group or Shared scope—that allow the commitment to be applied across a larger pool of eligible compute. Selecting an overly restrictive scope can therefore directly drive underutilization, even when sufficient total usage exists across the tenant.
Teams often start custom-model deployments with large architectures, full-precision weights, or older model versions carried over from training environments. When these models transition to Bedrock’s managed inference environment, the compute footprint (especially GPU class) becomes a major cost driver. Common inefficiencies include:

* Deploying outdated custom models despite newer, more efficient variants being available,
* Running full-size models for tasks that could be served by distilled or quantized versions,
* Using accelerators overpowered for the workload’s latency requirements, or
* Relying on default model artifacts instead of optimizing for inference.

Because Bedrock Custom Models bill continuously for the backing compute, even small inefficiencies in model design or versioning translate into substantial ongoing cost.
Generative workloads that produce long outputs—such as detailed summaries, document rewrites, or multi-paragraph chat completions—require extended model runtime. Because output tokens are billed per token, and typically at a higher rate than input tokens, unnecessarily verbose responses directly inflate inference cost; on provisioned capacity they also occupy throughput for longer per request, reducing the effective capacity of each model unit.
Embedding-based retrieval enables semantic matching even when keywords differ. But many Databricks workloads—catalog lookups, metadata search, deterministic classification, or fixed-rule routing—do not require semantic understanding. When embeddings are used anyway, teams incur DBU cost for embedding generation, additional storage for vector columns or indexes, and more expensive similarity-search compute. This often stems from defaulting to a RAG approach rather than evaluating whether a simpler retrieval mechanism would perform equally well.
Embeddings enable semantic retrieval by capturing the meaning of text, while keyword search returns results based on exact or lexical matches. Many Azure workloads—FAQ search, routing, deterministic classification, or structured lookups—achieve the same or better accuracy using simple keyword or metadata filtering. When embeddings are used for these uncomplicated tasks, organizations pay for token-based embedding generation, vector storage, and compute-heavy similarity search without receiving meaningful quality improvements. This inefficiency often occurs when RAG is used automatically rather than intentionally.
Embeddings enable semantic similarity search by representing text as high-dimensional vectors. Keyword search, however, returns results based on lexical matches and is often sufficient for simple retrieval tasks such as FAQ matching, deterministic filtering, metadata lookup, or rule-based routing. When embeddings are used for these low-complexity scenarios, organizations pay for compute to generate embeddings, storage for vector columns, and compute-heavy cosine similarity searches — without improving accuracy or user experience. In Snowflake, this can also increase warehouse load and query runtime.
Embeddings enable semantic search by converting text into vectors that capture meaning. Keyword or metadata search performs exact or simple lexical matches. Many workloads—FAQ lookup, helpdesk routing, short product lookups, or rule-based filtering—do not benefit from semantic search. When embeddings are used anyway, organizations pay for embedding generation, vector storage, and similarity search without gaining accuracy or relevance improvements. This often happens when teams adopt RAG “by default” for problems that do not require semantic understanding.
Embeddings allow semantic search — they map text into vectors so the system can find content with similar meaning, even if the keywords don’t match. Keyword or metadata search, by contrast, looks for exact terms or simple filters. Many workloads (FAQ lookups, short product searches, rule-based routing) do not need semantic understanding and perform just as well with basic keyword logic. When teams use embeddings for these simple tasks, they pay for embedding generation, vector storage, and similarity search without gaining meaningful accuracy or functionality.
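For these low-complexity cases, a plain keyword or tag match is often sufficient and involves no model calls at all. A minimal sketch, with illustrative FAQ entries and fields:

```python
# Sketch: a keyword + metadata lookup covering FAQ-style retrieval without
# embedding generation, vector storage, or similarity search.
# The FAQ entries and tag sets are illustrative.
FAQS = [
    {"q": "How do I reset my password?", "tags": {"password", "reset", "login"}},
    {"q": "Where can I download my invoice?", "tags": {"invoice", "billing", "download"}},
]

def keyword_lookup(query: str, faqs=FAQS):
    words = set(query.lower().split())
    # Rank entries by tag overlap; no vectors or model invocations involved.
    scored = [(len(words & f["tags"]), f) for f in faqs]
    best_score, best = max(scored, key=lambda s: s[0])
    return best["q"] if best_score > 0 else None

print(keyword_lookup("I need to reset my password"))
```

If this kind of lookup answers the bulk of traffic, embeddings can be reserved for the genuinely semantic remainder.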
Verbose logging is useful during development, but many teams forget to disable it before deploying to production. Generative AI workloads often include long prompts, large multi-paragraph outputs, embedding vectors, and structured metadata. When these full payloads are logged on high-throughput production endpoints, Cloud Logging costs can quickly exceed the cost of the model inference itself. This inefficiency commonly arises when development-phase logging settings carry into production environments without review.
Vertex AI Prediction Endpoints support autoscaling but require customers to specify a **minimum number of replicas**. These replicas stay online at all times to serve incoming traffic. When the minimum value is set too high for real traffic levels, the system maintains idle capacity that still incurs hourly charges. This inefficiency commonly arises when teams:

* Use default replica settings during initial deployment,
* Intentionally overprovision “just in case” without revisiting the configuration, or
* Copy settings from production into lower-traffic dev or QA environments.

Over time, unused replica hours accumulate into significant, silent spend.
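The silent spend is straightforward to estimate: replicas above actual demand, multiplied by the hourly rate and hours in a month. The replica counts and rate below are illustrative assumptions, not Vertex AI list prices:

```python
# Sketch: estimating the monthly cost of an inflated min-replica setting.
# The replica counts and the $2.00/replica-hour rate are illustrative.
def idle_replica_cost(min_replicas: int, needed_replicas: int,
                      hourly_rate: float, hours: int = 730) -> float:
    """Monthly spend on replicas held above actual demand."""
    idle = max(min_replicas - needed_replicas, 0)
    return idle * hourly_rate * hours

# Three always-on replicas where traffic needs one:
print(idle_replica_cost(3, 1, 2.00))  # 2 idle replicas * $2 * 730 h
```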
A large portion of real-world AI workloads involve repetitive or deterministic inference patterns—such as classification labels, routing logic, metadata extraction, FAQ responses, keyword detection, or summarization of static content. Vertex AI does **not** provide native inference caching, so applications that repeatedly send identical prompts to the model incur avoidable cost. When no caching mechanism is implemented, workloads repeatedly invoke the model and consume tokens even though the output is predictable. Over time, especially at scale, these repetitive token charges accumulate into significant waste. This inefficiency is common in early-stage deployments where teams optimize for correctness rather than cost.
Vertex AI model families evolve rapidly. New model versions (e.g., transitions within the Gemini family) frequently introduce improvements in efficiency, quality, and capability. When workloads continue using older, legacy, or deprecated models, they may consume more tokens, produce lower-quality results, or experience higher latency than necessary. Because generative workloads often scale quickly, even small efficiency gaps between generations can materially increase token consumption and cost. Teams that do not actively track model updates, or that set model types once and never revisit them, often miss opportunities to improve performance-per-dollar by upgrading to the most current supported model.
Bedrock’s model catalog evolves quickly as providers release new versions—such as successive Claude model families or updated Amazon Titan models. These newer models frequently offer improved performance, more efficient reasoning, better context handling, and higher-quality outputs compared to older generations. When workloads continue using older or deprecated models, they may require **more tokens**, experience **slower inference**, or miss out on accuracy improvements available in successor models. Because Bedrock bills per token or per inference unit, these inefficiencies can increase cost without adding value. Ensuring workloads align with the most suitable current-generation model improves both performance and cost-effectiveness.
Vertex AI workloads often include low-complexity tasks such as classification, routing, keyword extraction, metadata parsing, document triage, or summarization of short and simple text. These operations do **not** require the advanced multimodal reasoning or long-context capabilities of larger Gemini model tiers. When organizations default to a single high-end model (such as Gemini Ultra or Pro) across all applications, they incur elevated token costs for work that could be served efficiently by **Gemini Flash** or smaller task-optimized variants. This mismatch is a common pattern in early deployments where model selection is driven by convenience rather than workload-specific requirements. Over time, this creates unnecessary spend without delivering measurable value.
Many Bedrock workloads involve low-complexity tasks such as tagging, classification, routing, entity extraction, keyword detection, document triage, or lightweight summarization. These tasks **do not require** the advanced reasoning or generative capabilities of higher-cost models such as Claude 3 Opus or comparable premium models. When organizations default to a high-end model across all applications—or fail to periodically reassess model selection—they pay elevated costs for work that could be performed effectively by smaller, lower-cost models such as Claude Haiku or other compact model families. This inefficiency becomes more pronounced in high-volume, repetitive workloads where token counts scale quickly.
Bedrock workloads commonly include repetitive inference patterns—such as classification results, prompt templates generating deterministic outputs, FAQ responses, document tagging, and other predictable or low-variability tasks. Without a caching strategy (API-layer cache, application cache, or hash-based prompt cache), these workloads repeatedly invoke the model and incur token costs for answers that do not change. Because Bedrock does not offer native inference caching, customers must implement caching externally. When no cache layer exists, cost increases linearly with repeated calls, even though responses remain constant. This issue appears most often when teams treat all workloads as dynamic or generative, rather than separating deterministic tasks from open-ended ones.
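A hash-based prompt cache of the kind mentioned above can be sketched in a few lines. Here `invoke_model` is a placeholder for the real client call (not an actual SDK function), and the stub below only demonstrates the caching behavior:

```python
# Sketch: a minimal hash-based prompt cache in front of a model call.
# `invoke_model` is a stand-in for the real Bedrock client invocation.
import hashlib

_cache: dict[str, str] = {}

def cached_invoke(prompt: str, invoke_model) -> str:
    key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    if key not in _cache:              # tokens are only spent on a cache miss
        _cache[key] = invoke_model(prompt)
    return _cache[key]

calls = 0
def fake_model(prompt: str) -> str:   # stub standing in for the model API
    global calls
    calls += 1
    return f"label-for:{prompt}"

cached_invoke("classify: invoice", fake_model)
cached_invoke("classify: invoice", fake_model)
print(calls)  # 1: the second call was served from the cache
```

A production version would add a TTL and eviction policy, and should only cache prompts known to be deterministic.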
A large share of production AI workloads include repetitive or static requests—such as classification labels, routing decisions, FAQ responses, metadata extraction, or deterministic prompt templates. Without a caching layer, every repeated request is sent to the model, incurring full token charges and increasing latency. Azure OpenAI does not provide native caching, so teams must implement caching at the application or API gateway layer. When caching is absent, workloads repeatedly spend tokens for identical outputs, creating avoidable cost. This inefficiency often arises when teams optimize only for correctness—not cost—and default to calling the model for every invocation regardless of whether the response is predictable.
Many Azure OpenAI workloads—such as reporting pipelines, marketing workflows, batch inference jobs, or time-bound customer interactions—only run during specific periods. When PTUs remain fully provisioned 24/7, organizations incur continuous fixed cost even during extended idle time. Although Azure does not offer native PTU scheduling, teams can use automation to provision and deprovision PTUs based on predictable cycles. This allows them to retain performance during peak windows while reducing cost during low-activity periods.
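Such automation usually reduces to a schedule function that a timer-triggered job reconciles against via the management API. The peak window and capacity numbers below are illustrative assumptions:

```python
# Sketch: a schedule-driven PTU target for automation to reconcile
# (e.g. a timer-triggered job calling the Azure management API).
# Peak hours and PTU counts are illustrative assumptions.
def target_ptus(hour_utc: int, weekday: int,
                peak=(8, 20), peak_ptus=100, off_ptus=0) -> int:
    """Desired PTU capacity for a given hour (0-23) and weekday (0=Mon)."""
    business_day = weekday < 5
    in_peak = peak[0] <= hour_utc < peak[1]
    return peak_ptus if (business_day and in_peak) else off_ptus

print(target_ptus(10, 2))   # midweek, mid-morning: full capacity
print(target_ptus(23, 5))   # Saturday night: deprovisioned
```

During deprovisioned windows, traffic can fall back to PAYG deployments so availability is preserved while the fixed cost is removed.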
Development, testing, QA, and sandbox environments rarely have the steady, predictable traffic patterns needed to justify PTU deployments. These workloads often run intermittently, with lower throughput and shorter usage windows. When PTUs are assigned to such environments, the fixed hourly billing generates continuous cost with little utilization. Switching non-production workloads to PAYG aligns cost with actual usage and eliminates the overhead of managing PTU quota in low-stakes environments.
When organizations size PTU capacity based on peak expectations or early traffic projections, they often end up with more throughput than regularly required. If real-world usage plateaus below provisioned levels, a portion of the PTU capacity remains idle but still generates full spend each hour. This is especially common shortly after production launch or during adoption of newer GPT-4 class models, where early conservative sizing leads to long-term over-allocation. Rightsizing PTUs based on observed usage patterns ensures that capacity matches actual demand.
AWS frequently updates Bedrock with improved foundation models, offering higher quality and better cost efficiency. When workloads remain tied to older model versions, token consumption may increase, latency may be higher, and output quality may be lower. Using outdated models leads to avoidable operational costs, particularly for applications with consistent or high-volume inference activity. Regular modernization ensures applications take advantage of new model optimizations and pricing improvements.
Many production Azure OpenAI workloads—such as chatbots, inference services, and retrieval-augmented generation (RAG) pipelines—use PTUs consistently throughout the day. When usage stabilizes after initial experimentation, continuing to rely on on-demand PTUs results in ongoing unnecessary spend. These workloads are strong candidates for reserved PTUs, which provide identical performance guarantees at a substantially reduced hourly rate. Migrating to reservations usually requires no architectural changes and delivers immediate cost savings.
Azure releases newer OpenAI models that provide better performance and cost characteristics compared to older generations. When workloads remain on outdated model versions, they may consume more tokens to produce equivalent output, run slower, or miss out on quality improvements. Because customers pay per token, using an older model can lead to unnecessary spending and reduced value. Aligning deployments to the most current, efficient model types helps reduce spend and improve application performance.
Some workloads — such as text classification, keyword extraction, intent detection, routing, or lightweight summarization — do not require the capabilities of the most advanced model families. When high-cost models are used for these simple tasks, organizations pay elevated token rates for work that could be handled effectively by more efficient, lower-cost models. This mismatch typically arises from defaulting to a single model for all tasks or not periodically reviewing model usage patterns across applications.
PTU deployments guarantee dedicated throughput and low latency, but they also require paying for reserved capacity at all times. In non-production environments—such as dev, test, QA, or experimentation—usage patterns are typically sporadic and unpredictable. Deploying PTUs in these environments leads to consistent baseline spend without corresponding value. On-demand deployments scale usage cost with actual consumption, making them more cost-efficient for variable workloads.
Serverless is attractive for variable or idle workloads, but it can become more expensive than Provisioned compute when database activity is high for long portions of the day. As active time increases, per-second compute accumulation approaches—or exceeds—the fixed monthly cost of a Provisioned tier. This inefficiency arises when teams adopt Serverless as a default without assessing workload patterns. Databases with steady demand, predictable traffic, or long active periods often operate more cost-effectively on Provisioned compute. The economic break-even point depends on workload activity, and when that threshold is consistently exceeded, Provisioned becomes the more efficient option.
Databases deployed on Provisioned compute incur continuous hourly charges even when workload demand is low. For databases that are active only briefly within an hour, or for limited hours per month, Serverless can provide significantly lower cost because it bills only for active compute time. The economic break-even point between Provisioned and Serverless depends on workload activity patterns. If monthly active time falls *below* the conceptual break-even range, Serverless is more cost-effective. If active time regularly exceeds that range, Provisioned may be more appropriate. This inefficiency typically appears when teams default to Provisioned compute without evaluating workload behavior over time.
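The break-even comparison reduces to simple arithmetic. The per-vCore-second serverless rate and provisioned monthly price below are illustrative assumptions, not current list pricing:

```python
# Sketch: locating the Serverless vs Provisioned break-even point.
# The rates used here are illustrative, not actual cloud pricing.
def serverless_cost(active_vcore_seconds: float,
                    rate_per_vcore_second: float) -> float:
    """Monthly serverless compute cost for the given active time."""
    return active_vcore_seconds * rate_per_vcore_second

def break_even_hours(provisioned_monthly: float, vcores: int,
                     rate_per_vcore_second: float) -> float:
    """Active hours per month above which Provisioned becomes cheaper."""
    per_active_hour = vcores * 3600 * rate_per_vcore_second
    return provisioned_monthly / per_active_hour

# Example: a $700/month provisioned 4-vCore tier vs $0.000134 per vCore-second
print(round(break_even_hours(700.0, 4, 0.000134)))  # active-hour threshold
```

A database active well below the printed threshold each month favors Serverless; one consistently above it favors Provisioned, matching the two entries above.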
When Integration Runtimes are configured with the default “Auto Resolve” region setting, Azure may automatically provision them in a region different from the data sources or sinks. For example, an environment deployed in West Europe may run pipelines in East US. This causes unnecessary cross-region data transfer, increasing networking costs and pipeline latency. The inefficiency often goes unnoticed because data transfer costs are billed separately from pipeline compute charges.
Newer AWS Glue versions—such as Glue 5.0—include significant performance optimizations for **Python-based** ETL jobs, often reducing runtime by 10–60%. These improvements do not require any code changes, making version upgrades a simple and impactful optimization. When jobs remain on older runtimes such as Glue 3.0 or 4.0, they execute more slowly, consume more DPUs, and incur unnecessary cost. Additionally, Glue 5.0 offers more worker types (larger standard workers and memory-optimized workers) that can provide additional performance gains for some jobs. This inefficiency does not apply to Scala-based jobs, which do not benefit from the same performance uplift.
Many organizations retain all logs in Cloud Logging’s standard storage, even when the data is rarely queried or required only for audit or compliance. Logging buckets are priced for active access and are not optimized for low-frequency retrieval, which results in unnecessary expense. Redirecting logs to BigQuery or Cloud Storage can provide better cost efficiency, particularly when coupled with lifecycle policies or table partitioning. Choosing the optimal storage destination based on access frequency and analytics needs is essential to control log retention costs.
Some GCP services and workloads generate INFO-level logs at very high frequencies — for example, load balancers logging every HTTP request or GKE nodes logging system health messages. While valuable for debugging, these logs can flood Cloud Logging with non-critical data. Without log-level tuning or exclusion filters, organizations incur continuous ingestion charges for messages that are seldom analyzed. Over time, this behavior compounds into a persistent waste driver across large-scale environments.
Non-production environments frequently generate INFO-level logs that capture expected system behavior or routine API calls. While useful for troubleshooting in development, they rarely need to be retained. Allowing all INFO logs to be ingested and stored in Logging buckets across dev or staging environments can lead to disproportionate ingestion and storage costs. This inefficiency often persists because log routing and severity filters are not differentiated between production and non-production projects.
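Exclusion filters use the standard Cloud Logging query syntax. A small helper like the one below (resource types are illustrative) builds a filter string that could be attached to a bucket or sink via `gcloud logging` or the API:

```python
# Sketch: building a Cloud Logging exclusion filter that drops INFO-and-below
# entries for noisy resource types. Uses standard Logging query syntax;
# the resource types listed are illustrative examples.
def info_exclusion_filter(resource_types) -> str:
    resources = " OR ".join(f'resource.type="{r}"' for r in resource_types)
    return f"severity<=INFO AND ({resources})"

f = info_exclusion_filter(["http_load_balancer", "k8s_node"])
print(f)
```

Excluded entries are dropped before ingestion billing, which is why severity filtering differentiated by environment is one of the most direct levers here.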
Duplicate log storage occurs when multiple sinks capture the same log data — for example, organization-wide sinks exporting all logs to Cloud Storage and project-level sinks doing the same. This redundancy results in paying twice (or more) for identical data. It often arises from decentralized logging configurations, inherited policies, or unclear ownership between teams. The problem is compounded when logs are routed both to Cloud Logging and external observability platforms, creating parallel ingestion streams and double billing.
Azure Hybrid Benefit allows organizations to apply existing SQL Server licenses with Software Assurance or qualifying subscriptions to Azure SQL Databases. When this configuration is missed or not enforced, workloads continue to incur license-inclusive costs despite license ownership. This oversight often occurs in environments where licensing governance is decentralized or when databases are provisioned manually without applying existing entitlements. Across multiple databases or elastic pools, these duplicated license costs can accumulate substantially over time.
Many organizations purchase Software Assurance or subscription-based Windows and SQL Server licenses that entitle them to use Azure Hybrid Benefit. However, if the setting is not applied on eligible resources, Azure continues charging pay-as-you-go rates that already include Microsoft licensing costs. This oversight results in paying twice—once for the on-premises license and once for the built-in Azure license. The inefficiency often goes unnoticed because licensing configurations are not centrally validated or enforced. Enabling Azure Hybrid Benefit can reduce costs by up to 40% for Windows Server VMs and up to 30% for SQL Databases.
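The duplicated spend can be estimated per resource. The monthly PAYG rates below are illustrative, and the discount factors use the upper bounds cited above (40% for Windows VMs, 30% for SQL):

```python
# Sketch: estimating monthly savings left on the table when Azure Hybrid
# Benefit is not applied. Rates are illustrative; discounts are upper bounds.
def fleet_savings(resources) -> float:
    """resources: iterable of (payg_monthly_cost, discount_fraction)."""
    return sum(monthly * discount for monthly, discount in resources)

# A $1,000/month Windows VM (40%) plus an $800/month SQL Database (30%):
print(fleet_savings([(1000.0, 0.40), (800.0, 0.30)]))
```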
When a Dataflow pipeline fails—often due to dependency issues, misconfigurations, or data format mismatches—its worker instances may remain active temporarily until the service terminates them. In some cases, misconfigured jobs, stuck retries, or delayed monitoring can cause workers to continue running for extended periods. These idle workers consume vCPU, memory, and storage resources without performing useful work. The inefficiency is compounded in large or high-frequency batch environments where repeated failures can leave many orphaned workers running concurrently.
Aurora Serverless is designed for workloads with unpredictable or intermittent usage patterns that benefit from automatic scaling. However, when used for databases with constant load, the service’s elasticity offers little advantage and adds cost overhead. Serverless instances run continuously in steady workloads, resulting in persistent ACU billing at a higher effective rate than a provisioned cluster of similar size. In addition, Serverless configurations cannot use Reserved Instances or Savings Plans, missing out on predictable cost reductions available to provisioned Aurora.
In restricted or isolated network environments, Dataflow workers often cannot reach the public internet to download runtime dependencies. To operate securely, organizations build custom worker images that bundle required libraries. However, these images must be manually updated to keep dependencies current. As upstream packages evolve, outdated internal images can cause pipeline errors, execution delays, or total job failures. Each failure wastes worker runtime, increases troubleshooting time, and leads to rebuild cycles that inflate operational and compute costs.
Customers often delay upgrading Aurora clusters due to compatibility concerns or operational overhead. However, when older versions such as MySQL 5.7 or PostgreSQL 11 move into Extended Support, AWS applies automatic surcharges to ensure continued patching. These charges affect all clusters regardless of usage, creating unnecessary cost exposure across both production and non-production environments. For large Aurora fleets, the incremental expense can become significant if upgrades are not proactively managed.
Many organizations continue to run outdated database engines, such as MySQL 5.7 or PostgreSQL 11, beyond their support windows. Beginning in 2024, AWS automatically enrolls these into Extended Support to maintain security updates, adding incremental charges that scale with vCPU count. These costs often appear suddenly, impacting both production and non-production environments. For development and test databases in particular, the charges may outweigh their value, leading to hidden inefficiencies if not addressed promptly.
Many teams publish new Lambda versions frequently (e.g., through CI/CD pipelines) but do not clean up old ones. When SnapStart is enabled, each of these versions retains an active snapshot in the cache, generating ongoing charges. Over time, accumulated unused versions can significantly increase spend without delivering any business value. This problem compounds in environments with high deployment velocity or many functions.
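A cleanup policy reduces to a pure selection rule: keep the most recent published versions and delete the rest. The sketch below (with an assumed version-list format) shows only the selection logic; actual deletion would go through the AWS SDK and should also skip versions referenced by aliases:

```python
# Sketch: choosing stale Lambda versions to delete so SnapStart stops
# retaining snapshots for them. Deletion itself (via the AWS SDK) and
# alias checks are out of scope here; the input format is an assumption.
def versions_to_delete(version_numbers, keep_latest: int = 3,
                       protected=("$LATEST",)):
    numeric = sorted((v for v in version_numbers if v not in protected),
                     key=int)
    return numeric[:-keep_latest] if keep_latest else numeric

stale = versions_to_delete(["$LATEST", "1", "2", "3", "4", "5"], keep_latest=2)
print(stale)  # everything except the two most recent published versions
```

Running such a rule as a scheduled step in the CI/CD pipeline keeps snapshot charges proportional to versions actually in use.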
SnapStart reduces cold-start latency, but when configured inefficiently, it can increase costs. High-traffic workloads can trigger frequent snapshot restorations, multiplying costs. Slow initialization code inflates the Init phase, which is now billed at the full rate. Suppressed-init conditions, where functions initialize without enhanced resources, can add further inefficiency if memory or timeout settings are misaligned. Together, these factors can cause SnapStart to deliver higher spend without proportional benefit.
When S3 versioning is enabled but no lifecycle rules are defined for non-current objects, outdated versions accumulate indefinitely. These non-current versions are rarely accessed but continue to incur storage charges. Over time, this leads to significant hidden costs, particularly in buckets with frequent object updates or automated data pipelines. Proper lifecycle management is required to limit or expire obsolete versions.
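A minimal lifecycle rule for this case can be expressed in the dict shape that boto3's `put_bucket_lifecycle_configuration` accepts. The bucket name, rule ID, and 30-day window below are illustrative:

```python
# Sketch: an S3 lifecycle configuration that expires non-current object
# versions after a set number of days, in the shape boto3 expects.
# Rule ID and day count are illustrative choices.
def noncurrent_expiry_rule(days: int = 30) -> dict:
    return {
        "Rules": [
            {
                "ID": "expire-noncurrent-versions",
                "Status": "Enabled",
                "Filter": {},  # empty filter applies bucket-wide
                "NoncurrentVersionExpiration": {"NoncurrentDays": days},
            }
        ]
    }

config = noncurrent_expiry_rule(30)
# Applied with boto3 (not executed here):
# s3.put_bucket_lifecycle_configuration(Bucket="my-bucket",
#                                       LifecycleConfiguration=config)
```

A variant of the same rule can also retain only the newest N non-current versions via `NewerNoncurrentVersions`, if a short rollback window is still wanted.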
Many organizations default to storing all EFS data in the Standard class, regardless of how frequently data is accessed. This results in inefficient spend for workloads with significant portions of data that are rarely read. EFS IA and Archive tiers offer lower-cost alternatives for data with low or near-zero access, while Intelligent Tiering can automate placement decisions. Failing to leverage these options wastes storage spend and reduces cost efficiency.
Spot Instances are designed to be short-lived, with frequent interruptions and replacements. When AWS Config continuously records every lifecycle change for these instances, it produces a large number of configuration item records (CIRs). This drives costs significantly higher without delivering meaningful compliance insight, since Spot Instances are typically stateless and non-critical. In environments with heavy Spot usage, Config costs can balloon and exceed the value of tracking these transient resources.
Athena generates a new S3 object for every query result, regardless of whether the output is needed long term. Over time, this leads to uncontrolled growth of the output bucket, especially in environments with repetitive queries such as cost and usage reporting. Many of these files are transient and provide little value once the query result is consumed. Without lifecycle rules, organizations pay for unnecessary storage and create clutter in S3.
By default, AWS Config is enabled in continuous recording mode. While this may be justified for production workloads where detailed auditability is critical, it is rarely necessary in non-production environments. Frequent changes in development or testing environments — such as redeploying Lambda functions, ECS tasks, or EC2 instances — generate large volumes of CIRs. This results in disproportionately high costs with minimal benefit to governance or compliance. Switching non-production environments to daily recording reduces CIR volume significantly while maintaining sufficient visibility for tracking changes.
Many organizations keep Datadog’s default log retention settings without evaluating business requirements. Defaults may extend retention far beyond what is useful for troubleshooting, performance monitoring, or compliance. This leads to unnecessary storage and indexing costs, particularly in non-production environments or for logs with limited value after a short period. By adjusting retention per project, environment, or service, organizations can reduce spend while still meeting compliance and operational needs.
AWS Graviton processors are designed to deliver better price-performance than comparable Intel-based instances, often reducing cost by 20–30% at equivalent workload performance. OpenSearch domains running on older Intel-based families consume more spend without providing additional capability. Since Graviton-powered instance types support the same OpenSearch features and deliver equal or better performance, continuing to run on Intel-based clusters represents unnecessary inefficiency.
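The migration is typically a like-for-like instance-type swap. As a sketch, a small mapping from common Intel-based OpenSearch instance types to their Graviton counterparts (the table below covers only a few illustrative sizes):

```python
# Illustrative Intel -> Graviton mapping for OpenSearch data nodes.
INTEL_TO_GRAVITON = {
    "m5.large.search": "m6g.large.search",
    "r5.xlarge.search": "r6g.xlarge.search",
    "c5.large.search": "c6g.large.search",
}

def graviton_equivalent(instance_type: str) -> str:
    """Return the Graviton equivalent, or the original type if no mapping exists."""
    return INTEL_TO_GRAVITON.get(instance_type, instance_type)

# The swap itself would be applied via update_domain_config (not executed here):
# import boto3
# boto3.client("opensearch").update_domain_config(
#     DomainName="my-domain",  # placeholder
#     ClusterConfig={"InstanceType": graviton_equivalent("m5.large.search")},
# )
```

A blue/green deployment is triggered for the instance-type change, so scheduling it during a low-traffic window is prudent.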
When multiple tasks within a workflow are executed on separate job clusters — despite having similar compute requirements — organizations incur unnecessary overhead. Each cluster must initialize independently, adding latency and cost. This results in inefficient resource usage, especially for workflows that could reuse the same cluster across tasks. Consolidating tasks onto a single job cluster where feasible reduces start-up time and avoids duplicative compute charges.
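In Databricks, this consolidation is done by declaring one entry under `job_clusters` and pointing each task at it via `job_cluster_key`. The sketch below builds a Jobs API payload under assumed values: the job name, notebook paths, Spark version, and node type are all placeholders.

```python
def multi_task_job(shared_key: str = "shared-etl-cluster") -> dict:
    """Jobs API payload where all tasks reuse one shared job cluster."""
    return {
        "name": "nightly-etl",  # placeholder job name
        "job_clusters": [
            {
                "job_cluster_key": shared_key,
                "new_cluster": {
                    "spark_version": "14.3.x-scala2.12",  # placeholder runtime
                    "node_type_id": "m5.xlarge",          # placeholder node type
                    "num_workers": 2,
                },
            }
        ],
        "tasks": [
            {
                "task_key": "extract",
                "job_cluster_key": shared_key,  # reuse, not a new cluster
                "notebook_task": {"notebook_path": "/etl/extract"},
            },
            {
                "task_key": "transform",
                "depends_on": [{"task_key": "extract"}],
                "job_cluster_key": shared_key,  # same cluster, no second spin-up
                "notebook_task": {"notebook_path": "/etl/transform"},
            },
        ],
    }
```

Because both tasks reference the same `job_cluster_key`, the cluster initializes once and stays warm across the dependency chain instead of paying start-up latency and cost per task.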
Changing a Google Cloud billing account can unintentionally break existing Marketplace subscriptions. If entitlements are tied to the original billing account, the subscription may fail or become invalid, prompting teams to make urgent, direct purchases of the same services, often at higher list or on-demand rates. These emergency purchases bypass previously negotiated Marketplace pricing and can result in significantly higher short-term costs. The issue is common during reorganizations, mergers, or changes to billing hierarchy and is often not discovered until after costs have spiked.
When Marketplace contracts or subscriptions expire or change without visibility, Azure may automatically continue billing at higher on-demand or list prices. These lapses often go unnoticed due to lack of proactive tracking, ownership, or renewal alerts, resulting in substantial cost increases. The issue is amplified when contract records are siloed across procurement, finance, and engineering teams, with no centralized mechanism to monitor entitlement status or reconcile expected versus actual billing.
In many organizations, AWS Marketplace purchases are lumped into a single consolidated billing line without visibility into individual vendors. This lack of transparency makes it difficult to identify which Marketplace spend is eligible to count toward the EDP cap. As a result, teams may either overspend on direct AWS services to fulfill their commitment unnecessarily or miss the opportunity to right-size new commitments based on existing Marketplace purchases. In both cases, the absence of vendor-level detail hinders optimization.
Azure Marketplace offers two types of listings: transactable and non-transactable. Only transactable purchases contribute toward a customer’s MACC commitment. However, many teams mistakenly assume that all Marketplace spend counts, leading to missed opportunities to burn down commitments and risking budget inefficiencies. Selecting a non-transactable listing, when a transactable equivalent exists, can result in identical services being acquired at higher effective cost due to lost discounts. This confusion is exacerbated when procurement and engineering teams do not coordinate or consult Microsoft's guidance.
Many organizations mistakenly believe that all AWS Marketplace spend automatically contributes to their EDP commitment. In reality, only certain Marketplace transactions (those involving EDP-eligible vendors and transactable SKUs) count toward the commitment, and often only a portion of the transaction value qualifies. This misunderstanding can lead to double counting: forecasting on the assumption that both native AWS usage and Marketplace purchases will fully draw down the commitment. If those assumptions prove incorrect, the organization risks falling short of its EDP threshold, incurring penalties or losing expected discounts.
Organizations frequently inherit continuous recording by default (e.g., through landing zones) without validating the business need for per-change granularity across all resource types and environments. In change-heavy accounts (ephemeral resources, CI/CD churn, autoscaling), continuous mode drives very high CIR volumes with limited additional operational value. Selecting periodic recording for lower-risk resource types and/or non-production environments can maintain necessary visibility while reducing CIR volume and cost. Recorder settings are account/region scoped, so you can apply continuous in production where required and periodic elsewhere.
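AWS Config expresses this split through `recordingModeOverrides`: a continuous default with daily overrides for specific high-churn resource types. A hedged sketch of the recorder payload follows; the override list is an example, and the set of resource types eligible for overrides should be checked against the Config documentation.

```python
def recorder_with_overrides(name: str = "default") -> dict:
    """ConfigurationRecorder payload: continuous by default, daily for churn-heavy types."""
    return {
        "name": name,
        "recordingGroup": {"allSupported": True},
        "recordingMode": {
            "recordingFrequency": "CONTINUOUS",
            "recordingModeOverrides": [
                {
                    "description": "High-churn networking resources recorded daily",
                    # Example override target; verify eligibility for your types.
                    "resourceTypes": ["AWS::EC2::NetworkInterface"],
                    "recordingFrequency": "DAILY",
                }
            ],
        },
    }

# Applied per account/region with put_configuration_recorder, so production can
# keep pure continuous recording while other accounts use overrides.
```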
AWS Fargate supports both x86 and Graviton2 (ARM64) CPU architectures, but by default, many workloads continue to run on x86. Graviton2 delivers significantly better price-performance, especially for stateless, scale-out container workloads. Teams that fail to configure task definitions with the `ARM64` architecture miss out on meaningful efficiency gains. Because this setting is not enabled automatically and is often overlooked, it results in higher compute costs for functionally equivalent workloads.
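Opting in is a one-line addition to the task definition's `runtimePlatform`. The sketch below builds a `register_task_definition` payload; the family name, image URI, and sizing are placeholders, and the container image must itself be built for ARM64 (e.g. a multi-arch image).

```python
def arm64_task_definition(
    family: str = "web-api",  # placeholder family
    image: str = "123456789012.dkr.ecr.us-east-1.amazonaws.com/web-api:latest",  # placeholder
) -> dict:
    """Fargate task definition payload pinned to the Graviton (ARM64) architecture."""
    return {
        "family": family,
        "requiresCompatibilities": ["FARGATE"],
        "networkMode": "awsvpc",
        "cpu": "512",
        "memory": "1024",
        "runtimePlatform": {
            "cpuArchitecture": "ARM64",          # the non-default, cheaper option
            "operatingSystemFamily": "LINUX",
        },
        "containerDefinitions": [
            {"name": "app", "image": image, "essential": True}
        ],
    }
```

Omitting `runtimePlatform` leaves the task on x86, which is why the savings are so often missed.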
S3 buckets configured with SSE-KMS but without Bucket Keys generate a separate KMS request for each object operation. This behavior results in disproportionately high KMS request costs for data-intensive workloads such as analytics, backups, or frequently accessed objects. Bucket Keys allow S3 to cache KMS data keys at the bucket level, reducing the volume of KMS calls and cutting encryption costs—often with no impact on security or performance.
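Enabling Bucket Keys is a single flag in the bucket's encryption configuration. A minimal sketch of the `put_bucket_encryption` payload, with the KMS key ARN as a placeholder:

```python
def bucket_key_encryption(kms_key_arn: str) -> dict:
    """ServerSideEncryptionConfiguration payload: SSE-KMS with Bucket Keys on."""
    return {
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": kms_key_arn,
                },
                "BucketKeyEnabled": True,  # cache data keys at the bucket level
            }
        ]
    }

# Applied with boto3 (not executed here):
# import boto3
# boto3.client("s3").put_bucket_encryption(
#     Bucket="my-data-bucket",  # placeholder
#     ServerSideEncryptionConfiguration=bucket_key_encryption(
#         "arn:aws:kms:us-east-1:123456789012:key/example"  # placeholder ARN
#     ),
# )
```

Note that the setting applies to newly written objects; existing objects keep per-object KMS calls until rewritten.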
By default, AWS Config can be set to record changes across all supported resource types, including those that change frequently, such as security group rules, IAM role policies, route tables, or network interfaces, which are often ephemeral in containerized or auto-scaling setups. These high-churn resources can generate an outsized number of configuration items and inflate costs, especially in dynamic or large-scale environments.
This inefficiency arises when recording is enabled indiscriminately across all resources without evaluating whether the data is necessary. Without targeted scoping, teams may incur large charges for configuration data that provides minimal value, especially in non-production environments. This can also obscure meaningful compliance signals by introducing noise.
Audit logs are often retained longer than necessary, especially in environments where the logging destination is not carefully selected. Projects that initially route SQL Audit Logs or other high-volume sources to LAW or Azure Storage may forget to revisit their retention strategy. Without policies in place, logs can accumulate unchecked—particularly problematic with SQL logs, which can generate significant volume. Lifecycle Management Policies in Azure Storage are a key tool for addressing this inefficiency but are often overlooked.
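An Azure Storage lifecycle management policy can enforce the retention decision automatically. The sketch below builds the policy document in the management-policy JSON shape; the rule name, container prefix, and 90-day window are assumptions to adapt to your audit-log layout.

```python
def sql_audit_log_policy(days_until_delete: int = 90) -> dict:
    """Azure Storage lifecycle management policy: delete aged audit-log blobs."""
    return {
        "rules": [
            {
                "enabled": True,
                "name": "delete-old-sql-audit-logs",  # placeholder rule name
                "type": "Lifecycle",
                "definition": {
                    "filters": {
                        "blobTypes": ["blockBlob"],
                        # Placeholder prefix; match your audit-log container path.
                        "prefixMatch": ["sqldbauditlogs/"],
                    },
                    "actions": {
                        "baseBlob": {
                            "delete": {
                                "daysAfterModificationGreaterThan": days_until_delete
                            }
                        }
                    },
                },
            }
        ]
    }
```

The policy runs inside the storage service, so retention holds even if the team that set up the logging pipeline moves on.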
However, tier transitions are not always cost-saving. For example, in cases where log data consists of extremely large numbers of very small files (such as AKS audit logs across many pods), the transaction charges incurred when moving objects between storage tiers may exceed the potential savings from reduced storage rates. In these scenarios, it can be more cost-effective to retain logs in Hot tier until deletion, rather than moving them to lower-cost tiers first.
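The break-even test above can be reduced to simple arithmetic: a one-time per-blob transition (write) charge against a per-GiB monthly storage saving over the remaining retention period. The sketch below uses illustrative rates, not current Azure list prices.

```python
def tiering_net_savings(
    num_blobs: int,
    total_gib: float,
    months_retained: int,
    hot_rate: float = 0.018,          # illustrative Hot $/GiB/month
    cool_rate: float = 0.010,         # illustrative Cool $/GiB/month
    cool_write_per_10k: float = 0.10, # illustrative $ per 10k write ops
) -> float:
    """Net USD savings from moving blobs Hot -> Cool; negative means don't tier."""
    one_time_transition_cost = (num_blobs / 10_000) * cool_write_per_10k
    monthly_storage_savings = total_gib * (hot_rate - cool_rate)
    return monthly_storage_savings * months_retained - one_time_transition_cost
```

With these rates, 50 million tiny AKS audit-log blobs totaling only 100 GiB come out strongly negative (the transaction charges dwarf the storage delta), while a few thousand large blobs holding 10 TiB come out clearly positive: blob count, not data volume, drives the transition cost.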
VPC Flow Logs configured with the ALL filter and delivered to CloudWatch Logs often result in unnecessarily high log ingestion volumes — especially in high-traffic environments. This setup is rarely required for day-to-day monitoring or security use cases but is commonly enabled by default or for temporary debugging and then left in place. As a result, teams incur excessive CloudWatch charges without realizing the logging configuration is misaligned with actual needs.
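A common remediation is to recreate the flow log with a narrower filter and a cheaper destination. As a sketch, the `create_flow_logs` arguments below capture only rejected traffic and deliver to S3; the VPC ID and bucket ARN are placeholders.

```python
def reject_only_flow_logs(vpc_id: str, dest_bucket_arn: str) -> dict:
    """create_flow_logs kwargs: REJECT-only traffic, delivered to S3."""
    return {
        "ResourceIds": [vpc_id],
        "ResourceType": "VPC",
        "TrafficType": "REJECT",       # instead of ALL; keeps security signal
        "LogDestinationType": "s3",    # avoids CloudWatch Logs ingestion charges
        "LogDestination": dest_bucket_arn,
    }

# Applied with boto3 (not executed here):
# import boto3
# boto3.client("ec2").create_flow_logs(
#     **reject_only_flow_logs(
#         "vpc-0abc1234",                       # placeholder VPC
#         "arn:aws:s3:::my-flow-logs-bucket",   # placeholder bucket
#     )
# )
```

Existing flow logs cannot have their `TrafficType` changed in place, so the ALL-filter log must be deleted after the replacement is verified.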