Azure App Service Plans define the compute resources allocated to web applications and are billed continuously based on their pricing tier — regardless of whether the hosted apps are actively serving traffic. In non-production environments such as development, testing, or staging, workloads typically follow predictable usage patterns aligned with business hours. When these plans remain provisioned at higher-cost tiers around the clock, organizations pay premium rates for compute capacity that sits idle during evenings, weekends, and holidays.
A common misconception is that stopping the apps within a plan will halt charges. In reality, the App Service Plan itself is the billing container, and charges accrue as long as the plan exists at a dedicated tier — even with all apps stopped or deleted. Simply stopping apps provides no cost relief. Instead, the plan's tier must be actively changed to a lower-cost option during periods of inactivity to realize savings. This temporal tier-switching pattern is distinct from scaling out (adjusting instance count) or right-sizing (choosing a permanently smaller tier), and is particularly effective for non-production workloads where brief interruptions during tier transitions are acceptable.
Because higher tiers such as Premium or Standard carry significantly higher per-hour rates than the Basic tier, leaving these plans unchanged during extended idle periods is a substantial and avoidable expense. Organizations with multiple non-production App Service Plans can accumulate considerable waste if this pattern goes unaddressed.
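A back-of-the-envelope sketch of the savings at stake from tier-switching, assuming a plan at a high tier during business hours and a cheap fallback tier otherwise. The hourly rates below are hypothetical placeholders, not current Azure prices:

```python
# Estimate monthly savings from switching a non-production App Service
# Plan to a cheaper tier outside business hours.
PREMIUM_RATE = 0.40   # $/hour at the high tier (assumed, not a real price)
BASIC_RATE = 0.075    # $/hour at the fallback tier (assumed)

HOURS_PER_MONTH = 730
business_hours = 12 * 22   # ~12 h/day on ~22 weekdays per month

always_premium = PREMIUM_RATE * HOURS_PER_MONTH
tier_switched = (PREMIUM_RATE * business_hours
                 + BASIC_RATE * (HOURS_PER_MONTH - business_hours))
monthly_savings = always_premium - tier_switched
```

Under these assumed rates the plan spends roughly two thirds of the month at the cheap tier, which is where most of the savings come from.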
When organizations purchase AWS Savings Plans during periods of elevated AI inference demand — such as experimentation phases, feature launches, or early adoption surges — the committed hourly spend may significantly exceed what is needed once workloads stabilize. GPU-backed inference clusters running on high-cost instance families can drive substantial compute consumption during these peaks, and if that peak usage is used as the baseline for commitment sizing, the resulting Savings Plan will be oversized relative to steady-state demand. Because Savings Plans are billed as a fixed hourly dollar commitment for the entire term, any unused portion in a given hour is forfeited — it cannot be carried over, recouped, or applied to future hours.
This pattern is especially costly for AI inference workloads because GPU-accelerated instances carry significantly higher hourly rates than general-purpose compute, amplifying the financial impact of each underutilized hour. The problem compounds when inference workloads shift between instance families, regions, or deployment architectures over time — a common occurrence as teams optimize models, adopt newer hardware generations, or consolidate serving infrastructure. EC2 Instance Savings Plans, which are scoped to a specific instance family and region, are particularly vulnerable to these shifts. Critically, Savings Plans cannot be canceled, modified, or resold once purchased; apart from a narrow return window available under limited conditions, the commitment is irrevocable for the full term.
The net result is a sustained gap between committed spend and actual covered usage: each underutilized hour forfeits part of the commitment, and over time the effective discount can fall well below the rate that justified the purchase in the first place.
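The forfeiture mechanic can be made concrete with a small sketch. The commitment amount and hourly covered-usage figures below are illustrative, not real billing data:

```python
# Quantify forfeited spend when hourly covered usage falls below a fixed
# Savings Plan hourly commitment.
COMMITMENT = 50.0  # committed $/hour (assumed)

# Hypothetical discounted usage eligible for coverage in sample hours.
hourly_usage = [48.0, 35.0, 20.0, 50.0, 12.0]

# Any unused portion of an hour's commitment is forfeited; usage above
# the commitment spills to on-demand rates and is billed separately.
forfeited = sum(max(COMMITMENT - u, 0.0) for u in hourly_usage)
utilization = (sum(min(u, COMMITMENT) for u in hourly_usage)
               / (COMMITMENT * len(hourly_usage)))
```

At 66% utilization in this toy window, a third of the committed spend buys nothing, which is exactly the gap the paragraph above describes.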
When external Delta tables are dropped from Databricks Unity Catalog or the legacy Hive metastore, only the table metadata is removed — the underlying data files in cloud object storage (such as S3, ADLS, or GCS) remain untouched and continue to incur per-GB-month storage charges. This behavior is by design: external tables decouple metadata from data lifecycle management, meaning Databricks explicitly does not delete the underlying storage when an external table is dropped. The result is orphaned storage — files that no longer have any catalog reference, are not consumed by any downstream pipeline, and deliver no business value, yet continue to accumulate charges indefinitely.
This pattern is particularly prevalent in environments using medallion architecture (bronze/silver/gold layers), where tables are frequently recreated during pipeline evolution, schema experimentation, or migration between environments. Development and test workloads compound the problem, as teams routinely create and abandon external table references without cleaning up the associated storage. Unlike managed tables in Unity Catalog — which have a retention period with recovery capability before automatic deletion — external tables offer no such safety net. The orphaned storage is structurally invisible to standard cost dashboards because it appears as generic object storage charges, not as Databricks-specific line items. Over time, this silent accumulation can represent a meaningful share of an organization's total storage spend.
Importantly, Databricks VACUUM operations do not address this pattern. VACUUM cleans up old file versions within active Delta tables, but it cannot act on storage paths that have been completely disconnected from catalog metadata through external table drops. The only way to reclaim this storage is to manually identify and delete the orphaned files in cloud storage.
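One way to surface candidates for manual cleanup is to diff catalog-registered external-table locations against the prefixes actually present in object storage. A minimal sketch, using hard-coded stand-ins for what would in practice come from `information_schema.tables` and a cloud storage listing (all paths are invented):

```python
# Find storage prefixes that no catalog entry references.
catalog_locations = {
    "s3://lake/bronze/orders/",
    "s3://lake/silver/orders_clean/",
}
storage_prefixes = {
    "s3://lake/bronze/orders/",
    "s3://lake/silver/orders_clean/",
    "s3://lake/silver/orders_v1/",     # table dropped, files left behind
    "s3://lake/gold/orders_agg_old/",  # abandoned experiment
}
orphaned = sorted(storage_prefixes - catalog_locations)
```

The set difference is the list of paths to review before deletion; any prefix still consumed outside the catalog (for example by an external reader) should be excluded manually.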
This inefficiency occurs when Azure Load Balancers remain provisioned after the backend workloads they supported have been scaled down, stopped, or decommissioned. This is common in non-production environments where virtual machines are shut down outside business hours, but the associated load balancers are left in place. Even when no meaningful traffic is flowing, the load balancer continues to incur base charges, resulting in ongoing cost without delivering value.
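Detection can be as simple as flagging load balancers whose backend pools contain no running instances. A sketch with illustrative data; the field names are assumptions, not Azure SDK output:

```python
# Flag load balancers with no running backend VMs as idle candidates.
load_balancers = [
    {"name": "lb-dev",  "backend_vm_states": ["deallocated", "deallocated"]},
    {"name": "lb-qa",   "backend_vm_states": []},           # empty pool
    {"name": "lb-prod", "backend_vm_states": ["running", "running"]},
]

idle = [lb["name"] for lb in load_balancers
        if not any(state == "running" for state in lb["backend_vm_states"])]
```

A real implementation would also check traffic metrics over a lookback window before deleting, since a briefly stopped backend does not make the load balancer redundant.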
This inefficiency occurs when an RDS database instance is deleted but its manual snapshots or retained backups remain. Unlike automated backups tied to a live instance, these backups persist independently and continue generating storage costs despite no longer supporting any active database. This is distinct from excessive retention on active databases and typically arises from incomplete cleanup during decommissioning.
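A sketch of the decommissioning check: diff each manual snapshot's source database identifier against the set of live instances. Identifiers are invented; real lists would come from the RDS `describe_db_instances` and `describe_db_snapshots` APIs:

```python
# Identify manual snapshots whose source DB instance no longer exists.
active_instances = {"orders-prod", "orders-staging"}
manual_snapshots = [
    {"id": "orders-prod-final",     "source_db": "orders-prod"},
    {"id": "legacy-app-backup",     "source_db": "legacy-app"},  # deleted
    {"id": "reports-pre-migration", "source_db": "reports-db"},  # deleted
]

orphaned = [s["id"] for s in manual_snapshots
            if s["source_db"] not in active_instances]
```

Orphaned snapshots still deserve a retention-policy review before deletion, since a "final snapshot" taken at decommissioning may be kept deliberately for compliance.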
This inefficiency occurs when analysts use SELECT * (reading more columns than needed) and/or rely on LIMIT as a cost-control mechanism. In BigQuery, projecting excess columns increases the amount of data read and can materially raise query cost, particularly on wide tables and frequently-run queries. Separately, applying LIMIT to a query does not inherently reduce bytes processed for non-clustered tables; it mainly caps the result set returned. The “LIMIT saves cost” assumption is only sometimes true on clustered tables, where BigQuery may be able to stop scanning earlier once enough clustered blocks have been read.
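The cost difference between projecting all columns and only the needed ones can be sketched with hypothetical column sizes and an illustrative on-demand rate (neither reflects real table statistics or current BigQuery pricing):

```python
# On-demand BigQuery pricing charges for the full stored bytes of every
# referenced column, so column pruning cuts cost while LIMIT alone does not.
column_bytes = {                 # total stored bytes per column (assumed)
    "user_id":    8 * 10**9,
    "event_json": 500 * 10**9,   # the wide column dominating the table
    "ts":         8 * 10**9,
}
PRICE_PER_TIB = 6.25             # illustrative $/TiB scanned

def scan_cost(columns):
    scanned = sum(column_bytes[c] for c in columns)
    return scanned / 2**40 * PRICE_PER_TIB

select_star = scan_cost(column_bytes)       # SELECT * ... LIMIT 10
projected = scan_cost(["user_id", "ts"])    # SELECT user_id, ts ...
```

In this toy table the wide JSON column makes `SELECT *` roughly 32x more expensive than the projected query, and adding `LIMIT 10` to the former would not change its bytes scanned on a non-clustered table.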
This inefficiency occurs when an App Service Plan is sized larger than required for the applications it hosts. Plans are often provisioned conservatively to handle anticipated peak demand and are not revisited after workloads stabilize. Because pricing is tied to the plan’s SKU rather than real-time usage, oversized plans continue to incur higher costs even when CPU and memory utilization remain consistently low.
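A sketch of a downsize screen over sustained utilization percentiles; the thresholds and metric values are illustrative, and real numbers would come from Azure Monitor:

```python
# Flag plans whose p95 CPU and memory utilization suggest a smaller SKU.
plans = [
    {"name": "asp-api-prod", "cpu_p95": 62.0, "mem_p95": 71.0},
    {"name": "asp-jobs-dev", "cpu_p95": 8.5,  "mem_p95": 22.0},
]
CPU_MAX, MEM_MAX = 40.0, 50.0  # % thresholds for downsize candidates (assumed)

downsize_candidates = [p["name"] for p in plans
                       if p["cpu_p95"] < CPU_MAX and p["mem_p95"] < MEM_MAX]
```

Using a high percentile over a multi-week window, rather than an average, guards against downsizing a plan that is quiet on average but genuinely busy at peak.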
This inefficiency occurs when an Azure Virtual WAN hub is provisioned with more capacity than required to support real network traffic. Because hub costs scale with the number of configured scale units, overprovisioned hubs continue to incur higher charges even when traffic levels remain consistently low. This commonly happens when hubs are sized for peak or anticipated demand that never materializes, or when traffic patterns change over time without corresponding capacity adjustments.
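Hub sizing can be sanity-checked against observed throughput. The per-scale-unit capacity below is an assumption to verify against current Azure documentation, and the traffic figures are illustrative:

```python
import math

# Recommend a scale-unit count from observed peak throughput.
GBPS_PER_SCALE_UNIT = 0.5  # assumed capacity per scale unit; verify in docs

def recommended_scale_units(peak_gbps, headroom=1.2, minimum=2):
    # Keep 20% headroom above the observed peak, never below a floor.
    needed = math.ceil(peak_gbps * headroom / GBPS_PER_SCALE_UNIT)
    return max(needed, minimum)

provisioned = 10           # scale units currently configured (example)
peak_gbps = 1.1            # observed peak traffic (example)
recommendation = recommended_scale_units(peak_gbps)
```

When the recommendation sits far below the provisioned count, as here, the hub is a candidate for scaling down after confirming the peak window covers representative traffic.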
This inefficiency occurs when a function has steady, high-volume traffic (or predictable load) but continues running on default Lambda pricing, where costs scale with execution duration. Lambda Managed Instances runs functions on EC2 capacity operated by the Lambda service and supports multiple concurrent invocations within the same execution environment, which can materially improve utilization for suitable workloads (often IO-heavy services). For these steady-state patterns, shifting from duration-based billing to instance-based billing (and potentially leveraging EC2 pricing options such as Savings Plans or Reserved Instances) can reduce total cost while keeping the Lambda programming model. Savings are workload-dependent and not guaranteed.
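A rough break-even sketch comparing the two billing models for a steady, IO-heavy workload. Every rate and workload figure here is a hypothetical placeholder, not a published AWS price:

```python
import math

# Duration-based billing inputs (all assumed).
GB_SECOND_RATE = 0.0000166667   # $/GB-second
REQUEST_RATE = 0.20 / 1e6       # $/request
mem_gb = 1.0
avg_duration_s = 0.8
req_per_hour = 200_000

duration_cost = req_per_hour * (mem_gb * avg_duration_s * GB_SECOND_RATE
                                + REQUEST_RATE)

# Instance-based billing: IO-heavy functions can pack many concurrent
# invocations onto one execution environment (assumed packing factor).
INSTANCE_RATE = 0.35            # $/hour per instance (assumed)
concurrency_per_instance = 50
avg_concurrency = req_per_hour * avg_duration_s / 3600
instances = math.ceil(avg_concurrency / concurrency_per_instance)
instance_cost = instances * INSTANCE_RATE
```

With these assumptions the workload averages about 44 concurrent invocations, fits on a single instance, and the flat hourly rate undercuts duration billing by a wide margin; a CPU-bound function with low packable concurrency could easily flip the comparison.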
This inefficiency occurs when Azure SQL Managed Instances continue running on legacy General Purpose or Business Critical tiers despite the availability of the next-gen General Purpose tier. The newer tier enables more granular scaling of vCPU, memory, and storage, allowing workloads to better match actual resource needs. In many cases, workloads running on Business Critical—or overprovisioned legacy General Purpose—do not require the premium performance or architecture of those tiers and could achieve equivalent outcomes at lower cost by moving to next-gen General Purpose.
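A simple comparison shows the shape of the savings when a workload that never needed Business Critical moves to a right-sized next-gen General Purpose configuration. The per-vCore rates and vCore counts are hypothetical placeholders, not current Azure prices:

```python
# Compare monthly cost of a legacy Business Critical instance against a
# right-sized next-gen General Purpose configuration.
BC_VCORE_HOUR = 0.55    # Business Critical $/vCore-hour (assumed)
NGP_VCORE_HOUR = 0.20   # next-gen General Purpose $/vCore-hour (assumed)
HOURS_PER_MONTH = 730

current = {"tier": "BusinessCritical", "vcores": 8}
target = {"tier": "NextGenGP", "vcores": 6}   # finer-grained sizing (assumed)

current_cost = current["vcores"] * BC_VCORE_HOUR * HOURS_PER_MONTH
target_cost = target["vcores"] * NGP_VCORE_HOUR * HOURS_PER_MONTH
monthly_savings = current_cost - target_cost
```

The savings come from two directions at once: a lower per-vCore rate and the ability to provision fewer vCores thanks to more granular sizing.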