Stopping an RDS instance pauses compute charges (storage and backup charges continue), but AWS allows an instance to stay stopped for only 7 consecutive days. After that, the instance is started automatically and resumes incurring compute charges, even if the database is still not needed. This creates waste when teams intend to pause an environment but do not manage its lifecycle beyond the 7-day window: without automation or teardown workflows, stopped RDS instances silently become active and billable again. For long-term inactivity, the best practice is to take a final snapshot and delete the instance entirely. If the instance must remain available for fast recovery, automation should be in place to re-stop it after each automatic restart.
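The re-stop automation can be as small as a scheduled job that looks for instances that are meant to stay paused but are running again. The following is a minimal sketch, not a definitive implementation. It assumes instances are marked with a hypothetical `keep-stopped=true` tag (the tag name and function names are invented for this example) and that `rds_client` behaves like a boto3 RDS client exposing `describe_db_instances` and `stop_db_instance`.

```python
from typing import Any, Dict, List

KEEP_STOPPED_TAG = "keep-stopped"  # hypothetical tag key, chosen for this sketch


def should_restop(instance: Dict[str, Any]) -> bool:
    """True if the instance is running again but is tagged to stay stopped."""
    tags = {t["Key"]: t["Value"] for t in instance.get("TagList", [])}
    return (
        instance.get("DBInstanceStatus") == "available"
        and tags.get(KEEP_STOPPED_TAG, "").lower() == "true"
    )


def restop_paused_instances(rds_client: Any) -> List[str]:
    """Stop every running instance that carries the keep-stopped tag.

    rds_client is assumed to expose the boto3 RDS client methods
    describe_db_instances() and stop_db_instance().
    Pagination is omitted for brevity.
    """
    stopped = []
    response = rds_client.describe_db_instances()
    for instance in response.get("DBInstances", []):
        if should_restop(instance):
            identifier = instance["DBInstanceIdentifier"]
            rds_client.stop_db_instance(DBInstanceIdentifier=identifier)
            stopped.append(identifier)
    return stopped
```

Run on a schedule (for example daily, as `restop_paused_instances(boto3.client("rds"))`), this keeps tagged instances from staying up unnoticed. Only instances in the `available` state are considered, so an instance that is still starting is picked up on the next run.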
While many AWS customers have migrated EC2 workloads to Graviton to reduce costs, Lambda functions often remain on the default x86 architecture. AWS Graviton2 (the arm64 architecture in Lambda) offers lower pricing and equal or better performance for most supported runtimes, yet adoption remains uneven due to legacy defaults or lack of awareness. Continuing to run eligible Lambda functions on x86 leads to unnecessary spending. For functions without architecture-specific dependencies, the migration is typically a configuration change; functions that bundle compiled native libraries need those rebuilt for ARM. In either case, the result can be verified through benchmarking and workload testing.
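A first step toward migration is simply listing which functions are still on x86. The sketch below is illustrative only. It assumes each input item has the shape of an entry in the `Functions` list returned by the Lambda ListFunctions API; the function name `functions_still_on_x86` is invented for this example.

```python
from typing import Any, Dict, Iterable, List


def functions_still_on_x86(functions: Iterable[Dict[str, Any]]) -> List[str]:
    """Return the names of functions that are not configured for arm64.

    Each item is assumed to look like an entry of the "Functions" list
    from the Lambda ListFunctions API. A missing "Architectures" key is
    treated as x86_64, which is the default architecture.
    """
    return [
        fn["FunctionName"]
        for fn in functions
        if "arm64" not in fn.get("Architectures", ["x86_64"])
    ]
```

With boto3, the input could be gathered from the pages of `boto3.client("lambda").get_paginator("list_functions")`. The resulting list is only a set of candidates; each function still needs a compatibility check and a benchmark before switching.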
EFS file systems that are no longer used by any running services, such as EC2 instances or Lambda functions, continue to incur storage charges. This often occurs after workloads are decommissioned but the file system is left behind. A quick indicator of this state is an EFS file system with no mount targets configured, since nothing can mount it. Without active usage or connection, these orphaned file systems represent pure cost with no functional value. EFS bills for stored data whether or not any client has the file system mounted, making it easy for unused resources to go unnoticed.
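The mount-target check can be sketched as a simple filter. This is an illustrative example, assuming each input item has the shape of an entry in the `FileSystems` list returned by the EFS DescribeFileSystems API; the function name is invented here.

```python
from typing import Any, Dict, Iterable, List, Tuple


def orphaned_file_systems(
    file_systems: Iterable[Dict[str, Any]],
) -> List[Tuple[str, int]]:
    """Return (file system ID, size in bytes) for systems with no mount targets.

    Each item is assumed to look like an entry of "FileSystems" from the
    EFS DescribeFileSystems API, which reports NumberOfMountTargets and
    SizeInBytes.Value.
    """
    return [
        (fs["FileSystemId"], fs.get("SizeInBytes", {}).get("Value", 0))
        for fs in file_systems
        if fs["NumberOfMountTargets"] == 0
    ]
```

Returning the size alongside the ID makes it easier to rank cleanup candidates by how much storage they are billing for. A file system that does have mount targets can still be idle, so this check finds only the most obvious cases.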
For Premium SSD and Standard SSD disks of 513 GiB or larger, Azure offers an opt-in Performance Plus setting that raises the disk IOPS and throughput (MB/s) limits at no extra cost. Environments that previously paid for additional performance through custom settings may still be paying for that throughput unnecessarily. By not enabling Performance Plus on eligible disks, organizations miss a straightforward opportunity to reduce disk spend while maintaining or improving performance. The feature must be explicitly enabled on each qualifying disk.
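The eligibility rule described above can be expressed as a filter over a disk inventory. This is a rough sketch: the inventory field names (`name`, `sku`, `size_gib`, `performance_plus`) are hypothetical and would need mapping from whatever export or API is actually used, and the SKU set covers only the locally redundant Premium SSD and Standard SSD names as an assumption.

```python
from typing import Any, Dict, Iterable, List

# Assumed SKU names for Premium SSD and Standard SSD (locally redundant).
ELIGIBLE_SKUS = {"Premium_LRS", "StandardSSD_LRS"}
MIN_SIZE_GIB = 513  # size threshold stated in the text above


def performance_plus_candidates(disks: Iterable[Dict[str, Any]]) -> List[str]:
    """Return names of eligible disks that do not have Performance Plus enabled.

    Each disk is a dict with hypothetical keys: name, sku, size_gib and an
    optional boolean performance_plus.
    """
    return [
        d["name"]
        for d in disks
        if d["sku"] in ELIGIBLE_SKUS
        and d["size_gib"] >= MIN_SIZE_GIB
        and not d.get("performance_plus", False)
    ]
```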
Each Azure VM size has a defined limit for total disk IOPS and throughput. When high-performance disks (e.g., Premium SSDs with high IOPS capacity) are attached to low-tier VMs, the disk’s performance capabilities may exceed what the VM can consume. This results in paying for performance that the VM cannot access. For example, attaching a large Premium SSD to a B-series VM will not provide the expected performance because the VM cannot deliver that level of throughput. Without aligning disk selection with VM limits, organizations incur unnecessary storage costs with no corresponding performance benefit.
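The mismatch can be quantified by comparing the combined limits of the attached disks with the VM-level cap. The sketch below shows only the arithmetic. The `Limits` type and function name are invented for this example, and the figures in the usage note are placeholders that should be replaced with the published limits for the actual VM size and disk tier.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Limits:
    iops: int
    mbps: float


def unusable_disk_performance(vm_cap: Limits, disks: List[Limits]) -> Limits:
    """Return the disk IOPS and throughput that exceed the VM's own cap.

    The provisioned limits of all attached disks are summed and compared
    with the VM-level limit; anything above the cap cannot be consumed.
    """
    total_iops = sum(d.iops for d in disks)
    total_mbps = sum(d.mbps for d in disks)
    return Limits(
        iops=max(0, total_iops - vm_cap.iops),
        mbps=max(0.0, total_mbps - vm_cap.mbps),
    )
```

For example, with a placeholder VM cap of `Limits(1280, 22.5)` and one disk provisioned at `Limits(5000, 200)`, the function returns `Limits(iops=3720, mbps=177.5)`: most of the disk's paid-for performance is out of the VM's reach.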
Azure WAF configurations attached to Application Gateways can persist after their backend pool resources have been removed — often during environment reconfiguration or application decommissioning. In these cases, the WAF is no longer serving any functional purpose but continues to incur fixed hourly costs. Because no traffic is routed and no applications are protected, the WAF is effectively inactive. These orphaned WAFs are easy to overlook without regular cleanup processes and can quietly accumulate unnecessary charges over time.
Many EC2 workloads, such as development environments, test jobs, stateless services, and data processing pipelines, can tolerate interruptions and do not require the uninterrupted capacity that On-Demand instances provide. Using On-Demand instances in these scenarios drives up cost without adding value. Spot Instances are priced at a steep discount to On-Demand (AWS cites savings of up to 90%) but can be reclaimed with a two-minute warning, which makes them well-suited to workloads that can handle restarts, retries, or fluctuations in capacity. Without evaluating workload tolerance and adjusting pricing models accordingly, organizations risk consistently overpaying for compute.
Databricks supports AWS Graviton-based instances for most workloads, including Spark jobs, data engineering pipelines, and interactive notebooks. These instances offer significant cost advantages over traditional x86-based instances, with comparable or better performance in many cases. When teams default to legacy instance types, they miss an easy opportunity to reduce compute spend. Unless workloads have known compatibility issues or specialized requirements, Graviton should be the default instance family for Databricks clusters.
In Databricks, on-demand instances provide reliable performance but come at a premium cost. For non-production workloads—such as development, testing, or exploratory analysis—high availability is often unnecessary. Spot instances provide equivalent performance at a lower price, with the tradeoff of occasional interruptions. If teams default to on-demand usage in lower environments, they may be incurring unnecessary compute costs. Using compute policies to limit on-demand usage ensures greater consistency and efficiency across environments.
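A compute policy that steers lower environments toward spot capacity might look like the following. This is a minimal sketch assuming Databricks on AWS and the compute policy definition format (attribute paths mapped to `type`/`value` rules). The function name and the choice to keep one on-demand node are illustrative, not a recommendation from Databricks.

```python
import json
from typing import Any, Dict


def spot_first_policy(on_demand_nodes: int = 1) -> Dict[str, Any]:
    """Build a policy definition that fixes clusters to spot with fallback.

    on_demand_nodes is the number of leading nodes kept on on-demand
    capacity; a value of 1 keeps the driver on on-demand so that a spot
    interruption does not take down the whole cluster.
    """
    return {
        "aws_attributes.availability": {
            "type": "fixed",
            "value": "SPOT_WITH_FALLBACK",
        },
        "aws_attributes.first_on_demand": {
            "type": "fixed",
            "value": on_demand_nodes,
        },
    }


def to_definition_json(policy: Dict[str, Any]) -> str:
    """Serialize the policy to the JSON string form used as a definition."""
    return json.dumps(policy, indent=2)
```

The JSON produced by `to_definition_json(spot_first_policy())` can be used as the definition when creating a policy, which is then assigned to the users or groups working in non-production workspaces.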
Databricks users can select from a wide range of instance types for cluster driver and worker nodes. Without guardrails, teams may choose high-cost configurations (e.g., 16xlarge nodes) that exceed workload requirements. This results in inflated costs with little performance benefit. To reduce this risk, administrators can use compute policies to define acceptable node types and enforce size limits across the workspace.
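Node-type guardrails can be expressed in the same policy definition format. The sketch below is illustrative: the function name is invented, and the instance types in the usage note are arbitrary examples rather than a vetted list. It restricts both worker and driver node types to an allowlist and caps autoscaling.

```python
import json
from typing import Any, Dict, Sequence


def node_type_policy(
    allowed_node_types: Sequence[str], max_workers: int
) -> Dict[str, Any]:
    """Build a policy definition limiting node types and cluster size.

    allowed_node_types applies to both workers and the driver;
    max_workers caps the upper bound of autoscaling.
    """
    allowlist = {"type": "allowlist", "values": list(allowed_node_types)}
    return {
        "node_type_id": dict(allowlist),
        "driver_node_type_id": dict(allowlist),
        "autoscale.max_workers": {"type": "range", "maxValue": max_workers},
    }


def node_policy_json(allowed_node_types: Sequence[str], max_workers: int) -> str:
    """Serialize the node-type policy to a JSON definition string."""
    return json.dumps(node_type_policy(allowed_node_types, max_workers), indent=2)
```

For instance, `node_policy_json(["m6gd.xlarge", "m6gd.2xlarge"], 8)` yields a definition that rules out 16xlarge nodes entirely and limits clusters to eight workers; the specific types and limit should come from observed workload requirements.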