While many AWS customers have migrated EC2 workloads to Graviton to reduce costs, Lambda functions often remain on the default x86 architecture. AWS Graviton2 (ARM) offers lower pricing and equal or better performance for most supported runtimes — yet adoption remains uneven due to legacy defaults or lack of awareness. Continuing to run eligible Lambda functions on x86 leads to unnecessary spending. The migration requires minimal configuration changes and can be verified through benchmarking and workload testing.
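As a minimal sketch, assuming a zip-packaged function whose dependencies contain no x86-only native binaries, the switch can be made with a single boto3 call; the function name and S3 location below are placeholders:

```python
import boto3

lambda_client = boto3.client("lambda")

# Switch an existing zip-based function to Graviton2 (arm64).
# The deployment package must not contain x86-only native binaries.
lambda_client.update_function_code(
    FunctionName="my-function",        # placeholder name
    S3Bucket="my-deployment-bucket",   # placeholder bucket
    S3Key="my-function.zip",           # placeholder key
    Architectures=["arm64"],
    Publish=True,
)

# Verify the change took effect before benchmarking.
config = lambda_client.get_function_configuration(FunctionName="my-function")
print(config["Architectures"])  # expected: ['arm64']
```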
Many EC2 workloads—such as development environments, test jobs, stateless services, and data processing pipelines—can tolerate interruptions and do not require the reliability of On-Demand pricing. Using On-Demand instances in these scenarios drives up cost without adding value. Spot Instances offer significantly lower pricing and are well-suited to workloads that can handle restarts, retries, or fluctuations in capacity. Without evaluating workload tolerance and adjusting pricing models accordingly, organizations risk consistently overpaying for compute.
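A hedged sketch of what that change looks like at launch time with boto3; the AMI ID and instance type are placeholders, and the interruption behavior assumes a fully restartable workload:

```python
import boto3

ec2 = boto3.client("ec2")

# Launch an interruption-tolerant worker as a Spot Instance instead of On-Demand.
response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder AMI
    InstanceType="m5.large",           # placeholder instance type
    MinCount=1,
    MaxCount=1,
    InstanceMarketOptions={
        "MarketType": "spot",
        "SpotOptions": {
            # One-time request; terminate (rather than stop) on interruption.
            "SpotInstanceType": "one-time",
            "InstanceInterruptionBehavior": "terminate",
        },
    },
)
print(response["Instances"][0]["InstanceId"])
```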
Databricks supports AWS Graviton-based instances for most workloads, including Spark jobs, data engineering pipelines, and interactive notebooks. These instances offer significant cost advantages over traditional x86-based VMs, with comparable or better performance in many cases. When teams default to legacy instance types, they miss an easy opportunity to reduce compute spend. Unless workloads have known compatibility issues or specialized requirements, Graviton should be the default instance family used in Databricks clusters.
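As an illustration, the cluster spec below targets a Graviton node type through the Clusters REST API; the workspace URL, token, runtime version, and node type are examples rather than recommendations:

```python
import requests

# Placeholder workspace URL and token.
HOST = "https://<your-workspace>.cloud.databricks.com"
TOKEN = "<personal-access-token>"

# Cluster spec using Graviton-based workers and driver (m6gd family)
# instead of an equivalent x86 node type.
cluster_spec = {
    "cluster_name": "graviton-etl",
    "spark_version": "14.3.x-scala2.12",   # example LTS runtime
    "node_type_id": "m6gd.xlarge",
    "driver_node_type_id": "m6gd.xlarge",
    "num_workers": 4,
    "autotermination_minutes": 30,
}

resp = requests.post(
    f"{HOST}/api/2.0/clusters/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=cluster_spec,
)
resp.raise_for_status()
print(resp.json()["cluster_id"])
```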
In Databricks, on-demand instances provide guaranteed, uninterrupted capacity but come at a premium cost. For non-production workloads—such as development, testing, or exploratory analysis—that guarantee is often unnecessary. Spot instances deliver equivalent performance at a lower price, with the tradeoff of occasional interruptions. When teams default to on-demand usage in lower environments, they incur unnecessary compute costs. Using compute policies to limit on-demand usage enforces greater consistency and efficiency across environments.
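One way to encode that guardrail, sketched against the Cluster Policies REST API with placeholder workspace details, is a policy that fixes worker availability to spot (with fallback) and caps on-demand nodes at the driver:

```python
import json
import requests

HOST = "https://<your-workspace>.cloud.databricks.com"
TOKEN = "<personal-access-token>"

# Policy for non-production environments: force spot capacity for workers,
# allowing at most the driver node to run on-demand.
policy_definition = {
    "aws_attributes.availability": {
        "type": "fixed",
        "value": "SPOT_WITH_FALLBACK",
    },
    "aws_attributes.first_on_demand": {
        "type": "range",
        "maxValue": 1,
        "defaultValue": 1,
    },
}

resp = requests.post(
    f"{HOST}/api/2.0/policies/clusters/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "name": "non-prod-spot-only",
        "definition": json.dumps(policy_definition),
    },
)
resp.raise_for_status()
print(resp.json()["policy_id"])
```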
Databricks users can select from a wide range of instance types for cluster driver and worker nodes. Without guardrails, teams may choose high-cost configurations (e.g., 16xlarge nodes) that exceed workload requirements. This results in inflated costs with little performance benefit. To reduce this risk, administrators can use compute policies to define acceptable node types and enforce size limits across the workspace.
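A possible policy definition for this, using illustrative node types and limits; it can be created with the same Cluster Policies API call sketched above:

```python
# Illustrative cluster policy definition restricting node sizes and worker count.
# Submit it via the /api/2.0/policies/clusters/create call shown earlier.
policy_definition = {
    "node_type_id": {
        "type": "allowlist",
        "values": ["m6gd.xlarge", "m6gd.2xlarge", "r6gd.xlarge"],
        "defaultValue": "m6gd.xlarge",
    },
    "driver_node_type_id": {
        "type": "allowlist",
        "values": ["m6gd.xlarge", "m6gd.2xlarge"],
        "defaultValue": "m6gd.xlarge",
    },
    "num_workers": {
        "type": "range",
        "maxValue": 20,
    },
}
```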
Interactive clusters are often left running between periods of active use. To mitigate idle charges, Databricks provides an “autotermination” setting that shuts down clusters after a period of inactivity. However, if the termination window is set too long, or if policies do not enforce reasonable thresholds, idle clusters can persist for long durations without performing any work—resulting in wasted compute spend. Lowering the termination window reduces exposure to idle time while preserving user flexibility.
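For example, a policy fragment along these lines can cap the idle-termination window and set a shorter default; the specific minute values are illustrative, not prescriptive:

```python
# Illustrative policy fragment enforcing a modest idle-termination window
# on interactive clusters (values are examples, not recommendations).
policy_definition = {
    "autotermination_minutes": {
        "type": "range",
        "minValue": 10,
        "maxValue": 60,
        "defaultValue": 30,
    },
}
```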
Interactive clusters are intended for development and ad-hoc analysis, remaining active until manually terminated. When used to run scheduled jobs or production workflows, they often stay idle between executions—leading to unnecessary infrastructure and DBU costs. Job clusters are designed for ephemeral, single-job execution and automatically terminate upon completion, reducing runtime and isolating workloads. Using interactive clusters for production jobs leads to cost inefficiencies and weaker workload boundaries.
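A sketch of the alternative, assuming the Jobs API 2.1 and placeholder notebook path, schedule, and cluster sizing: the job declares an ephemeral new_cluster rather than reusing an interactive cluster, so compute exists only while the run is in progress.

```python
import requests

HOST = "https://<your-workspace>.cloud.databricks.com"
TOKEN = "<personal-access-token>"

# Define the job on an ephemeral job cluster (new_cluster) instead of
# pointing it at a long-running interactive cluster (existing_cluster_id).
job_spec = {
    "name": "nightly-etl",
    "tasks": [
        {
            "task_key": "etl",
            "notebook_task": {"notebook_path": "/Repos/team/etl/nightly"},
            "new_cluster": {
                "spark_version": "14.3.x-scala2.12",
                "node_type_id": "m6gd.xlarge",
                "num_workers": 4,
            },
        }
    ],
    "schedule": {
        "quartz_cron_expression": "0 0 2 * * ?",
        "timezone_id": "UTC",
    },
}

resp = requests.post(
    f"{HOST}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=job_spec,
)
resp.raise_for_status()
print(resp.json()["job_id"])
```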
Workloads with consistently low CPU and memory usage may no longer serve active traffic or scheduled tasks, yet their resource requests continue to reserve capacity within the cluster. These idle deployments often remain after project migrations, feature deprecations, or experimentation. Removing inactive workloads allows node groups to scale down, reducing infrastructure costs without impacting active services.
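One rough way to surface candidates, assuming metrics-server is installed and using the kubernetes Python client; the namespace and threshold are arbitrary, and a single point-in-time sample is only a starting point to be checked against longer utilization history:

```python
from kubernetes import client, config

# Flag pods with very low instantaneous CPU usage as candidates for review.
# Requires metrics-server; threshold and namespace are illustrative only.
config.load_kube_config()
metrics_api = client.CustomObjectsApi()

pod_metrics = metrics_api.list_namespaced_custom_object(
    group="metrics.k8s.io", version="v1beta1",
    namespace="default", plural="pods",
)

for pod in pod_metrics["items"]:
    cpu_total_nano = 0
    for container in pod["containers"]:
        usage = container["usage"]["cpu"]  # typically nanocores ("250000n") or millicores ("3m")
        if usage.endswith("n"):
            cpu_total_nano += int(usage[:-1])
        elif usage.endswith("m"):
            cpu_total_nano += int(usage[:-1]) * 1_000_000
    if cpu_total_nano < 5_000_000:  # below 5 millicores
        print(f"low usage: {pod['metadata']['name']}")
```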
Clusters that no longer run active workloads but remain provisioned continue incurring hourly control plane costs and may also maintain associated infrastructure like node groups or VPC components. Inactive clusters often persist after environment decommissioning, project shutdowns, or migrations. Decommissioning unused clusters eliminates unnecessary operational costs and simplifies infrastructure management.
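A sketch of the teardown for an EKS cluster that has already been confirmed inactive, using boto3; it assumes managed node groups only (no Fargate profiles) and a placeholder cluster name:

```python
import boto3

eks = boto3.client("eks")

def decommission_cluster(cluster_name: str) -> None:
    """Delete an EKS cluster after removing its managed node groups.

    Assumes the cluster has already been confirmed inactive; Fargate
    profiles and self-managed node groups are not handled here.
    """
    nodegroups = eks.list_nodegroups(clusterName=cluster_name)["nodegroups"]
    for ng in nodegroups:
        eks.delete_nodegroup(clusterName=cluster_name, nodegroupName=ng)
        # Wait for the node group to finish deleting before proceeding.
        eks.get_waiter("nodegroup_deleted").wait(
            clusterName=cluster_name, nodegroupName=ng
        )
    eks.delete_cluster(name=cluster_name)

# Example (placeholder name):
# decommission_cluster("legacy-dev-cluster")
```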
When an EC2 instance is dedicated primarily to internet-facing traffic, regional differences in data-transfer-out (egress) pricing can drive a substantial portion of total costs. Hosting such workloads in a region with higher egress rates leads to elevated expenses without improving performance. Migrating the workload to a lower-cost region can yield significant savings while maintaining equivalent service quality, especially when no strict latency or compliance requirements dictate the original location.
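A back-of-the-envelope comparison makes the decision concrete; the per-GB rates below are placeholders to be replaced with current data-transfer-out pricing for the regions under consideration:

```python
# Rough comparison of monthly internet egress cost between two regions.
# Rates below are illustrative placeholders, not current AWS prices.
monthly_egress_gb = 50_000

rates_per_gb = {
    "current-region":   0.114,  # hypothetical per-GB egress rate
    "candidate-region": 0.090,  # hypothetical per-GB egress rate
}

costs = {region: monthly_egress_gb * rate for region, rate in rates_per_gb.items()}
savings = costs["current-region"] - costs["candidate-region"]

for region, cost in costs.items():
    print(f"{region}: ${cost:,.2f}/month")
print(f"estimated monthly savings: ${savings:,.2f}")
```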