Databricks users can select from a wide range of instance types for cluster driver and worker nodes. Without guardrails, teams may choose high-cost configurations (e.g., 16xlarge nodes) that exceed workload requirements. This results in inflated costs with little performance benefit. To reduce this risk, administrators can use compute policies to define acceptable node types and enforce size limits across the workspace.
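As an illustrative sketch, the snippet below uses the Databricks SDK for Python to create a cluster policy that allowlists approved node types for workers and drivers; the policy name and node type IDs are assumptions, not prescribed values.

```python
import json

from databricks.sdk import WorkspaceClient

# Assumes workspace authentication is already configured (e.g., via
# DATABRICKS_HOST and DATABRICKS_TOKEN environment variables).
w = WorkspaceClient()

# Limit worker and driver nodes to an approved, moderately sized allowlist
# so oversized instance types (e.g., 16xlarge) cannot be selected.
policy_definition = {
    "node_type_id": {
        "type": "allowlist",
        "values": ["i3.xlarge", "i3.2xlarge", "m5.2xlarge"],
    },
    "driver_node_type_id": {
        "type": "allowlist",
        "values": ["i3.xlarge", "i3.2xlarge"],
    },
}

w.cluster_policies.create(
    name="cost-guardrail-node-types",
    definition=json.dumps(policy_definition),
)
```

Clusters created under this policy reject any node type outside the allowlist.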
Interactive clusters are often left running between periods of active use. To mitigate idle charges, Databricks provides an “autotermination” setting that shuts down clusters after a period of inactivity. However, if the termination period is set too high, or if policies do not enforce reasonable thresholds, idle clusters can persist for long durations without performing any work—resulting in wasted compute spend. Lowering the termination window reduces exposure to idle time while preserving user flexibility.
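A policy attribute can enforce the threshold workspace-wide. The sketch below caps idle time at 60 minutes and defaults to 30; these values are assumptions to adjust per team.

```python
import json

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# Require autotermination between 10 and 60 minutes; defaultValue applies
# when users leave the field unset. Thresholds here are illustrative.
policy_definition = {
    "autotermination_minutes": {
        "type": "range",
        "minValue": 10,
        "maxValue": 60,
        "defaultValue": 30,
    }
}

w.cluster_policies.create(
    name="cost-guardrail-autotermination",
    definition=json.dumps(policy_definition),
)
```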
Azure Standard SSD managed disks can often be replaced with Premium SSD v2 disks, which offer higher IOPS, throughput, and durability at competitive or lower pricing. For workloads that require moderate to high performance but are currently constrained by Standard SSD capabilities, migrating to Premium SSD v2 improves both performance and cost efficiency without significant operational overhead.
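A first step is simply inventorying candidates. The sketch below, assuming the azure-mgmt-compute SDK and a placeholder subscription ID, lists Standard SSD managed disks that could be evaluated for migration.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient

# DefaultAzureCredential picks up environment, managed identity, or CLI
# credentials; the subscription ID placeholder is an assumption.
subscription_id = "<subscription-id>"
client = ComputeManagementClient(DefaultAzureCredential(), subscription_id)

for disk in client.disks.list():
    if disk.sku and disk.sku.name == "StandardSSD_LRS":
        print(f"Premium SSD v2 candidate: {disk.name} ({disk.disk_size_gb} GiB)")
```

Note that Premium SSD v2 has placement constraints (it is zonal and cannot serve as an OS disk at the time of writing), so candidates should be validated before migrating.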
Interactive clusters are intended for development and ad-hoc analysis, remaining active until manually terminated. When used to run scheduled jobs or production workflows, they often stay idle between executions—leading to unnecessary infrastructure and DBU costs. Job clusters are designed for ephemeral, single-job execution and automatically terminate upon completion, reducing runtime and isolating workloads. Using interactive clusters for production jobs leads to cost inefficiencies and weaker workload boundaries.
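To illustrate the difference, the sketch below defines a scheduled job on an ephemeral job cluster via the Databricks SDK for Python, rather than pointing the task at an existing interactive cluster; the notebook path, runtime version, and node type are assumptions.

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import compute, jobs

w = WorkspaceClient()

# new_cluster gives the task an ephemeral job cluster that is created for
# the run and terminated when it completes, so nothing idles in between.
job = w.jobs.create(
    name="nightly-etl",
    tasks=[
        jobs.Task(
            task_key="run_etl",
            notebook_task=jobs.NotebookTask(notebook_path="/Jobs/nightly_etl"),
            new_cluster=compute.ClusterSpec(
                spark_version="15.4.x-scala2.12",
                node_type_id="i3.xlarge",
                num_workers=2,
            ),
        )
    ],
)
print(f"Created job {job.job_id}")
```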
Disks attached to VMs that have been stopped for an extended period, particularly when the disks show no read or write activity, often point to abandoned infrastructure or obsolete resources. Retaining these disks without validation leads to unnecessary monthly storage costs. Reviewing and cleaning up inactive disks helps optimize spend and maintain storage hygiene.
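A sketch of such a review, assuming azure-mgmt-compute: disks attached to stopped-deallocated VMs report a "Reserved" disk state, and orphaned disks report "Unattached". Confirming actual read/write inactivity would additionally require Azure Monitor disk metrics, omitted here for brevity.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient

subscription_id = "<subscription-id>"
client = ComputeManagementClient(DefaultAzureCredential(), subscription_id)

for disk in client.disks.list():
    # "Reserved" = attached to a stopped-deallocated VM; "Unattached" = orphaned.
    if disk.disk_state in ("Reserved", "Unattached"):
        print(f"Review for cleanup: {disk.name} ({disk.disk_size_gb} GiB, "
              f"state {disk.disk_state}, created {disk.time_created:%Y-%m-%d})")
```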
Workloads using legacy Premium SSD managed disks may be eligible for migration to Premium SSD v2, which delivers equivalent or improved performance characteristics at a lower cost. Premium SSD v2 decouples disk size from performance metrics like IOPS and throughput, enabling more granular cost optimization. Additionally, Premium SSD disks are often overprovisioned in size—for example, a P40 disk with more IOPS and capacity than the workload requires—resulting in inflated storage costs. Rightsizing includes both transitioning to v2 and resizing to smaller SKUs (e.g., P40 → P20) based on observed utilization. Failure to address either form of overprovisioning leads to persistent waste.
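As a sketch of the v2 side of this rightsizing, the snippet below provisions a Premium SSD v2 disk with capacity, IOPS, and throughput set independently; the resource names, zone, and performance targets are assumptions to be derived from observed utilization.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient

subscription_id = "<subscription-id>"
client = ComputeManagementClient(DefaultAzureCredential(), subscription_id)

# Premium SSD v2 decouples size from performance: 512 GiB here carries
# exactly the IOPS and throughput (MB/s) the workload was observed to need,
# instead of whatever a fixed SKU like P40 would bundle in.
poller = client.disks.begin_create_or_update(
    "my-resource-group",
    "rightsized-data-disk",
    {
        "location": "eastus",
        "zones": ["1"],
        "sku": {"name": "PremiumV2_LRS"},
        "disk_size_gb": 512,
        "disk_iops_read_write": 6000,
        "disk_m_bps_read_write": 300,
        "creation_data": {"create_option": "Empty"},
    },
)
print(poller.result().provisioning_state)
```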
Tables with no read or write activity often represent deprecated applications, obsolete telemetry, or abandoned development artifacts. Retaining inactive tables increases storage costs and operational complexity. Regularly auditing and cleaning up unused tables helps maintain a streamlined, cost-effective storage environment.
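In a Databricks/Spark environment with Delta tables, last-write times are cheap to check; a rough sketch follows. The schema name and 90-day threshold are assumptions, `spark` is the ambient notebook session, and note that DESCRIBE HISTORY reflects writes only; read activity requires audit logs (e.g., Unity Catalog system tables).

```python
from datetime import datetime, timedelta

cutoff = datetime.now() - timedelta(days=90)

# Flag Delta tables in the (assumed) "analytics" schema with no recent writes.
for t in spark.catalog.listTables("analytics"):
    table = f"analytics.{t.name}"
    # DESCRIBE HISTORY returns the newest operations first.
    last_write = spark.sql(f"DESCRIBE HISTORY {table} LIMIT 1").collect()[0]["timestamp"]
    if last_write < cutoff:
        print(f"Stale table candidate: {table} (last write {last_write})")
```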
Files that show no read or write activity over an extended period often indicate redundant or abandoned data. Keeping inactive files in higher-cost storage classes unnecessarily increases monthly spend. Implementing proactive archiving, deletion workflows, and safety features like Blob Soft Delete or Versioning improves cost efficiency while protecting against accidental data loss.
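A sketch combining both halves, assuming the azure-storage-blob SDK, a placeholder account URL and container, and a 180-day inactivity threshold (last-modified time serves as a proxy where last-access tracking is not enabled):

```python
from datetime import datetime, timedelta, timezone

from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient, RetentionPolicy

svc = BlobServiceClient(
    "https://<account>.blob.core.windows.net",
    credential=DefaultAzureCredential(),
)

# Enable soft delete so archival or cleanup mistakes stay recoverable for 14 days.
svc.set_service_properties(
    delete_retention_policy=RetentionPolicy(enabled=True, days=14)
)

# Move blobs untouched for ~6 months to the Archive tier.
cutoff = datetime.now(timezone.utc) - timedelta(days=180)
container = svc.get_container_client("raw-data")
for blob in container.list_blobs():
    if blob.last_modified < cutoff:
        container.get_blob_client(blob.name).set_standard_blob_tier("Archive")
```

At scale, an account-level lifecycle management policy achieves the same archiving declaratively.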
Non-production environments such as development, testing, or staging often do not require the high availability, failover capabilities, and premium storage performance offered by Azure SQL Database's Business Critical tier. Running these workloads on Business Critical unnecessarily inflates costs. Choosing a lower-cost tier like General Purpose typically provides sufficient performance and availability for non-production use cases, significantly reducing ongoing database expenses.
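As a sketch, assuming azure-mgmt-sql and placeholder resource names, a database can be moved to General Purpose by updating its SKU; the 4-vCore Gen5 target is an assumption.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.sql import SqlManagementClient

subscription_id = "<subscription-id>"
client = SqlManagementClient(DefaultAzureCredential(), subscription_id)

# Scale a non-production database from Business Critical down to
# General Purpose (4 vCores, Gen5 hardware).
poller = client.databases.begin_update(
    "my-resource-group",
    "dev-sql-server",
    "dev-database",
    {"sku": {"name": "GP_Gen5_4", "tier": "GeneralPurpose"}},
)
print(poller.result().sku.name)
```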
Buckets without Autoclass enabled can accumulate infrequently accessed data in more expensive storage classes, inflating monthly costs. Enabling Autoclass allows GCS to automatically move objects to lower-cost tiers based on observed access behavior, optimizing storage costs without manual lifecycle policy management. Activating Autoclass reduces operational overhead while maintaining seamless access to objects across storage classes.
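A minimal sketch with the google-cloud-storage client (a reasonably recent version is assumed, and the bucket name is a placeholder):

```python
from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket("my-analytics-bucket")

# Opt the existing bucket into Autoclass; GCS then manages storage-class
# transitions automatically based on each object's access pattern.
bucket.autoclass_enabled = True
bucket.patch()
print(f"Autoclass enabled at {bucket.autoclass_toggle_time}")
```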