Oversized Worker or Driver Nodes in Databricks Clusters
Matt Weingarten
Service Category
Compute
Cloud Provider
Databricks
Service Name
Databricks Clusters
Inefficiency Type
Overprovisioned Resource
Explanation

Databricks users can select from a wide range of instance types for cluster driver and worker nodes. Without guardrails, teams may choose high-cost configurations (e.g., 16xlarge nodes) that exceed workload requirements. This results in inflated costs with little performance benefit. To reduce this risk, administrators can use compute policies to define acceptable node types and enforce size limits across the workspace.
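
As a rough illustration, the sketch below creates such a policy with the databricks-sdk Python package. The policy name and the allowlisted node types are assumptions to adapt to your own standards; the attribute names follow the cluster policy definition format.

```python
# Sketch: create a cluster policy that restricts driver and worker node types.
# Assumes the databricks-sdk package and workspace admin rights; the policy
# name and the allowlists below are illustrative placeholders.
import json

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()  # reads auth from the environment / .databrickscfg

policy_definition = {
    # Restrict worker nodes to a small set of approved types.
    "node_type_id": {
        "type": "allowlist",
        "values": ["i3.xlarge", "i3.2xlarge", "i3.4xlarge"],
        "defaultValue": "i3.xlarge",
    },
    # Restrict driver nodes the same way.
    "driver_node_type_id": {
        "type": "allowlist",
        "values": ["i3.xlarge", "i3.2xlarge"],
        "defaultValue": "i3.xlarge",
    },
}

policy = w.cluster_policies.create(
    name="approved-node-sizes",  # hypothetical policy name
    definition=json.dumps(policy_definition),
)
print(f"Created policy {policy.policy_id}")
```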

Relevant Billing Model

Databricks costs are driven by:

  • Databricks Units (DBUs): Consumed per hour of node uptime, at a rate determined by the node type and the compute type (e.g., All-Purpose vs. Jobs compute)
  • Cloud Infrastructure Charges: Cost of the underlying VMs, billed separately by the cloud provider, typically per second

Larger node types (e.g., high-memory or high-I/O VMs) incur significantly higher charges on both dimensions. Oversizing clusters without justification therefore leads to unnecessary DBU and infrastructure spend.
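
A quick back-of-the-envelope comparison shows how the two charges compound. Every rate below is a hypothetical placeholder, not a published Databricks or cloud price; substitute your own DBU rate, per-node DBU consumption, and VM prices.

```python
# Back-of-the-envelope comparison of an oversized vs. a right-sized worker.
# All figures are hypothetical placeholders for illustration only.

DBU_RATE_USD = 0.15  # $ per DBU (varies by compute type and plan)

def hourly_cost(vm_price_usd: float, dbus_per_hour: float) -> float:
    """Combined infrastructure + DBU cost for one node for one hour."""
    return vm_price_usd + dbus_per_hour * DBU_RATE_USD

# Hypothetical figures: a 16xlarge node vs. a 2xlarge node.
big = hourly_cost(vm_price_usd=4.00, dbus_per_hour=10.0)   # oversized
small = hourly_cost(vm_price_usd=0.50, dbus_per_hour=1.5)  # right-sized

# 8 workers running 10 hours/day, 22 days/month.
node_hours = 8 * 10 * 22
print(f"Oversized:   ${big * node_hours:,.2f}/month")    # ~$9,680
print(f"Right-sized: ${small * node_hours:,.2f}/month")  # ~$1,276
```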

Detection
  • Review all cluster configurations to identify usage of large or high-cost instance types
  • Query system tables for driver and worker node types across clusters (see the query sketch after this list)
  • Check whether compute policies are in place to limit allowable node sizes
  • Engage with workload owners to confirm whether large instances are justified based on workload characteristics
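
For the system-table check, a minimal sketch is shown below. It assumes Unity Catalog system tables are enabled and that it runs in a Databricks notebook where `spark` and `display` are available; the node-size pattern is an illustrative heuristic, not an official rule.

```python
# Sketch: surface clusters whose driver or worker node type looks oversized.
# system.compute.clusters keeps one row per configuration change, so we take
# the latest record per cluster before filtering on node type.
oversized = spark.sql("""
    WITH latest AS (
      SELECT *,
             ROW_NUMBER() OVER (PARTITION BY cluster_id
                                ORDER BY change_time DESC) AS rn
      FROM system.compute.clusters
      WHERE delete_time IS NULL          -- skip deleted clusters
    )
    SELECT cluster_id,
           cluster_name,
           owned_by,
           driver_node_type,
           worker_node_type
    FROM latest
    WHERE rn = 1
      AND (driver_node_type RLIKE '(8|16)xlarge'   -- illustrative size filter
           OR worker_node_type RLIKE '(8|16)xlarge')
""")
display(oversized)
```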
Remediation
  • Define and enforce compute policies that restrict driver and worker node types to appropriate sizes
  • Reconfigure existing clusters using oversized nodes to use smaller, cost-effective alternatives (see the resize sketch after this list)
  • Allow exceptions only for workloads that demonstrably require high-performance nodes
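
For the reconfiguration step, a minimal sketch with the databricks-sdk follows. The cluster ID and target node types are placeholders; `clusters.edit` replaces the full cluster spec, so the sketch copies key fields from the current configuration and changes only the node types (editing a running cluster restarts it).

```python
# Sketch: downsize an existing cluster's node types via the databricks-sdk.
# Cluster ID and target node types below are hypothetical placeholders.
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

cluster_id = "0101-123456-abcdef12"  # hypothetical cluster ID
current = w.clusters.get(cluster_id)

kwargs = dict(
    cluster_id=cluster_id,
    cluster_name=current.cluster_name,
    spark_version=current.spark_version,
    node_type_id="i3.xlarge",         # right-sized worker type (assumption)
    driver_node_type_id="i3.xlarge",  # right-sized driver type (assumption)
    autotermination_minutes=current.autotermination_minutes,
)
if current.autoscale:                 # preserve autoscaling if configured
    kwargs["autoscale"] = current.autoscale
else:
    kwargs["num_workers"] = current.num_workers

w.clusters.edit(**kwargs)
```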