Missing Partition Pruning in Delta Lake Table Queries
Databases
Cloud Provider
AWS
Service Name
Databricks
Inefficiency Type
Inefficient Configuration

When Delta Lake tables are partitioned by specific columns — such as date, region, or tenant identifier — the query engine can use partition pruning to limit data scans to only the relevant subset of files. However, when queries against these partitioned tables omit filter predicates on partition columns, the engine is forced to perform a full table scan across all partitions. This means the cluster reads every data file in the table regardless of how much data the query actually needs, directly inflating both execution time and Databricks Unit (DBU) consumption.

This pattern is especially common in several scenarios: legacy SQL queries written before tables were partitioned, dynamically generated queries from applications or BI tools that do not incorporate partition column awareness, and ad-hoc exploratory queries by analysts unfamiliar with the table's partitioning strategy. On large time-series datasets, the difference can be dramatic — a query that should scan only a few gigabytes of recent data may instead process terabytes across the entire table history. Because Databricks bills DBUs per second, a query that runs significantly longer due to scanning unnecessary data consumes proportionally more DBUs, compounding the waste across both the Databricks platform charges and the underlying cloud infrastructure costs.

This inefficiency is distinct from tables that lack partitioning entirely. Here, the partitioning infrastructure exists and is correctly configured, but queries fail to leverage it — making the investment in partitioning effectively wasted while still incurring full-scan costs.
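The contrast can be sketched with two illustrative queries (table name `events` and partition column `event_date` are hypothetical), plus a crude lint-style check that flags queries whose WHERE clause never touches a partition column:

```python
# Anti-pattern: no predicate on the partition column, so the engine
# must scan files from every partition of the table.
full_scan = """
SELECT user_id, COUNT(*) AS clicks
FROM events
WHERE action = 'click'
GROUP BY user_id
"""

# Fixed: a filter on the partition column lets Delta prune the scan
# down to only the partitions that match.
pruned = """
SELECT user_id, COUNT(*) AS clicks
FROM events
WHERE event_date >= '2024-01-01'
  AND action = 'click'
GROUP BY user_id
"""

PARTITION_COLUMNS = {"event_date"}  # hypothetical partitioning scheme

def references_partition_column(sql: str, partition_cols=PARTITION_COLUMNS) -> bool:
    """Crude heuristic: does the WHERE clause mention any partition column?

    A real check would parse the SQL; string matching is enough to
    illustrate the idea of auditing generated or legacy queries.
    """
    parts = sql.lower().split("where", 1)
    if len(parts) < 2:
        return False
    return any(col.lower() in parts[1] for col in partition_cols)
```

A check like this could run over a query log or a BI tool's generated SQL to surface full-scan candidates before they hit a large table.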

Inefficient Use of Photon Engine in Azure Databricks
Compute
Cloud Provider
Azure
Service Name
Databricks
Inefficiency Type
Suboptimal Configuration

Photon is optimized for SQL workloads, delivering significant speedups through vectorized execution and native C++ performance. However, Photon only accelerates workloads that use compatible operations and data patterns. If a workload includes unsupported functions, unoptimized joins, or falls back to interpreted execution, Photon may be silently bypassed — even on a Photon-enabled cluster. In this case, users are billed at a premium DBU rate while receiving no meaningful speed or efficiency gain. This inefficiency typically arises when teams enable Photon globally without validating workload compatibility or updating their pipelines to follow Photon best practices. The result is higher costs with no corresponding benefit — a classic case of configuration drift outpacing optimization discipline.
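One lightweight way to detect silent fallback is to inspect a query's physical plan: operators executed by Photon carry a `Photon` prefix (e.g. `PhotonProject`), while operators that fell back to the JVM engine keep their standard Spark names. A minimal sketch of that heuristic follows; the plan strings are illustrative, not real `EXPLAIN` output:

```python
import re

def photon_coverage(plan: str) -> float:
    """Rough heuristic: fraction of physical-plan operators whose name
    starts with 'Photon'. Low coverage on a Photon-enabled cluster
    suggests the workload is paying premium DBUs without the speedup."""
    operators = re.findall(r"\b([A-Z][A-Za-z]+)\(", plan)
    if not operators:
        return 0.0
    photon_ops = [op for op in operators if op.startswith("Photon")]
    return len(photon_ops) / len(operators)

# Illustrative plan fragments (hypothetical, simplified):
mostly_photon = "PhotonGroupingAgg(keys=[x]) PhotonProject(x) PhotonScan(delta)"
fallback_heavy = "HashAggregate(keys=[x]) Project(x) PhotonScan(delta)"
```

Running a coverage check like this across representative queries, before enabling Photon fleet-wide, is one way to validate that the premium rate actually buys vectorized execution.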

Lack of Functional Cost Attribution in Databricks Workloads
Other
Cloud Provider
Databricks
Service Name
Databricks
Inefficiency Type
Visibility Gap

Databricks cost optimization begins with visibility. Unlike traditional IaaS services, Databricks operates as an orchestration layer spanning compute, storage, and execution — but its billing data often lacks granularity by workload, job, or team. This creates a visibility gap: costs fluctuate without clear root causes, ownership is unclear, and optimization efforts stall due to lack of actionable insight. When costs are not attributed functionally — for example, to orchestration (query/job DBUs), compute (cloud VMs), storage, or data transfer — it becomes difficult to pinpoint what’s driving spend or where improvements can be made. As a result, inefficiencies persist not due to a single misconfiguration, but because the system lacks the structure to surface them.
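The functional attribution described above can be sketched as a simple roll-up of billing records by workload and cost category. The record shape here is hypothetical; in practice the rows might come from a source such as Databricks system billing tables or a cloud provider's cost export:

```python
from collections import defaultdict

# Hypothetical billing records; real data would carry tags or custom
# attributes identifying the owning workload, job, or team.
records = [
    {"workload": "etl_daily", "category": "orchestration", "cost": 120.0},
    {"workload": "etl_daily", "category": "compute",       "cost": 340.0},
    {"workload": "bi_dash",   "category": "compute",       "cost": 90.0},
    {"workload": "bi_dash",   "category": "storage",       "cost": 15.0},
]

def attribute_costs(records):
    """Roll spend up by (workload, functional category) so each line of
    the bill has an owner and a cost driver."""
    totals = defaultdict(float)
    for r in records:
        totals[(r["workload"], r["category"])] += r["cost"]
    return dict(totals)
```

Even this minimal structure answers the questions the entry raises: which workload drives spend, and whether the driver is orchestration DBUs, compute, storage, or transfer.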
