Excessive Data Scanned Due to Unpartitioned Tables in BigQuery

Service Category

Cloud Provider

GCP

Service Name

Inefficiency Type

Suboptimal Configuration

Explanation

If a table is not partitioned by a relevant column (typically a timestamp), every query scans the full table for the columns it references, even when it filters by date. This leads to:
* High costs per query
* Long execution times
* Inefficient use of resources when querying recent or small subsets of data

This inefficiency is especially common in:
* Event or log data stored in raw, unpartitioned form
* Historical data migrations without schema optimization
* Workloads developed without awareness of BigQuery’s scanning model

Relevant Billing Model

BigQuery charges primarily based on:
* The amount of data scanned per query (on-demand pricing)
* Alternatively, reserved slot capacity (formerly sold as flat-rate, typically used by organizations with high query volumes)

When using on-demand pricing, scanning large unpartitioned tables dramatically increases cost, even for queries targeting small slices of data.
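To make the on-demand arithmetic concrete, here is a rough sketch comparing a full scan with a partition-pruned scan. The rate of 6.25 USD per TiB, the 10 TiB table size, and the even spread of data across daily partitions are all assumptions for illustration. Check current pricing for your region. Free-tier allowances and minimum billing increments are ignored.

```python
TIB = 2 ** 40  # bytes in one tebibyte


def on_demand_cost_usd(bytes_scanned: int, usd_per_tib: float = 6.25) -> float:
    """Estimate on-demand query cost. The default rate is an assumption."""
    return bytes_scanned / TIB * usd_per_tib


def bytes_after_pruning(table_bytes: int, total_days: int, days_queried: int) -> int:
    """Bytes read when only the queried daily partitions are scanned.

    Assumes data is spread evenly across daily partitions.
    """
    if total_days <= 0 or not 0 <= days_queried <= total_days:
        raise ValueError("days_queried must be between 0 and total_days")
    return table_bytes * days_queried // total_days


# Hypothetical 10 TiB event table covering 365 days of data.
table_bytes = 10 * TIB
full_scan = on_demand_cost_usd(table_bytes)
pruned = on_demand_cost_usd(bytes_after_pruning(table_bytes, 365, 7))

print(f"unpartitioned, 7-day filter: ${full_scan:.2f} per query")
print(f"partitioned, 7-day filter:   ${pruned:.2f} per query")
```

Under these assumptions the same seven-day lookup costs about 62.50 USD against the unpartitioned table and a little over 1 USD once partition pruning applies.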

Detection

Review frequently queried tables without partitioning enabled
Identify queries that filter by date but scan full tables
Evaluate cost per query for common lookups on historical tables
Inspect schema definitions for missing partition and clustering configurations
Analyze cost spikes related to ad-hoc or dashboard queries on large datasets
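One possible starting point for these checks is the `INFORMATION_SCHEMA.JOBS_BY_PROJECT` view, which records bytes billed and referenced tables per job. The sketch below only builds the SQL text. The helper name, the `region-us` qualifier, the 30-day window, and the row limit are assumptions to adjust for your project.

```python
import re


def top_scanned_tables_sql(region: str = "region-us", days: int = 30, limit: int = 20) -> str:
    """Build a query ranking tables by bytes billed across recent query jobs.

    Caveat: a job's bytes are attributed to every table it references,
    so tables that appear in joins are over-counted.
    """
    if not re.fullmatch(r"region-[a-z0-9-]+", region):
        raise ValueError("region must look like 'region-us'")
    days, limit = int(days), int(limit)
    return f"""
SELECT
  ref.project_id,
  ref.dataset_id,
  ref.table_id,
  COUNT(*) AS query_count,
  SUM(j.total_bytes_billed) AS total_bytes_billed
FROM `{region}`.INFORMATION_SCHEMA.JOBS_BY_PROJECT AS j,
  UNNEST(j.referenced_tables) AS ref
WHERE j.creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL {days} DAY)
  AND j.job_type = 'QUERY'
  AND j.state = 'DONE'
GROUP BY ref.project_id, ref.dataset_id, ref.table_id
ORDER BY total_bytes_billed DESC
LIMIT {limit}
""".strip()


sql = top_scanned_tables_sql()
print(sql)
```

The tables this surfaces can then be checked for partitioning and clustering, for example through the `is_partitioning_column` and `clustering_ordinal_position` fields of the dataset-level `INFORMATION_SCHEMA.COLUMNS` view.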

Remediation

Enable time-based partitioning on large fact or event tables
Retrofit existing tables with ingestion- or column-based partitioning
Cluster tables by frequently filtered fields (e.g., customer ID) to reduce scan volume
Educate data teams on query best practices and partition-aware schema design
Monitor top-cost queries and prioritize optimization of high-volume datasets
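As an illustration of retrofitting an existing table, the sketch below generates a `CREATE TABLE ... AS SELECT` statement that copies an unpartitioned table into a date-partitioned, clustered one. The table paths and column names (`event_ts`, `customer_id`) are hypothetical, and the partition expression assumes `event_ts` is a TIMESTAMP column. A DATE column would be used directly, without `DATE(...)`.

```python
import re

_TABLE_PATH = re.compile(r"[A-Za-z0-9_-]+(\.[A-Za-z0-9_-]+){1,2}")
_COLUMN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def partitioned_copy_ddl(source: str, target: str, ts_column: str,
                         cluster_columns: tuple = ()) -> str:
    """Build DDL that copies `source` into a partitioned, clustered `target`."""
    for path in (source, target):
        if not _TABLE_PATH.fullmatch(path):
            raise ValueError(f"unexpected table path: {path!r}")
    for col in (ts_column, *cluster_columns):
        if not _COLUMN.fullmatch(col):
            raise ValueError(f"unexpected column name: {col!r}")

    lines = [
        f"CREATE TABLE `{target}`",
        f"PARTITION BY DATE({ts_column})",  # assumes a TIMESTAMP column
    ]
    if cluster_columns:
        lines.append("CLUSTER BY " + ", ".join(cluster_columns))
    # Reject queries on the new table that omit a partition filter.
    lines.append("OPTIONS (require_partition_filter = TRUE)")
    lines.append(f"AS SELECT * FROM `{source}`")
    return "\n".join(lines)


ddl = partitioned_copy_ddl(
    source="my-project.analytics.events",               # hypothetical names
    target="my-project.analytics.events_partitioned",
    ts_column="event_ts",
    cluster_columns=("customer_id",),
)
print(ddl)
```

Note that the copy itself scans the full source table once and leaves the original in place, so downstream queries still need to be pointed at the new table after validation.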

Relevant Documentation

Partitioned Tables in BigQuery
Best Practices for Controlling Costs
