Embedding-based retrieval enables semantic matching even when keywords differ. But many Databricks workloads—catalog lookups, metadata search, deterministic classification, or fixed-rule routing—do not require semantic understanding. When embeddings are used anyway, teams incur DBU cost for embedding generation, additional storage for vector columns or indexes, and more expensive similarity-search compute. This often stems from defaulting to a RAG approach rather than evaluating whether a simpler retrieval mechanism would perform equally well.
Because embedding generation consumes model-inference compute (DBUs), and vector indexing and search add further compute and storage on top, every unnecessary embedding translates directly into elevated DBU usage and storage cost.
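One way to avoid this is to attempt a deterministic lookup first and reserve embeddings as a fallback for genuinely semantic queries. The PySpark sketch below illustrates that routing idea under stated assumptions: it uses the ambient `spark` session available in a Databricks notebook, a hypothetical metadata table `main.ops.table_metadata` with a `table_name` column, and a hypothetical `semantic_search` function standing in for an embedding-based retriever. It is a minimal illustration, not a prescribed implementation.

```python
from pyspark.sql import functions as F


def retrieve(query: str):
    """Try a cheap deterministic match before paying for embeddings."""
    # Substring match against a plain Delta column: no embedding
    # generation, no vector index, just an ordinary filter scan.
    # `main.ops.table_metadata` is a hypothetical example table.
    hits = (
        spark.table("main.ops.table_metadata")
        .filter(F.lower(F.col("table_name")).contains(query.lower()))
        .limit(10)
        .collect()
    )
    if hits:
        return hits
    # Fall back to semantic retrieval only when keyword matching fails.
    # `semantic_search` is a hypothetical embedding-based fallback.
    return semantic_search(query)
```

For workloads such as catalog lookups or fixed-rule routing, the fallback branch may rarely or never fire, which eliminates the embedding and vector-search costs for those queries entirely.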