Embedding-based retrieval enables semantic matching even when keywords differ. But many Databricks workloads—catalog lookups, metadata search, deterministic classification, or fixed-rule routing—do not require semantic understanding. When embeddings are used anyway, teams incur DBU cost for embedding generation, additional storage for vector columns or indexes, and more expensive similarity-search compute. This often stems from defaulting to a RAG approach rather than evaluating whether a simpler retrieval mechanism would perform equally well.
Because embedding generation consumes model-inference compute (DBUs), and vector indexing and search add further compute and storage on top, every unnecessary embedding translates directly into elevated DBU usage and storage cost.
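One way to avoid this is to attempt a deterministic lookup first and reserve embeddings as a fallback for genuinely semantic queries. The PySpark sketch below illustrates that routing idea under stated assumptions: it uses the ambient `spark` session available in a Databricks notebook, a hypothetical metadata table `main.ops.table_metadata` with a `table_name` column, and a hypothetical `semantic_search` function standing in for an embedding-based retriever. It is a minimal illustration, not a prescribed implementation.

```python
from pyspark.sql import functions as F


def retrieve(query: str):
    """Try a cheap deterministic match before paying for embeddings."""
    # Substring match against a plain Delta column: no embedding
    # generation, no vector index, just an ordinary filter scan.
    # `main.ops.table_metadata` is a hypothetical example table.
    hits = (
        spark.table("main.ops.table_metadata")
        .filter(F.lower(F.col("table_name")).contains(query.lower()))
        .limit(10)
        .collect()
    )
    if hits:
        return hits
    # Fall back to semantic retrieval only when keyword matching fails.
    # `semantic_search` is a hypothetical embedding-based fallback.
    return semantic_search(query)
```

For workloads such as catalog lookups or fixed-rule routing, the fallback branch may rarely or never fire, which eliminates the embedding and vector-search costs for those queries entirely.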