Embeddings enable semantic retrieval by capturing the meaning of text, whereas keyword search matches on exact or lexical terms. Many Azure workloads (FAQ search, routing, deterministic classification, structured lookups) achieve the same or better accuracy with simple keyword search or metadata filtering. When embeddings are applied to these straightforward tasks, organizations pay for token-based embedding generation, vector storage, and compute-heavy similarity search without a meaningful quality gain. This inefficiency typically arises when retrieval-augmented generation (RAG) is adopted by default rather than chosen deliberately.
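For many of these workloads, a plain keyword query with a metadata filter is enough. Below is a minimal sketch using the `azure-search-documents` Python SDK; the endpoint, index name, field names, and filter expression are illustrative assumptions, not values from a real index:

```python
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

# Hypothetical service endpoint, index, and key -- substitute your own.
client = SearchClient(
    endpoint="https://<your-service>.search.windows.net",
    index_name="faq-index",
    credential=AzureKeyCredential("<api-key>"),
)

# Lexical (BM25) keyword search plus an OData metadata filter.
# No embedding call, no vector index, no similarity scan.
results = client.search(
    search_text="password reset",
    filter="category eq 'account'",  # assumed filterable field
    query_type="simple",
    top=5,
)

for doc in results:
    print(doc["title"])  # assumed field name
```

If this query returns the right documents for your test set, the embedding pipeline (and its cost) is unnecessary for that workload.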
Embedding models are billed per input token. Vector indexing and search operations in Azure AI Search (or other vector stores) then add storage and query compute costs on top. Using embeddings where they are unnecessary therefore creates avoidable cost at multiple layers.
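A back-of-envelope sketch of the token-billed layer alone makes the point. All prices and volumes below are assumed placeholders for illustration; check current Azure pricing before drawing conclusions:

```python
# Rough estimate of embedding-generation cost that keyword search avoids.
# Every constant here is an ASSUMPTION, not published pricing.

EMBED_PRICE_PER_1K_TOKENS = 0.0001   # assumed $ per 1K input tokens
DOCS = 1_000_000                     # assumed corpus size
AVG_TOKENS_PER_DOC = 500             # assumed average document length
QUERIES_PER_MONTH = 2_000_000        # assumed query volume
AVG_TOKENS_PER_QUERY = 20            # assumed average query length

# One-time cost to embed the corpus (re-incurred on every re-index).
corpus_cost = DOCS * AVG_TOKENS_PER_DOC / 1000 * EMBED_PRICE_PER_1K_TOKENS

# Recurring cost to embed each incoming query before similarity search.
monthly_query_cost = (
    QUERIES_PER_MONTH * AVG_TOKENS_PER_QUERY / 1000 * EMBED_PRICE_PER_1K_TOKENS
)

print(f"Corpus embedding (one-time): ${corpus_cost:,.2f}")
print(f"Query embedding (monthly):  ${monthly_query_cost:,.2f}")
```

And this covers only token billing; vector storage and vector query compute are separate line items that a keyword-only index never incurs.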