Embeddings enable semantic retrieval by capturing the meaning of text, whereas keyword search matches on exact or lexical terms. Many Azure workloads (FAQ search, routing, deterministic classification, structured lookups) achieve the same or better accuracy with simple keyword search or metadata filtering. When embeddings are applied to these straightforward tasks, organizations pay for token-based embedding generation, vector storage, and compute-heavy similarity search without a meaningful quality gain. This inefficiency typically arises when retrieval-augmented generation (RAG) is adopted by default rather than chosen deliberately.
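For many of these workloads, a plain keyword query with a metadata filter is enough. Below is a minimal sketch using the `azure-search-documents` Python SDK; the endpoint, index name, field names, and filter expression are illustrative assumptions, not values from a real index:

```python
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

# Hypothetical service endpoint, index, and key -- substitute your own.
client = SearchClient(
    endpoint="https://<your-service>.search.windows.net",
    index_name="faq-index",
    credential=AzureKeyCredential("<api-key>"),
)

# Lexical (BM25) keyword search plus an OData metadata filter.
# No embedding call, no vector index, no similarity scan.
results = client.search(
    search_text="password reset",
    filter="category eq 'account'",  # assumed filterable field
    query_type="simple",
    top=5,
)

for doc in results:
    print(doc["title"])  # assumed field name
```

If this query returns the right documents for your test set, the embedding pipeline (and its cost) is unnecessary for that workload.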
Embedding models are billed per input token. Vector indexing and search operations in Azure AI Search (or other vector stores) then add storage and query compute costs on top. Using embeddings where they are unnecessary therefore creates avoidable cost at multiple layers.
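A back-of-envelope sketch of the token-billed layer alone makes the point. All prices and volumes below are assumed placeholders for illustration; check current Azure pricing before drawing conclusions:

```python
# Rough estimate of embedding-generation cost that keyword search avoids.
# Every constant here is an ASSUMPTION, not published pricing.

EMBED_PRICE_PER_1K_TOKENS = 0.0001   # assumed $ per 1K input tokens
DOCS = 1_000_000                     # assumed corpus size
AVG_TOKENS_PER_DOC = 500             # assumed average document length
QUERIES_PER_MONTH = 2_000_000        # assumed query volume
AVG_TOKENS_PER_QUERY = 20            # assumed average query length

# One-time cost to embed the corpus (re-incurred on every re-index).
corpus_cost = DOCS * AVG_TOKENS_PER_DOC / 1000 * EMBED_PRICE_PER_1K_TOKENS

# Recurring cost to embed each incoming query before similarity search.
monthly_query_cost = (
    QUERIES_PER_MONTH * AVG_TOKENS_PER_QUERY / 1000 * EMBED_PRICE_PER_1K_TOKENS
)

print(f"Corpus embedding (one-time): ${corpus_cost:,.2f}")
print(f"Query embedding (monthly):  ${monthly_query_cost:,.2f}")
```

And this covers only token billing; vector storage and vector query compute are separate line items that a keyword-only index never incurs.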