Embeddings enable semantic search by converting text into vectors that capture meaning; keyword or metadata search, by contrast, performs exact or simple lexical matches. Many workloads, such as FAQ lookup, helpdesk routing, short product lookups, and rule-based filtering, are served well by lexical matching and do not benefit from semantic search. When embeddings are used anyway, organizations pay for embedding generation, vector storage, and similarity search without any gain in accuracy or relevance. This often happens when teams adopt RAG “by default” for problems that do not require semantic understanding.
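As an illustration, here is a minimal sketch of a lexical FAQ lookup; the data and names (`faq_entries`, `lexical_lookup`) are hypothetical. For short, keyword-like queries, simple token overlap often finds the right entry with no embedding calls at all.

```python
# Hypothetical FAQ data: short keys map to canned answers.
faq_entries = {
    "reset password": "Visit Settings > Security and click 'Reset password'.",
    "cancel subscription": "Go to Billing and choose 'Cancel plan'.",
    "update payment method": "Open Billing > Payment methods to edit your card.",
}

def lexical_lookup(query: str) -> str | None:
    """Return the answer whose key shares the most tokens with the query."""
    query_tokens = set(query.lower().split())
    best_key, best_overlap = None, 0
    for key in faq_entries:
        overlap = len(query_tokens & set(key.split()))
        if overlap > best_overlap:
            best_key, best_overlap = key, overlap
    return faq_entries[best_key] if best_key else None

print(lexical_lookup("how do I reset my password"))
# -> "Visit Settings > Security and click 'Reset password'."
```

For corpora like this, a plain keyword index (or BM25) answers the query at near-zero marginal cost, which is the baseline any embedding pipeline has to beat.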
Embedding requests are billed per input token or per 1,000 tokens, depending on the model provider, and downstream vector database storage and similarity queries add further cost. Using embeddings unnecessarily therefore inflates spend across all three layers at once: inference, storage, and retrieval.
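To see how those layers stack up, here is a rough back-of-the-envelope sketch. Every price constant below is a placeholder assumption, not any real provider's rate; substitute your own pricing.

```python
# Hypothetical placeholder prices -- replace with your provider's actual rates.
EMBED_PRICE_PER_1M_TOKENS = 0.02   # assumed $/1M embedding input tokens
VECTOR_QUERY_PRICE = 0.000005      # assumed $/similarity-search query
STORAGE_PRICE_PER_GB_MONTH = 0.25  # assumed $/GB-month of vector storage

def monthly_embedding_cost(docs: int, tokens_per_doc: int,
                           queries: int, vector_gb: float) -> float:
    """Estimate monthly spend across inference, storage, and retrieval."""
    embed = docs * tokens_per_doc / 1_000_000 * EMBED_PRICE_PER_1M_TOKENS
    search = queries * VECTOR_QUERY_PRICE
    storage = vector_gb * STORAGE_PRICE_PER_GB_MONTH
    return embed + search + storage

# e.g. 100k docs re-embedded monthly, 1M queries, 5 GB of stored vectors:
print(f"${monthly_embedding_cost(100_000, 500, 1_000_000, 5.0):.2f}")  # $7.25
```

Even with modest per-unit prices, the total is a recurring charge on every layer, which a lexical index would avoid entirely for workloads that never needed semantic matching.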