Some workloads — such as text classification, keyword extraction, intent detection, routing, or lightweight summarization — do not require the capabilities of the most advanced model families. When high-cost models are used for these simple tasks, organizations pay premium token rates for work that smaller, lower-cost models could handle just as well. This mismatch typically arises from defaulting to a single model for every task, or from not periodically reviewing model usage patterns across applications.
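One way to avoid defaulting to a single model is an explicit task-to-model mapping, so each request type goes to the cheapest deployment that meets its requirements. A minimal sketch follows; the deployment names and task labels are hypothetical placeholders, not real Azure OpenAI deployments.

```python
# Sketch: route each task type to the cheapest adequate model deployment.
# Deployment names and task labels below are hypothetical placeholders.

TASK_MODEL_MAP = {
    "classification": "small-model-deployment",
    "keyword_extraction": "small-model-deployment",
    "intent_detection": "small-model-deployment",
    "routing": "small-model-deployment",
    "light_summarization": "small-model-deployment",
}

# Reserved for tasks that genuinely need advanced reasoning capability.
DEFAULT_MODEL = "large-model-deployment"

def pick_deployment(task_type: str) -> str:
    """Return the deployment for a task, falling back to the large model."""
    return TASK_MODEL_MAP.get(task_type, DEFAULT_MODEL)
```

Centralizing this mapping also gives a single place to review when usage patterns change, rather than auditing model choices scattered across applications.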
On-demand Azure OpenAI deployments are billed per input and output token. Larger, more capable models (e.g., GPT-4 class) have significantly higher cost per token. Choosing a model that exceeds workload requirements increases spend without improving output quality.
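The per-token billing model makes the cost gap easy to estimate. The sketch below compares monthly spend for a simple classification workload under a large and a small model; the per-1K-token prices and token counts are hypothetical illustrations, not actual Azure OpenAI rates.

```python
# Sketch: estimate monthly spend for a token-billed workload under two models.
# All prices and volumes are HYPOTHETICAL, not real Azure OpenAI rates.

def monthly_cost(requests: int, input_tokens: int, output_tokens: int,
                 input_price_per_1k: float, output_price_per_1k: float) -> float:
    """Total monthly cost from per-request token counts and per-1K-token prices."""
    per_request = (input_tokens * input_price_per_1k
                   + output_tokens * output_price_per_1k) / 1000
    return requests * per_request

# 100k classification requests/month, ~500 input and ~50 output tokens each.
REQUESTS, IN_TOK, OUT_TOK = 100_000, 500, 50

large = monthly_cost(REQUESTS, IN_TOK, OUT_TOK, 0.03, 0.06)      # GPT-4-class (hypothetical price)
small = monthly_cost(REQUESTS, IN_TOK, OUT_TOK, 0.0005, 0.0015)  # small model (hypothetical price)

print(f"large model: ${large:,.2f}/month")
print(f"small model: ${small:,.2f}/month")
```

Under these illustrative numbers the output quality for simple classification is equivalent, but the spend differs by roughly two orders of magnitude, which is why periodic review of model-to-workload fit matters.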