Using High-Cost Models for Low-Complexity Tasks
CER:
Service Category
AI
Cloud Provider
Azure
Service Name
Azure Cognitive Services
Inefficiency Type
Overpowered Model Selection
Explanation

Some workloads, such as text classification, keyword extraction, intent detection, routing, or lightweight summarization, do not require the capabilities of the most advanced model families. When high-cost models are used for these simple tasks, organizations pay elevated token rates for work that more efficient, lower-cost models could handle just as effectively. This mismatch typically arises from defaulting to a single model for every task, or from failing to periodically review model usage patterns across applications.

Relevant Billing Model

On-demand Azure OpenAI deployments are billed per input and output token. Larger, more capable models (e.g., GPT-4 class) have significantly higher cost per token. Choosing a model that exceeds workload requirements increases spend without improving output quality.
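
To make the per-token billing effect concrete, the sketch below compares monthly spend for a high-volume, short-prompt workload under a premium model versus an efficient one. The prices are hypothetical placeholders, not actual Azure OpenAI rates; check the current pricing page for real figures.

```python
# Illustrative token-cost comparison. All prices are hypothetical
# placeholders, NOT actual Azure OpenAI rates.
def monthly_cost(requests, in_tokens, out_tokens, in_price, out_price):
    """USD cost for a month of requests; prices are per 1K tokens."""
    return requests * (in_tokens / 1000 * in_price + out_tokens / 1000 * out_price)

# A classification workload: 1M requests/month, short prompts and outputs.
premium = monthly_cost(1_000_000, 500, 50, in_price=0.03, out_price=0.06)
efficient = monthly_cost(1_000_000, 500, 50, in_price=0.0005, out_price=0.0015)

print(f"premium model:   ${premium:,.0f}")
print(f"efficient model: ${efficient:,.0f}")
print(f"savings:         {1 - efficient / premium:.0%}")
```

Because input and output tokens are priced separately, the ratio between models holds regardless of volume; the output quality for a simple classification task is typically indistinguishable.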

Detection
  • Identify workloads executing simple tasks that do not require advanced reasoning or generative abilities
  • Review token consumption and model selection to determine whether premium models are being used broadly
  • Assess whether output quality or accuracy would remain sufficient with smaller, task-optimized models
  • Evaluate whether development teams rely on a “one model fits all” pattern across multiple applications
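The detection steps above can be sketched as a simple scan over usage records. The record shape and the model/task labels here are hypothetical; in practice they would come from Azure Cost Management exports or application telemetry.

```python
# Sketch: flag deployments where a premium model serves a low-complexity task.
# Model names and task labels are illustrative assumptions.
PREMIUM_MODELS = {"gpt-4", "gpt-4-32k"}
SIMPLE_TASKS = {"classification", "keyword-extraction", "intent-detection", "routing"}

def flag_overpowered(usage_records):
    """Return records that pair a premium model with a simple task type."""
    return [
        r for r in usage_records
        if r["model"] in PREMIUM_MODELS and r["task"] in SIMPLE_TASKS
    ]

records = [
    {"app": "ticket-router", "model": "gpt-4", "task": "intent-detection"},
    {"app": "report-writer", "model": "gpt-4", "task": "long-form-generation"},
    {"app": "tagger", "model": "gpt-35-turbo", "task": "classification"},
]
for r in flag_overpowered(records):
    print(f"{r['app']}: {r['model']} used for {r['task']}")
```

A scan like this only surfaces candidates; the quality-sufficiency check in the last two bullets still requires evaluating the smaller model against the workload's accuracy bar.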
Remediation
  • Match each workload to the smallest Azure OpenAI model family that satisfies accuracy, latency, and quality needs
  • Use task-optimized models (e.g., embeddings, lightweight classification models) instead of general-purpose generative models
  • Establish model selection guidelines to prevent high-cost models from being used as defaults
  • Periodically re-evaluate applications to ensure model choices align with evolving model offerings and workload complexity
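One way to codify the model-selection guideline from the remediation steps is a small lookup that maps each task class to the smallest model family assumed to meet its quality bar, with an efficient default so premium models must be opted into explicitly. The task classes and model names below are illustrative assumptions, not a prescribed mapping.

```python
# Sketch of a codified model-selection guideline. Task classes and model
# names are illustrative; maintain the real mapping alongside benchmark
# results and revisit it as new model offerings appear.
MODEL_GUIDELINE = {
    "embedding": "text-embedding-3-small",
    "classification": "gpt-35-turbo",
    "summarization-light": "gpt-35-turbo",
    "complex-reasoning": "gpt-4",
}
DEFAULT_MODEL = "gpt-35-turbo"  # efficient default; premium is opt-in only

def select_model(task_class: str) -> str:
    """Return the smallest approved model for a task class."""
    return MODEL_GUIDELINE.get(task_class, DEFAULT_MODEL)
```

Centralizing the choice in one function (or one shared config) makes the periodic re-evaluation a single-file change rather than a hunt through every application.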