Suboptimal Azure OpenAI Model Type

Ariel Lichterman

CER:

CER-0254

Service Category

Cloud Provider

Azure

Service Name

Azure Cognitive Services

Inefficiency Type

Outdated Model Selection

Explanation

Azure regularly makes newer OpenAI models available that offer better performance and cost characteristics than older generations. Workloads that remain on outdated model versions may consume more tokens to produce equivalent output, run slower, or miss quality improvements. Because customers pay per token, staying on an older model can lead to unnecessary spending and reduced value. Aligning deployments with the most current, efficient model types helps reduce spend and improve application performance.

Relevant Billing Model

On-demand Azure OpenAI deployments are billed per token, with input (prompt) and output (completion) tokens priced separately. Newer models often offer a lower price per token, higher throughput, and lower latency. Continuing to run older models can increase both the number of tokens consumed and the price paid for each one, degrading cost efficiency.
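To make the per-token billing model concrete, the sketch below estimates monthly spend as input tokens times the input rate plus output tokens times the output rate, then compares an older and a newer model. All prices and token volumes are hypothetical placeholders, not actual Azure OpenAI rates; the assumption that the older model needs more output tokens for equivalent answers is also illustrative. Check the Azure OpenAI pricing page for real figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPrice:
    """Price in USD per 1 million tokens (placeholder values, not real Azure rates)."""
    input_per_million: float
    output_per_million: float


def monthly_cost(price: TokenPrice, input_tokens: int, output_tokens: int) -> float:
    """Per-token billing: input and output tokens are charged at separate rates."""
    return (input_tokens * price.input_per_million
            + output_tokens * price.output_per_million) / 1_000_000


# Hypothetical prices for an older and a newer model generation.
older = TokenPrice(input_per_million=10.0, output_per_million=30.0)
newer = TokenPrice(input_per_million=2.5, output_per_million=10.0)

# Same prompts; assume the older model also emits more output tokens per answer.
old_cost = monthly_cost(older, input_tokens=200_000_000, output_tokens=60_000_000)
new_cost = monthly_cost(newer, input_tokens=200_000_000, output_tokens=50_000_000)
print(f"older: ${old_cost:,.2f}  newer: ${new_cost:,.2f}  savings: ${old_cost - new_cost:,.2f}")
# prints: older: $3,800.00  newer: $1,000.00  savings: $2,800.00
```

Under these assumptions, the savings come from two effects at once: a lower rate per token and fewer output tokens per request.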

Detection

Review Azure OpenAI deployments to identify workloads using older or deprecated model versions
Assess token consumption patterns to determine whether newer models could achieve the same results more efficiently
Evaluate latency or performance issues that may be linked to older model behavior
Check Azure’s model lifecycle and release notes to confirm whether a newer recommended model family exists
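The first detection step can be partly automated. The sketch below assumes you have exported deployments as JSON in roughly the shape returned by `az cognitiveservices account deployment list` (each item with `name` and `properties.model.name` / `properties.model.version`); verify the field layout against your CLI version. The `SUPERSEDED` mapping and the suggested replacement models are hypothetical examples that you would maintain from Azure's model lifecycle and release notes.

```python
import json

# Hypothetical mapping of older model names to a newer family worth evaluating.
# Illustrative only: maintain real entries from Azure's model lifecycle docs.
SUPERSEDED = {
    "gpt-35-turbo": "gpt-4o-mini",
    "gpt-4": "gpt-4o",
}


def find_outdated(deployments_json: str) -> list[tuple[str, str, str]]:
    """Return (deployment, current model, suggested model) for superseded models.

    Assumes each item has "name" and "properties" -> "model" -> "name"/"version".
    """
    findings = []
    for dep in json.loads(deployments_json):
        model = dep.get("properties", {}).get("model", {})
        name = model.get("name", "")
        if name in SUPERSEDED:
            current = f"{name} ({model.get('version', '?')})"
            findings.append((dep.get("name", "?"), current, SUPERSEDED[name]))
    return findings


# Hypothetical inventory standing in for real CLI output.
sample = json.dumps([
    {"name": "chat-prod", "properties": {"model": {"name": "gpt-35-turbo", "version": "0613"}}},
    {"name": "summarizer", "properties": {"model": {"name": "gpt-4o", "version": "2024-08-06"}}},
])
for deployment, current, suggested in find_outdated(sample):
    print(f"{deployment}: {current} -> evaluate {suggested}")
# prints: chat-prod: gpt-35-turbo (0613) -> evaluate gpt-4o-mini
```

A flag here is a prompt to evaluate, not a verdict: token consumption and output quality on the candidate model still need to be compared before migrating.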

Remediation

Migrate workloads to the latest suitable Azure OpenAI model that provides improved efficiency and performance
Establish a periodic review process to ensure deployed models are aligned with current Azure model offerings
Incorporate model lifecycle awareness into architecture standards so workloads are upgraded as new versions become available
Validate compatibility and output quality after migration to ensure a smooth transition to newer models
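A periodic review can be as simple as checking each deployment's model version against known retirement dates and flagging anything retiring within a planning window. In the sketch below, the `RETIREMENTS` table, the 90-day window, and the inventory are hypothetical placeholders, not real Azure retirement dates; populate the table from Azure's official model retirement documentation.

```python
from datetime import date, timedelta

# Placeholder retirement dates keyed by (model name, version). These are NOT real
# Azure dates; populate this table from the official model retirement documentation.
RETIREMENTS = {
    ("gpt-35-turbo", "0613"): date(2025, 3, 1),
    ("gpt-4o", "2024-08-06"): date(2025, 12, 1),
}


def due_for_review(deployments, today, window_days=90):
    """Return deployments whose model version retires within window_days (or already has)."""
    cutoff = today + timedelta(days=window_days)
    flagged = []
    for name, model, version in deployments:
        retires = RETIREMENTS.get((model, version))
        # Unknown versions are skipped here; a stricter review could flag them too.
        if retires is not None and retires <= cutoff:
            flagged.append((name, model, version, retires))
    return flagged


# Hypothetical inventory of (deployment name, model name, model version).
inventory = [
    ("chat-prod", "gpt-35-turbo", "0613"),
    ("summarizer", "gpt-4o", "2024-08-06"),
]
for name, model, version, retires in due_for_review(inventory, today=date(2025, 1, 1)):
    print(f"{name}: {model} {version} retires {retires.isoformat()}; plan migration")
# prints: chat-prod: gpt-35-turbo 0613 retires 2025-03-01; plan migration
```

Running a check like this on a schedule turns lifecycle awareness into a recurring task rather than a reaction to retirement notices.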

Relevant Documentation

https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models
