Azure regularly releases newer OpenAI models with better performance and cost characteristics than older generations. Workloads that remain on outdated model versions may consume more tokens to produce equivalent output, run slower, and miss quality improvements. Because customers pay per token, staying on an older model can mean unnecessary spending for reduced value. Aligning deployments with the most current, efficient models reduces spend and improves application performance.
On-demand Azure OpenAI deployments are billed per input and output token. Newer models often offer lower cost per processed token, higher throughput, and reduced latency. Continuing to run older models can increase token usage and degrade cost efficiency.
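Because input and output tokens are billed separately, the per-request cost difference between model generations can be estimated directly from token counts and per-token prices. A minimal sketch, using hypothetical per-1,000-token prices (actual rates vary by model and region; consult the current Azure OpenAI pricing page):

```python
# Estimate the cost of one request from token counts and per-1K-token prices.
# Prices below are illustrative placeholders, NOT actual Azure rates.

def request_cost(input_tokens: int, output_tokens: int,
                 price_in_per_1k: float, price_out_per_1k: float) -> float:
    """Input and output tokens are metered separately, billed per 1,000 tokens."""
    return (input_tokens / 1000) * price_in_per_1k \
         + (output_tokens / 1000) * price_out_per_1k

# Hypothetical comparison: same workload on an older vs. a newer, cheaper model.
older = request_cost(2000, 500, price_in_per_1k=0.03, price_out_per_1k=0.06)
newer = request_cost(2000, 500, price_in_per_1k=0.01, price_out_per_1k=0.03)

print(f"older: ${older:.4f}  newer: ${newer:.4f}  savings: {1 - newer / older:.0%}")
# → older: $0.0900  newer: $0.0350  savings: 61%
```

Multiplying such per-request differences by daily request volume gives a rough estimate of the savings available from migrating a deployment, before accounting for any additional token efficiency of the newer model.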