Fixed Instance Count on Virtual Machine Scale Set Without Autoscaling
Aaran Bhambra
CER
CER-0314

Service Category
Compute
Cloud Provider
Azure
Service Name
Azure Virtual Machine Scale Sets
Inefficiency Type
Inefficient Configuration
Explanation

Azure Virtual Machine Scale Sets can operate in two modes: manual scaling with a fixed instance count, or autoscaling with dynamic instance counts that respond to demand. When a scale set is configured with manual scaling, it maintains the same number of VM instances at all times — regardless of whether those instances are actively processing workload. Every provisioned instance continues to incur per-second compute charges, meaning the organization pays for full capacity even during off-peak hours, weekends, or seasonal lulls when only a fraction of that capacity is needed.

This pattern is especially wasteful for workloads with variable demand — web applications with daily traffic cycles, batch processing jobs that run at specific intervals, or services with clear seasonal peaks. If a scale set is sized for peak demand but runs at that capacity around the clock, the gap between provisioned resources and actual utilization translates directly into unnecessary spend. Microsoft explicitly identifies autoscaling as a mechanism to reduce scale set costs by running only the number of instances required to meet current demand.

There are legitimate reasons to maintain fixed capacity — stateful applications that cannot tolerate dynamic instance changes, workloads with licensing constraints tied to specific instance counts, or scenarios where consistent performance without scale-up latency is critical. However, many scale sets running at fixed capacity do so simply because autoscaling was never configured, not because it was deliberately excluded. Identifying and addressing these cases represents a significant cost optimization opportunity.

Relevant Billing Model

Azure Virtual Machine Scale Sets carry no incremental service charge — billing is based entirely on the underlying compute, storage, and networking resources consumed by the deployed VM instances:

  • Each VM instance in the scale set is billed per-second while running, with charges determined by the VM size (CPU, memory, and disk configuration)
  • Billing continues for all provisioned instances regardless of whether they are actively processing workload or sitting idle
  • When VMs are stopped and deallocated, compute charges cease, but attached storage resources such as managed disks continue to incur charges

With a fixed instance count, total compute cost equals the number of instances multiplied by the per-instance rate multiplied by hours running — a constant figure that does not flex with actual demand. Autoscaling reduces this by scaling in during low-demand periods, so organizations pay only for the capacity they need at any given time.
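The formula above can be made concrete with a small worked example. All figures here are illustrative assumptions, not actual Azure rates: a scale set sized for a peak of 10 instances, compared against an autoscaled configuration that drops to 3 instances outside an assumed 8-hour daily peak window.

```python
# Illustrative cost comparison: fixed instance count vs. autoscaling.
# The hourly rate and demand pattern are hypothetical, not Azure prices.
HOURLY_RATE = 0.20       # assumed cost per VM instance per hour (USD)
PEAK_INSTANCES = 10      # capacity sized for peak demand
OFF_PEAK_INSTANCES = 3   # assumed baseline needed off-peak
HOURS_PER_MONTH = 720    # 30 days x 24 hours

# Fixed instance count: full capacity around the clock.
fixed_cost = PEAK_INSTANCES * HOURLY_RATE * HOURS_PER_MONTH

# Autoscaled: 8 peak hours and 16 off-peak hours per day, 30 days.
peak_hours = 8 * 30
off_peak_hours = 16 * 30
autoscaled_cost = (PEAK_INSTANCES * peak_hours
                   + OFF_PEAK_INSTANCES * off_peak_hours) * HOURLY_RATE

print(f"Fixed:      ${fixed_cost:,.2f}/month")      # $1,440.00/month
print(f"Autoscaled: ${autoscaled_cost:,.2f}/month") # $768.00/month
print(f"Savings:    {1 - autoscaled_cost / fixed_cost:.0%}")  # 47%
```

Even with a modest off-peak floor, the gap between peak-sized fixed capacity and demand-shaped capacity compounds every hour the workload runs below peak.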

Detection
  • Identify Virtual Machine Scale Sets configured with manual scaling mode (fixed instance count) rather than autoscale policies
  • Review compute utilization across all instances in each scale set over a representative period to determine whether capacity consistently exceeds actual demand
  • Assess whether scale sets exhibit variable demand patterns — such as daily, weekly, or seasonal traffic cycles — that would benefit from dynamic scaling
  • Evaluate whether the workload running on the scale set is stateless and suitable for horizontal autoscaling, or stateful with constraints that require fixed capacity
  • Confirm whether scale sets with autoscale rules configured have minimum and maximum instance counts set to the same value, which effectively disables autoscaling
  • Examine historical utilization trends to identify scale sets where average resource consumption remains consistently low relative to provisioned capacity
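The first and fourth checks above can be sketched as a simple filter. This is a minimal illustration over a hypothetical record structure; in practice the same fields would be populated from `az vmss list` and `az monitor autoscale list`, or the equivalent Azure SDK calls.

```python
# Flag scale sets that are effectively fixed-capacity: either no
# autoscale setting exists, or its min and max counts are equal
# (which disables autoscaling in practice).
# The record structure below is hypothetical, for illustration only.
scale_sets = [
    {"name": "web-vmss",   "autoscale": None},                   # manual scaling
    {"name": "api-vmss",   "autoscale": {"min": 4, "max": 4}},   # min == max
    {"name": "batch-vmss", "autoscale": {"min": 2, "max": 10}},  # true autoscaling
]

def is_fixed_capacity(scale_set):
    rule = scale_set["autoscale"]
    return rule is None or rule["min"] == rule["max"]

flagged = [ss["name"] for ss in scale_sets if is_fixed_capacity(ss)]
print(flagged)  # ['web-vmss', 'api-vmss']
```

Flagged scale sets would then move to the utilization and statefulness checks above before any remediation.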
Remediation
  • Enable autoscaling on scale sets running stateless or horizontally scalable workloads, defining appropriate minimum and maximum instance count boundaries based on observed demand patterns
  • Configure metric-based autoscale rules that scale out during periods of high demand and scale in when utilization drops, ensuring the scale set adjusts capacity dynamically
  • For workloads with predictable usage cycles (such as business hours versus nights and weekends), implement schedule-based autoscaling to proactively adjust instance counts ahead of known demand changes
  • Set appropriate cooldown periods between scaling actions to prevent rapid, unnecessary scale-in and scale-out oscillations that could affect application stability
  • For stateful applications that cannot support dynamic scaling, evaluate whether session state can be externalized to enable autoscaling, or consider alternative cost optimization strategies such as reserved instances for the baseline capacity
  • Establish periodic reviews of scale set configurations and utilization to ensure autoscale policies remain aligned with evolving workload patterns
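The metric-rule and cooldown guidance above can be illustrated with a small decision function. This is a sketch of the logic an autoscale engine applies, not Azure SDK code; the thresholds, bounds, and cooldown period are all assumptions chosen for the example.

```python
from datetime import datetime, timedelta

# Sketch of a metric-based scaling decision with a cooldown, mirroring
# what configured autoscale rules would do. All values are illustrative.
MIN_INSTANCES, MAX_INSTANCES = 2, 10
SCALE_OUT_CPU, SCALE_IN_CPU = 70.0, 30.0   # avg CPU % thresholds
COOLDOWN = timedelta(minutes=5)

def decide(current_count, avg_cpu, now, last_action_at):
    """Return the new instance count, respecting bounds and cooldown."""
    if last_action_at is not None and now - last_action_at < COOLDOWN:
        return current_count  # still cooling down; avoid oscillation
    if avg_cpu > SCALE_OUT_CPU:
        return min(current_count + 1, MAX_INSTANCES)
    if avg_cpu < SCALE_IN_CPU:
        return max(current_count - 1, MIN_INSTANCES)
    return current_count

now = datetime(2024, 1, 1, 12, 0)
print(decide(4, 85.0, now, None))                        # scales out to 5
print(decide(4, 20.0, now, now - timedelta(minutes=2)))  # cooldown holds at 4
print(decide(2, 10.0, now, None))                        # floor holds at 2
```

The cooldown check runs before any metric comparison, which is what prevents the rapid scale-in/scale-out oscillation the remediation list warns about.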