Fixed Instance Count on Virtual Machine Scale Set Without Autoscaling
Aaran Bhambra
CER
CER-0314

Service Category
Compute
Cloud Provider
Azure
Service Name
Azure Virtual Machine Scale Sets
Inefficiency Type
Inefficient Configuration
Explanation

Azure Virtual Machine Scale Sets can operate in two modes: manual scaling with a fixed instance count, or autoscaling with dynamic instance counts that respond to demand. When a scale set is configured with manual scaling, it maintains the same number of VM instances at all times — regardless of whether those instances are actively processing workload. Every provisioned instance continues to incur per-second compute charges, meaning the organization pays for full capacity even during off-peak hours, weekends, or seasonal lulls when only a fraction of that capacity is needed.

This pattern is especially wasteful for workloads with variable demand — web applications with daily traffic cycles, batch processing jobs that run at specific intervals, or services with clear seasonal peaks. If a scale set is sized for peak demand but runs at that capacity around the clock, the gap between provisioned resources and actual utilization translates directly into unnecessary spend. Microsoft explicitly identifies autoscaling as a mechanism to reduce scale set costs by running only the number of instances required to meet current demand.

There are legitimate reasons to maintain fixed capacity — stateful applications that cannot tolerate dynamic instance changes, workloads with licensing constraints tied to specific instance counts, or scenarios where consistent performance without scale-up latency is critical. However, many scale sets running at fixed capacity do so simply because autoscaling was never configured, not because it was deliberately excluded. Identifying and addressing these cases represents a significant cost optimization opportunity.

Relevant Billing Model

Azure Virtual Machine Scale Sets carry no incremental service charge — billing is based entirely on the underlying compute, storage, and networking resources consumed by the deployed VM instances:

  • Each VM instance in the scale set is billed per-second while running, with charges determined by the VM size (CPU, memory, and disk configuration)
  • Billing continues for all provisioned instances regardless of whether they are actively processing workload or sitting idle
  • When VMs are stopped and deallocated, compute charges cease, but attached storage resources such as managed disks continue to incur charges

With a fixed instance count, total compute cost equals the number of instances multiplied by the per-instance rate multiplied by hours running — a constant figure that does not flex with actual demand. Autoscaling reduces this by scaling in during low-demand periods, so organizations pay only for the capacity they need at any given time.
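The formula above can be made concrete with a small worked example. All figures here are illustrative assumptions, not actual Azure rates: a scale set sized for a peak of 10 instances, compared against an autoscaled configuration that drops to 3 instances outside an assumed 8-hour daily peak window.

```python
# Illustrative cost comparison: fixed instance count vs. autoscaling.
# The hourly rate and demand pattern are hypothetical, not Azure prices.
HOURLY_RATE = 0.20       # assumed cost per VM instance per hour (USD)
PEAK_INSTANCES = 10      # capacity sized for peak demand
OFF_PEAK_INSTANCES = 3   # assumed baseline needed off-peak
HOURS_PER_MONTH = 720    # 30 days x 24 hours

# Fixed instance count: full capacity around the clock.
fixed_cost = PEAK_INSTANCES * HOURLY_RATE * HOURS_PER_MONTH

# Autoscaled: 8 peak hours and 16 off-peak hours per day, 30 days.
peak_hours = 8 * 30
off_peak_hours = 16 * 30
autoscaled_cost = (PEAK_INSTANCES * peak_hours
                   + OFF_PEAK_INSTANCES * off_peak_hours) * HOURLY_RATE

print(f"Fixed:      ${fixed_cost:,.2f}/month")      # $1,440.00/month
print(f"Autoscaled: ${autoscaled_cost:,.2f}/month") # $768.00/month
print(f"Savings:    {1 - autoscaled_cost / fixed_cost:.0%}")  # 47%
```

Even with a modest off-peak floor, the gap between peak-sized fixed capacity and demand-shaped capacity compounds every hour the workload runs below peak.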

Detection
  • Identify Virtual Machine Scale Sets configured with manual scaling mode (fixed instance count) rather than autoscale policies
  • Review compute utilization across all instances in each scale set over a representative period to determine whether capacity consistently exceeds actual demand
  • Assess whether scale sets exhibit variable demand patterns — such as daily, weekly, or seasonal traffic cycles — that would benefit from dynamic scaling
  • Evaluate whether the workload running on the scale set is stateless and suitable for horizontal autoscaling, or stateful with constraints that require fixed capacity
  • Confirm whether scale sets with autoscale rules configured have minimum and maximum instance counts set to the same value, which effectively disables autoscaling
  • Examine historical utilization trends to identify scale sets where average resource consumption remains consistently low relative to provisioned capacity
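The first and fourth checks above can be sketched as a simple filter. This is a minimal illustration over a hypothetical record structure; in practice the same fields would be populated from `az vmss list` and `az monitor autoscale list`, or the equivalent Azure SDK calls.

```python
# Flag scale sets that are effectively fixed-capacity: either no
# autoscale setting exists, or its min and max counts are equal
# (which disables autoscaling in practice).
# The record structure below is hypothetical, for illustration only.
scale_sets = [
    {"name": "web-vmss",   "autoscale": None},                   # manual scaling
    {"name": "api-vmss",   "autoscale": {"min": 4, "max": 4}},   # min == max
    {"name": "batch-vmss", "autoscale": {"min": 2, "max": 10}},  # true autoscaling
]

def is_fixed_capacity(scale_set):
    rule = scale_set["autoscale"]
    return rule is None or rule["min"] == rule["max"]

flagged = [ss["name"] for ss in scale_sets if is_fixed_capacity(ss)]
print(flagged)  # ['web-vmss', 'api-vmss']
```

Flagged scale sets would then move to the utilization and statefulness checks above before any remediation.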
Remediation
  • Enable autoscaling on scale sets running stateless or horizontally scalable workloads, defining appropriate minimum and maximum instance count boundaries based on observed demand patterns
  • Configure metric-based autoscale rules that scale out during periods of high demand and scale in when utilization drops, ensuring the scale set adjusts capacity dynamically
  • For workloads with predictable usage cycles (such as business hours versus nights and weekends), implement schedule-based autoscaling to proactively adjust instance counts ahead of known demand changes
  • Set appropriate cooldown periods between scaling actions to prevent rapid, unnecessary scale-in and scale-out oscillations that could affect application stability
  • For stateful applications that cannot support dynamic scaling, evaluate whether session state can be externalized to enable autoscaling, or consider alternative cost optimization strategies such as reserved instances for the baseline capacity
  • Establish periodic reviews of scale set configurations and utilization to ensure autoscale policies remain aligned with evolving workload patterns
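The metric-rule and cooldown guidance above can be illustrated with a small decision function. This is a sketch of the logic an autoscale engine applies, not Azure SDK code; the thresholds, bounds, and cooldown period are all assumptions chosen for the example.

```python
from datetime import datetime, timedelta

# Sketch of a metric-based scaling decision with a cooldown, mirroring
# what configured autoscale rules would do. All values are illustrative.
MIN_INSTANCES, MAX_INSTANCES = 2, 10
SCALE_OUT_CPU, SCALE_IN_CPU = 70.0, 30.0   # avg CPU % thresholds
COOLDOWN = timedelta(minutes=5)

def decide(current_count, avg_cpu, now, last_action_at):
    """Return the new instance count, respecting bounds and cooldown."""
    if last_action_at is not None and now - last_action_at < COOLDOWN:
        return current_count  # still cooling down; avoid oscillation
    if avg_cpu > SCALE_OUT_CPU:
        return min(current_count + 1, MAX_INSTANCES)
    if avg_cpu < SCALE_IN_CPU:
        return max(current_count - 1, MIN_INSTANCES)
    return current_count

now = datetime(2024, 1, 1, 12, 0)
print(decide(4, 85.0, now, None))                        # scales out to 5
print(decide(4, 20.0, now, now - timedelta(minutes=2)))  # cooldown holds at 4
print(decide(2, 10.0, now, None))                        # floor holds at 2
```

The cooldown check runs before any metric comparison, which is what prevents the rapid scale-in/scale-out oscillation the remediation list warns about.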