When fleet auto scaling policies maintain more active instances than current usage requires, particularly during off-peak hours, organizations incur unnecessary compute costs. Fleets often remain oversized because of conservative default configurations or a lack of schedule-based scaling. Tuning the scaling policies to reflect actual usage patterns keeps streaming infrastructure aligned with demand.
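As a rough illustration of schedule-based sizing, the sketch below derives a per-hour capacity target from observed peak concurrent sessions. The function name, the 10% buffer, and the sample usage figures are assumptions made for illustration; they are not AppStream defaults or API parameters.

```python
def scheduled_capacity(peak_sessions_by_hour, sessions_per_instance=1,
                       buffer_percent=10, min_instances=1):
    """Return a desired instance count for each hour of the day.

    peak_sessions_by_hour: {hour: observed peak concurrent sessions}
    buffer_percent: headroom added on top of the observed peak (assumed value).
    """
    plan = {}
    for hour, peak in peak_sessions_by_hour.items():
        numerator = peak * (100 + buffer_percent)
        denominator = 100 * sessions_per_instance
        needed = -(-numerator // denominator)  # integer ceiling division
        plan[hour] = max(min_instances, needed)
    return plan


# Hypothetical observations: hour of day -> peak concurrent sessions
usage = {3: 2, 9: 40, 14: 55, 22: 6}
plan = scheduled_capacity(usage)
for hour in sorted(plan):
    print(f"{hour:02d}:00 -> {plan[hour]} instances")
```

A plan like this could feed scheduled scaling actions, so that the off-peak minimum is lowered rather than left at the daytime value.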
AppStream fleets are often provisioned with instance types sized for worst-case or peak usage, even when typical workloads are significantly lighter. The result is consistently low utilization of CPU, memory, or GPU resources and inflated infrastructure costs. Right-sizing AppStream instances based on observed workload needs can reduce spend without compromising user experience.
When AppStream builder instances are left running but unused, they continue to generate compute charges without delivering any value. These instances are commonly left active after configuration or image creation is completed but can be safely stopped or terminated when not in use. Identifying and decommissioning inactive builders helps reduce unnecessary compute costs.
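One way to surface candidates is to filter an inventory of builders by state and time since last use. The sketch below works on plain dictionaries; the `last_activity` field and the 24-hour threshold are assumptions, since the timestamp would have to come from your own tracking (for example, tags or audit logs) rather than from a field this document describes.

```python
from datetime import datetime, timedelta, timezone


def find_idle_builders(builders, now, max_idle=timedelta(hours=24)):
    """Return names of builders that are running but unused for longer than max_idle.

    Each builder is a dict with hypothetical keys:
    'name', 'state', and 'last_activity' (timezone-aware datetime).
    """
    idle = []
    for builder in builders:
        if builder["state"] != "RUNNING":
            continue  # stopped builders do not accrue compute charges
        if now - builder["last_activity"] > max_idle:
            idle.append(builder["name"])
    return idle


now = datetime(2024, 6, 10, 12, 0, tzinfo=timezone.utc)
inventory = [
    {"name": "builder-a", "state": "RUNNING",
     "last_activity": now - timedelta(days=5)},
    {"name": "builder-b", "state": "RUNNING",
     "last_activity": now - timedelta(hours=2)},
    {"name": "builder-c", "state": "STOPPED",
     "last_activity": now - timedelta(days=30)},
]
idle_builders = find_idle_builders(inventory, now)
print(idle_builders)
```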
When reservations are scoped only to a single subscription, any unused capacity cannot be applied to matching resources in other subscriptions within the same tenant. This leads to underutilization of the committed reservation and continued on-demand charges in other parts of the organization. Enabling **Shared scope** allows all eligible subscriptions to consume the reservation benefit, improving utilization and reducing overall spend. This is particularly impactful in environments with decentralized provisioning, such as across dev/test/prod subscriptions or multiple business units.
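The effect of scope can be seen with simple arithmetic. The sketch below compares how much of a reservation is consumed when only one subscription is eligible versus when all are. It is a simplified model under stated assumptions: hours are aggregated over a month, whereas real reservation benefits are applied hour by hour, and the subscription names and usage figures are made up.

```python
def reservation_coverage(reserved_hours, usage_by_subscription, eligible):
    """Split total usage into reservation-covered and on-demand hours.

    eligible: set of subscription names allowed to consume the reservation.
    """
    eligible_usage = sum(hours for sub, hours in usage_by_subscription.items()
                         if sub in eligible)
    covered = min(reserved_hours, eligible_usage)
    total_usage = sum(usage_by_subscription.values())
    return {
        "covered": covered,
        "unused_reservation": reserved_hours - covered,
        "on_demand": total_usage - covered,
    }


# Hypothetical monthly instance-hours of a matching VM size per subscription
usage = {"dev": 200, "test": 300, "prod": 500}
single = reservation_coverage(730, usage, {"dev"})    # single-subscription scope
shared = reservation_coverage(730, usage, set(usage))  # shared scope
print(single)
print(shared)
```

In this example the single-scope reservation leaves 530 hours unused while 800 hours are billed on demand; with every subscription eligible, the reservation is fully consumed and on-demand usage drops to 270 hours.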
When Kubernetes workloads request more CPU and memory than they actually consume, nodes must reserve capacity that remains unused. This leads to lower node density, forcing the cluster to maintain more instances than necessary. Aligning resource requests with observed utilization improves cluster efficiency and reduces compute spend without sacrificing application performance.
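The impact on node count can be estimated from CPU requests alone. The sketch below is a lower-bound estimate that ignores memory, bin-packing fragmentation, and system pods; the 40% headroom, pod counts, and node allocatable capacity are illustrative assumptions.

```python
def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)


def right_sized_request(observed_millicores, headroom_percent=40):
    """Suggest a CPU request: observed usage plus a headroom margin (assumed 40%)."""
    return ceil_div(observed_millicores * (100 + headroom_percent), 100)


def nodes_needed(pod_requests_millicores, node_allocatable_millicores):
    """Minimum nodes required to satisfy the summed CPU requests."""
    return ceil_div(sum(pod_requests_millicores), node_allocatable_millicores)


pod_count = 30
current_request = 1000   # millicores requested per pod today
observed_usage = 250     # millicores actually consumed per pod (e.g. a high percentile)
node_allocatable = 3800  # allocatable millicores per node (hypothetical)

suggested = right_sized_request(observed_usage)
before = nodes_needed([current_request] * pod_count, node_allocatable)
after = nodes_needed([suggested] * pod_count, node_allocatable)
print(f"request {current_request}m -> {suggested}m, nodes {before} -> {after}")
```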
Databricks Serverless Compute is available for jobs and notebooks, offering a simplified, autoscaled compute environment that eliminates cluster provisioning, reduces idle overhead, and improves Spot survivability. For short-running, bursty, or interactive workloads, Serverless can significantly reduce cost by billing only for execution time. However, Serverless is not universally available or compatible with all workload types and libraries. Organizations that rely exclusively on traditional clusters may be missing opportunities to reduce spend and simplify operations by using Serverless where appropriate.
When EC2 usage declines, shifts to different instance families, or moves to other services (e.g., containers or serverless), organizations may find that previously purchased Standard Reserved Instances or Savings Plans no longer match current workload patterns.
This misalignment results in underutilized commitments: costs are still incurred, but no usage benefits from the associated discounts. Because these commitments cannot be easily exchanged, refunded, or sold (except for eligible RIs on the RI Marketplace), the main way to recoup value is usually to steer workloads back toward the covered usage profile.
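Underutilization can be quantified by comparing the hourly commitment with the usage that was actually eligible for it. The sketch below assumes a use-it-or-lose-it commitment measured in dollars per hour; the function name and sample figures are hypothetical.

```python
def commitment_utilization(committed_per_hour, eligible_usage_per_hour):
    """Return (utilization ratio, wasted commitment) over the given hours.

    Usage above the commitment in any hour is billed on demand and does not
    offset unused commitment in other hours.
    """
    used = sum(min(committed_per_hour, usage) for usage in eligible_usage_per_hour)
    total = committed_per_hour * len(eligible_usage_per_hour)
    return used / total, total - used


# Hypothetical: $10/hour committed; eligible usage falls as workloads move away
hourly_usage = [10, 8, 4, 0]
utilization, wasted = commitment_utilization(10, hourly_usage)
print(f"utilization={utilization:.0%}, wasted=${wasted}")
```

Tracking this ratio over time shows whether steering workloads back toward the covered instance families is closing the gap.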
As workloads evolve, Azure Reserved Instances (RIs) may no longer align with actual usage — due to refactoring, region changes, autoscaling, or instance-type drift. When this happens, the committed usage goes unused, while new workloads run on non-covered SKUs, resulting in both underutilized reservations and full-price on-demand charges elsewhere.
The root inefficiency is architectural or operational drift away from what was originally committed — often due to team autonomy, poor RI governance, or legacy commitments. This leads to silent waste unless workloads are re-aligned to match existing reservations.
Applications running on App Service V2 plans may incur higher operational costs and degraded performance compared to V3 plans. V2 uses older hardware generations that lack access to platform-level enhancements introduced in V3, including improved cold start times, faster scaling, and enhanced networking options.
This inefficiency often arises from legacy deployments or default provisioning choices that haven't been revisited. Without proactive review, teams may continue running production workloads on suboptimal infrastructure—paying more for less performance.
App Service Plans continue to incur charges even when no applications are deployed. This can occur when applications are deleted, migrated, or retired, but the associated App Service Plan remains active. Without ongoing workloads, these idle plans become silent cost contributors — especially in higher-cost SKUs like Premium v3 or Isolated v2.
In large or decentralized environments, unused plans can accumulate quickly if cleanup is not automated or routinely enforced. These idle plans offer no functional value but continue to consume compute resources and generate operational expense.
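A routine cleanup check can be as simple as listing plans that host no applications, ordered by cost. The sketch below operates on an exported inventory; the field names (`app_count`, `monthly_cost`), plan names, and cost figures are placeholders for illustration, not real Azure prices or API fields.

```python
def find_idle_plans(plans):
    """Return plans hosting zero apps, most expensive first."""
    idle = [plan for plan in plans if plan["app_count"] == 0]
    return sorted(idle, key=lambda plan: plan["monthly_cost"], reverse=True)


# Hypothetical inventory export
plans = [
    {"name": "plan-web-prod", "sku": "P1v3", "app_count": 4, "monthly_cost": 120},
    {"name": "plan-legacy", "sku": "P2v3", "app_count": 0, "monthly_cost": 240},
    {"name": "plan-poc", "sku": "B1", "app_count": 0, "monthly_cost": 13},
]
idle = find_idle_plans(plans)
monthly_waste = sum(plan["monthly_cost"] for plan in idle)
print([plan["name"] for plan in idle], monthly_waste)
```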