This inefficiency occurs when production and non-production applications are hosted within the same App Service Plan. Production workloads often require higher availability, performance, or scaling characteristics, driving the plan toward larger or higher-cost SKUs. When non-production workloads share that plan, they inherit the higher cost structure even though their availability and performance requirements are typically much lower, resulting in unnecessary spend.
This inefficiency occurs when pod resource requests—often inflated by sidecar containers—push total memory or CPU just over a Fargate sizing boundary. Because Fargate adds mandatory system overhead and only supports fixed resource combinations, small incremental increases can force a pod into a much larger billing tier. This results in materially higher cost for marginal additional resource needs, especially in workloads that run continuously or at scale.
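The boundary effect can be illustrated with a small sizing calculator. This is a minimal sketch, not AWS's actual placement logic: the combination table and the 0.25 GiB per-pod memory overhead reflect the EKS on Fargate documentation as understood at the time of writing and should be verified, the selection rule (smallest vCPU tier that fits, then smallest memory option) is a simplification, and the function names and hourly rates are hypothetical.

```python
# Assumed vCPU -> memory (GiB) options for Fargate pods; verify against current AWS docs.
FARGATE_COMBOS = {
    0.25: [0.5, 1, 2],
    0.5: [1, 2, 3, 4],
    1: list(range(2, 9)),         # 2..8 GiB in 1 GiB steps
    2: list(range(4, 17)),        # 4..16 GiB in 1 GiB steps
    4: list(range(8, 31)),        # 8..30 GiB in 1 GiB steps
    8: list(range(16, 61, 4)),    # 16..60 GiB in 4 GiB steps
    16: list(range(32, 121, 8)),  # 32..120 GiB in 8 GiB steps
}

# Assumed memory that Fargate adds to every pod for Kubernetes components.
POD_OVERHEAD_GIB = 0.25


def fargate_size(cpu_request, mem_request_gib):
    """Return the (vCPU, GiB) configuration a pod would be rounded up to."""
    needed_mem = mem_request_gib + POD_OVERHEAD_GIB
    for vcpu in sorted(FARGATE_COMBOS):
        if vcpu < cpu_request:
            continue  # this tier cannot satisfy the CPU request
        for mem in FARGATE_COMBOS[vcpu]:
            if mem >= needed_mem:
                return vcpu, mem
    raise ValueError("request exceeds the largest Fargate configuration")


def hourly_cost(vcpu, mem_gib, vcpu_rate, gib_rate):
    """Hourly cost of a configuration for caller-supplied per-unit rates."""
    return vcpu * vcpu_rate + mem_gib * gib_rate


# Hypothetical rates, for illustration only; substitute your region's prices.
VCPU_RATE, GIB_RATE = 0.04, 0.0045

app_only = fargate_size(1.0, 7.5)                 # application container alone
with_sidecar = fargate_size(1.0 + 0.1, 7.5 + 0.5)  # plus a small sidecar

print(app_only, hourly_cost(*app_only, VCPU_RATE, GIB_RATE))
print(with_sidecar, hourly_cost(*with_sidecar, VCPU_RATE, GIB_RATE))
```

In this example a sidecar requesting only 0.1 vCPU and 0.5 GiB moves the pod from the 1 vCPU tier to the 2 vCPU tier, so the billed vCPU doubles for a marginal increase in requested resources.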
This inefficiency occurs when Provisioned Concurrency is enabled for Lambda functions that do not require consistently low latency or steady traffic. In such cases, reserved capacity remains allocated and billed during idle periods, creating ongoing cost without proportional performance or business benefit. This is distinct from standard Lambda execution charges, which are purely usage-based.
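One way to judge whether Provisioned Concurrency pays for itself is to estimate the utilization at which it breaks even with on-demand execution. The sketch below assumes three per-GB-second rates (on-demand duration, Provisioned Concurrency reservation, and the reduced duration rate for provisioned invocations); the values shown are illustrative placeholders, not authoritative prices, and the function names are hypothetical. Request charges are ignored because they apply equally in both models.

```python
def on_demand_cost(used_gb_seconds, on_demand_rate):
    """Duration cost when every invocation runs on demand."""
    return used_gb_seconds * on_demand_rate


def provisioned_cost(configured_gb, period_seconds, used_gb_seconds,
                     reservation_rate, provisioned_duration_rate):
    """Reservation charge for the whole period plus duration at the reduced rate."""
    reservation = configured_gb * period_seconds * reservation_rate
    return reservation + used_gb_seconds * provisioned_duration_rate


def break_even_utilization(on_demand_rate, reservation_rate,
                           provisioned_duration_rate):
    """Fraction of reserved capacity that must be busy for the two costs to match."""
    return reservation_rate / (on_demand_rate - provisioned_duration_rate)


# Illustrative per-GB-second rates (assumptions; check current regional pricing).
ON_DEMAND = 0.0000166667
RESERVATION = 0.0000041667
PROVISIONED_DURATION = 0.0000097222

threshold = break_even_utilization(ON_DEMAND, RESERVATION, PROVISIONED_DURATION)
print(f"break-even utilization: {threshold:.0%}")
```

Under these assumed rates, a function whose reserved capacity is busy for less than the break-even fraction of the time costs more with Provisioned Concurrency than without it, which is the idle-period cost described above.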
This inefficiency occurs when a protected resource (such as a virtual machine, database, or file share) is decommissioned without explicitly stopping backup protection. In these cases, Azure Backup continues to retain existing recovery points in the vault until they expire under the retention policy. Although the source resource no longer exists, backup storage remains allocated and billable, resulting in unnecessary ongoing costs.
This pattern is common when infrastructure is deleted outside of a formal decommissioning process or when backup ownership is unclear.
This inefficiency occurs when an Azure Savings Plan is scoped too narrowly relative to where eligible compute usage actually runs. When usage is spread across multiple subscriptions or fluctuates significantly (for example, development and test workloads that are frequently stopped and started), a narrowly scoped Savings Plan may not consistently find enough eligible usage to consume the full commitment. As a result, part of the committed hourly spend goes unused while other eligible workloads outside the scope continue to incur on-demand charges.
Azure supports broader scoping options—such as Management Group or Shared scope—that allow the commitment to be applied across a larger pool of eligible compute. Selecting an overly restrictive scope can therefore directly drive underutilization, even when sufficient total usage exists across the tenant.
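The effect of scope on utilization can be shown with simple arithmetic. This is a simplified sketch under stated assumptions: eligible usage is expressed in the same hourly dollar terms as the commitment, the per-subscription figures are invented for illustration, and `utilization` is a hypothetical helper rather than anything Azure provides.

```python
def utilization(commit_per_hour, hourly_eligible_spend):
    """Share of the commitment consumed across the given hours.

    In each hour the plan can cover at most `commit_per_hour`; any unused
    portion of that hour's commitment is lost.
    """
    used = sum(min(commit_per_hour, spend) for spend in hourly_eligible_spend)
    return used / (commit_per_hour * len(hourly_eligible_spend))


COMMIT = 10  # hypothetical hourly commitment

# Invented hourly eligible spend for two subscriptions with fluctuating usage.
sub_a = [10, 2, 0, 10]
sub_b = [0, 8, 10, 3]

narrow = utilization(COMMIT, sub_a)  # plan scoped to subscription A only
shared = utilization(COMMIT, [a + b for a, b in zip(sub_a, sub_b)])  # broader scope

print(f"single-subscription scope: {narrow:.0%}")
print(f"shared scope: {shared:.0%}")
```

With these numbers the single-subscription scope consumes 55% of the commitment, while the broader scope consumes all of it, because usage in subscription B fills the hours in which subscription A is idle.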
Teams often start custom-model deployments with large architectures, full-precision weights, or older model versions carried over from training environments. When these models transition to Bedrock’s managed inference environment, the compute footprint (especially GPU class) becomes a major cost driver. Common inefficiencies include:

* Deploying outdated custom models despite newer, more efficient variants being available
* Running full-size models for tasks that could be served by distilled or quantized versions
* Using accelerators overpowered for the workload’s latency requirements
* Relying on default model artifacts instead of optimizing for inference

Because Bedrock Custom Models bill continuously for the backing compute, even small inefficiencies in model design or versioning translate into substantial ongoing cost.
Generative workloads that produce long outputs—such as detailed summaries, document rewrites, or multi-paragraph chat completions—require extended model runtime.
Embedding-based retrieval enables semantic matching even when keywords differ. But many Databricks workloads—catalog lookups, metadata search, deterministic classification, or fixed-rule routing—do not require semantic understanding. When embeddings are used anyway, teams incur DBU cost for embedding generation, additional storage for vector columns or indexes, and more expensive similarity-search compute. This often stems from defaulting to a RAG approach rather than evaluating whether a simpler retrieval mechanism would perform equally well.
Embeddings enable semantic retrieval by capturing the meaning of text, while keyword search returns results based on exact or lexical matches. Many Azure workloads (FAQ search, routing, deterministic classification, or structured lookups) achieve the same or better accuracy using simple keyword or metadata filtering. When embeddings are used for these simple tasks, organizations pay for token-based embedding generation, vector storage, and compute-heavy similarity search without receiving meaningful quality improvements. This inefficiency often occurs when RAG is adopted by default rather than as a deliberate design choice.
Embeddings enable semantic similarity search by representing text as high-dimensional vectors. Keyword search, however, returns results based on lexical matches and is often sufficient for simple retrieval tasks such as FAQ matching, deterministic filtering, metadata lookup, or rule-based routing. When embeddings are used for these low-complexity scenarios, organizations pay for compute to generate embeddings, storage for vector columns, and compute-heavy cosine similarity searches — without improving accuracy or user experience. In Snowflake, this can also increase warehouse load and query runtime.
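For the low-complexity retrieval tasks described in these entries, a lexical matcher is often enough to test before adopting embeddings. The following is a minimal, platform-neutral sketch: the document set, the `min_overlap` threshold, and the function names are all hypothetical, and a production system would more likely use the full-text search features of its data platform.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_match(query, documents, min_overlap=2):
    """Return the id of the document sharing the most tokens with the query.

    Returns None when no document reaches `min_overlap` shared tokens, which
    a caller could treat as the signal to fall back to semantic retrieval.
    """
    query_tokens = tokenize(query)
    best_id, best_score = None, 0
    for doc_id, text in documents.items():
        score = len(query_tokens & tokenize(text))
        if score > best_score:
            best_id, best_score = doc_id, score
    return best_id if best_score >= min_overlap else None


# Hypothetical FAQ entries for illustration.
faq = {
    "reset": "How do I reset my password",
    "billing": "Where can I download my invoice",
}

print(keyword_match("reset password", faq))
print(keyword_match("forgot my login credentials", faq))
```

Measuring how often the lexical path returns an acceptable answer gives a concrete basis for deciding whether embedding generation, vector storage, and similarity search are justified for the workload.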