If auto-suspend settings are too high, warehouses can sit idle and continue accruing unnecessary charges. Tightening the auto-suspend window ensures that the warehouse shuts down quickly once queries complete, minimizing credit waste while maintaining acceptable user experience (e.g., caching needs, interactive performance).
If no appropriate query timeout is configured, inefficient or runaway queries can execute for extended periods (up to the default 2-day system limit). For as long as the query is running, the warehouse will remain active and accrue costs. Proper timeout settings help terminate inefficient queries, free up compute capacity, and allow the warehouse to become idle sooner, making it eligible for auto-suspend once the inactivity timer is reached.
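As a minimal sketch (assuming a hypothetical warehouse named ANALYTICS_WH and the snowflake-connector-python package), both settings can be tightened in one pass; the specific values are illustrative and should be tuned per workload:

```python
# Minimal sketch: tighten auto-suspend and query timeout on a Snowflake
# warehouse. ANALYTICS_WH, credentials, and the chosen values are
# placeholders for illustration.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",   # hypothetical account identifier
    user="my_user",
    password="...",
    role="SYSADMIN",
)
try:
    cur = conn.cursor()
    # Suspend after 60 seconds of inactivity instead of a longer default.
    cur.execute("ALTER WAREHOUSE ANALYTICS_WH SET AUTO_SUSPEND = 60")
    # Kill statements that run longer than 1 hour (the default is 2 days).
    cur.execute(
        "ALTER WAREHOUSE ANALYTICS_WH SET STATEMENT_TIMEOUT_IN_SECONDS = 3600"
    )
finally:
    conn.close()
```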
Standard Load Balancers are frequently provisioned for internal services, internet-facing applications, or testing environments. When a workload is decommissioned or moved, the load balancer may be left behind without any active backend pool or traffic — but continues to incur hourly charges for each frontend IP configuration.

Because Azure does not automatically remove or alert on inactive load balancers, and because they may not show significant outbound traffic, these resources often persist unnoticed. In dynamic or multi-team environments, this can result in a growing number of unused Standard Load Balancers generating silent, recurring costs.
Snapshots are often created for short-term protection before changes to a VM or disk, but many remain in the environment far beyond their intended lifespan. Over time, this leads to an accumulation of snapshots that are no longer associated with any active resource or retained for operational need.

Since Azure does not enforce automatic expiration or lifecycle policies for snapshots, they can persist indefinitely and continue to incur monthly storage charges. This inefficiency is especially common in development environments, migration efforts, or manual backup workflows that lack centralized cleanup.

Snapshots older than 30–90 days, especially those not tied to a documented backup strategy or workload, are strong candidates for review and removal.
Many organizations assign separate Snowflake warehouses to individual business units or teams to simplify chargebacks and operational ownership. This often results in redundant and underutilized warehouses, as workloads frequently do not require the full capacity of even the smallest warehouse size.
By consolidating compatible workloads onto shared warehouses, organizations can maximize utilization, reduce idle runtime across the fleet, and significantly lower total credit consumption. Cost allocation can still be achieved using Query Billing Attribution.
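As a hedged sketch of per-team chargeback on a shared warehouse, the query below reads Snowflake's QUERY_ATTRIBUTION_HISTORY account-usage view; the view and column names are taken from Snowflake's documented schema but should be verified against your account, and grouping by user is only one possible attribution key:

```python
# Minimal sketch: attribute 30 days of compute credits on shared warehouses
# by user. View/column names (query_attribution_history,
# credits_attributed_compute) are assumptions to verify in your account.
import snowflake.connector

conn = snowflake.connector.connect(account="my_account", user="my_user", password="...")
cur = conn.cursor()
cur.execute("""
    SELECT user_name, SUM(credits_attributed_compute) AS credits
    FROM snowflake.account_usage.query_attribution_history
    WHERE start_time >= DATEADD('day', -30, CURRENT_TIMESTAMP())
    GROUP BY user_name
    ORDER BY credits DESC
""")
for user, credits in cur.fetchall():
    print(user, credits)
```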
In Azure, it’s common for public IP addresses to be created as part of virtual machine or load balancer configurations. When those resources are deleted or reconfigured, the IP address may remain in the environment unassigned. While Basic SKUs are free when idle, Standard SKUs incur ongoing hourly charges, even if the address is not in use.

Unassigned Standard public IPs provide no functional value but continue to generate cost, especially in environments with high churn or inconsistent cleanup practices.
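A minimal detection sketch, assuming azure-identity and azure-mgmt-network and a placeholder subscription ID, lists Standard-SKU addresses with no IP configuration (i.e., attached to nothing):

```python
# Minimal sketch: flag Standard public IPs not attached to any NIC or
# load balancer frontend. Subscription ID is a placeholder.
from azure.identity import DefaultAzureCredential
from azure.mgmt.network import NetworkManagementClient

client = NetworkManagementClient(DefaultAzureCredential(), "<subscription-id>")

for ip in client.public_ip_addresses.list_all():
    if ip.sku and ip.sku.name == "Standard" and ip.ip_configuration is None:
        print(f"Unassociated Standard public IP: {ip.name} ({ip.ip_address})")
```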
Many Redis and Memcached clusters still use legacy x86-based node types (e.g., cache.r5, cache.m5) even though Graviton-based alternatives are available. In-memory workloads tend to be highly compatible with Graviton due to their simplicity and reliance on standard CPU and memory usage patterns.

Unless constrained by architecture-specific extensions or strict compliance requirements, most ElastiCache clusters can be transitioned with no application-level changes. Failing to migrate to Graviton results in unnecessary compute spend and missed opportunities to improve cache efficiency.
Many RDS workloads continue to run on older x86 instance types (e.g., db.m5, db.r5) even though compatible Graviton-based options (e.g., db.m6g, db.r6g) are widely available. These newer families deliver improved performance per vCPU and lower hourly costs, yet are often not adopted due to legacy defaults, inertia, or lack of awareness.

When workloads are not tightly bound to architecture-specific extensions (e.g., x86-specific binaries or drivers), switching to Graviton typically requires no application changes and results in immediate savings. Persisting with x86 in eligible use cases results in avoidable overpayment for compute.
When EC2 instances, Lambda functions, or containerized workloads access AWS-managed services without VPC Endpoints, that traffic exits the VPC through a NAT Gateway or Internet Gateway. This introduces unnecessary egress charges and NAT processing costs, especially for data-intensive or high-frequency workloads.
When ECS clusters are configured with an Auto Scaling Group that maintains a minimum number of EC2 instances (e.g., min = 1 or higher), the instances remain active even when there are no tasks scheduled. This leads to idle compute capacity and unnecessary EC2 charges.

Instead, ECS Capacity Providers support target tracking scaling policies that can scale the ASG to zero when idle and automatically increase capacity when new tasks or services are scheduled. Failing to adopt this pattern results in persistent idle infrastructure and unnecessary costs in ECS environments that do not require always-on compute.
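A minimal sketch of the capacity-provider setup (the provider name and ASG ARN are placeholders; the ASG itself must allow a minimum size of 0 for scale-to-zero to work):

```python
# Minimal sketch: register an ECS capacity provider with managed scaling so
# the backing ASG can scale to zero when no tasks are scheduled.
import boto3

ecs = boto3.client("ecs")

ecs.create_capacity_provider(
    name="app-capacity-provider",                      # hypothetical name
    autoScalingGroupProvider={
        "autoScalingGroupArn": "arn:aws:autoscaling:...",  # placeholder ARN
        "managedScaling": {
            "status": "ENABLED",
            "targetCapacity": 100,   # aim to keep instances fully utilized
            "minimumScalingStepSize": 1,
            "maximumScalingStepSize": 10,
        },
        "managedTerminationProtection": "DISABLED",
    },
)
```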
Replication instances are commonly left running after migration tasks are completed, especially when DMS is used for one-time or project-based migrations. Without active replication tasks, these instances no longer serve any purpose but continue to incur hourly compute costs. In large organizations or decentralized migration projects, these idle instances may go unnoticed, contributing to persistent background spend.
Azure Cosmos DB is optimized for low-latency, globally distributed workloads—not long-term storage of infrequently accessed data. Yet in many environments, cold data such as logs, telemetry, or historical records is retained in Cosmos DB due to a lack of lifecycle management.
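One lightweight lifecycle control is a container-level TTL so cold items expire automatically. A minimal sketch with the azure-cosmos package (endpoint, key, and container names are placeholders):

```python
# Minimal sketch: enable a default TTL so items expire automatically
# instead of accumulating as cold data in Cosmos DB.
from azure.cosmos import CosmosClient, PartitionKey

client = CosmosClient("https://<account>.documents.azure.com", "<key>")
db = client.get_database_client("telemetry")   # hypothetical database

# Items expire 30 days after their last write; individual items can
# override this with their own "ttl" property.
container = db.create_container_if_not_exists(
    id="events",
    partition_key=PartitionKey(path="/deviceId"),
    default_ttl=30 * 24 * 3600,
)
```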
AWS CloudTrail enables event logging across AWS services, but when multiple trails are configured to log overlapping events — especially data events — it can result in redundant charges and unnecessary storage or ingestion costs. This commonly occurs in decentralized environments where teams create trails independently, unaware of existing coverage or shared logging destinations.

Each trail that records data events contributes to billing on a per-event basis, even if the same activity is logged by multiple trails. Additional costs may also arise from delivering duplicate logs to separate S3 buckets or CloudWatch Log groups. While separate trails may be justified for audit, compliance, or operational segmentation, unintentional duplication increases both cost and operational complexity without added value.
Engineers often enable verbose logging (e.g., debug or trace-level) during development or troubleshooting, then forget to disable it after deployment. This results in elevated log ingestion rates — and therefore costs — even when the detailed logs are no longer needed. Because CloudWatch Logs charges per GB ingested, persistent debug logging in production environments can create silent but material cost increases, particularly for high-throughput services.

In environments with multiple teams or loosely governed log group policies, this issue can go undetected for long periods. Identifying and deactivating unnecessary debug-level logging is a low-risk, high-leverage optimization.
Development and test environments on Compute Engine are commonly provisioned and left running around the clock, even if only used during business hours. This results in wasteful spend on compute time that could be eliminated by scheduling shutdowns during idle periods. GCP enables scheduling via native tools such as Cloud Scheduler, Cloud Functions, or Terraform automation. Stopping VMs during off-hours preserves boot disks and instance metadata while halting compute billing.
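A minimal sketch of the Cloud Functions side of that pattern, assuming google-cloud-compute and a hypothetical env=dev label (project and zone are placeholders; a Cloud Scheduler cron job would invoke this off-hours):

```python
# Minimal sketch: stop all running Compute Engine VMs labeled env=dev.
from google.cloud import compute_v1

client = compute_v1.InstancesClient()
project, zone = "my-project", "us-central1-a"   # placeholders

for instance in client.list(project=project, zone=zone):
    if instance.labels.get("env") == "dev" and instance.status == "RUNNING":
        client.stop(project=project, zone=zone, instance=instance.name)
```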
Non-production EC2 instances are often provisioned for daytime-only usage but remain running 24/7 out of convenience or oversight. This results in unnecessary compute charges, even if the workload is inactive for 16+ hours per day. AWS supports automated schedules to stop and start instances at predefined times, allowing organizations to retain data and instance configuration without paying for unused runtime. Implementing a shutdown schedule for inactive periods (e.g., nights, weekends) can reduce compute costs by up to 60% in typical non-prod environments.
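The AWS equivalent is commonly a Lambda handler on an EventBridge cron rule. A minimal sketch, assuming a hypothetical Schedule=office-hours tag convention (a mirror-image handler would start instances in the morning):

```python
# Minimal sketch: stop every running EC2 instance tagged
# Schedule=office-hours. Tag key/value are assumptions.
import boto3

ec2 = boto3.client("ec2")

def handler(event, context):
    pages = ec2.get_paginator("describe_instances").paginate(
        Filters=[
            {"Name": "tag:Schedule", "Values": ["office-hours"]},
            {"Name": "instance-state-name", "Values": ["running"]},
        ]
    )
    ids = [
        i["InstanceId"]
        for page in pages
        for r in page["Reservations"]
        for i in r["Instances"]
    ]
    if ids:
        ec2.stop_instances(InstanceIds=ids)
    return {"stopped": ids}
```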
Non-production Azure VMs are frequently left running during off-hours despite being used only during business hours. When these instances remain active overnight or on weekends, they generate unnecessary compute spend. Azure offers built-in auto-shutdown features that allow teams to define daily stop times, retaining disk data and configurations without paying for VM runtime. Implementing scheduled shutdowns in dev/test environments is a simple, low-risk optimization that can reduce compute costs by 30–60%.
Many legacy workloads still run on older Elasticsearch versions — particularly 5.x, 6.x, or 7.x — due to inertia, compatibility constraints, or lack of ownership. Once these versions exceed their standard support window, AWS begins charging an hourly Extended Support fee for each domain. These fees are often missed in cost reviews, especially in environments that are inactive but still provisioned. In aggregate, outdated Elasticsearch clusters contribute to significant silent spend unless proactively addressed.
Domains running outdated OpenSearch versions — particularly OpenSearch 1.x — begin to incur AWS Extended Support charges once they fall outside of the standard support period. These charges are persistent and apply even if the domain is inactive or lightly used. Many teams overlook this cost when delaying upgrades or maintaining non-critical environments like dev, test, or staging. In large organizations, outdated versions can silently drive meaningful spend over time, especially across many small or idle domains.
Many workloads default to using Redis or Memcached without evaluating whether a lighter or more efficient engine would provide equivalent functionality at lower cost. Valkey is a Redis-compatible, open-source engine supported by ElastiCache that may offer improved price-performance and licensing benefits. For read-heavy or stateless workloads that don’t require Redis-specific features (e.g., persistence, advanced replication), Valkey can often be used as a drop-in replacement. Memcached, while simple, lacks key features like replication and persistence, and may be less cost-effective for certain access patterns. Choosing the wrong engine can result in overpaying for capabilities that aren’t needed — or missing opportunities to optimize.
ElastiCache clusters are often sized for peak performance or reliability assumptions that no longer reflect current workload needs. When memory and CPU usage remain consistently low, the node is likely overprovisioned. For Redis, memory is typically the primary sizing constraint, while Memcached workloads may be more CPU-sensitive. In dev, staging, or lightly used production environments, some nodes may be entirely idle.

It's important to evaluate usage patterns in context — for example, replica nodes in Redis Multi-AZ configurations may show low utilization by design, but still serve a high-availability purpose. However, in non-critical environments or where HA is not required, those nodes can often be downsized or removed. Additionally, older ElastiCache instance types (e.g., r4, m3) are frequently less cost-efficient than newer generations like r6g or r7g, offering further savings through modernization.
In many environments, users launch Databricks clusters for development or analysis and forget to shut them down after use. When no auto-termination policy is configured, these clusters remain active indefinitely, incurring unnecessary charges for both Databricks and cloud infrastructure usage. This inefficiency is especially common in interactive clusters that are user-managed, ephemeral, or exploratory in nature. While Databricks provides built-in support for cluster auto-termination, teams often overlook it unless it's enforced through workspace policies. Without this safeguard in place, idle clusters can persist unnoticed for hours or days, leading to avoidable cost.
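Enforcement is typically done with a cluster policy. A minimal sketch against the Databricks cluster-policies REST API (workspace host, token, policy name, and the 10–30 minute range are placeholders):

```python
# Minimal sketch: create a cluster policy that caps auto-termination at
# 30 minutes so interactive clusters cannot be left running indefinitely.
import json
import requests

HOST = "https://<workspace>.cloud.databricks.com"   # placeholder
TOKEN = "<personal-access-token>"                   # placeholder

definition = {
    "autotermination_minutes": {
        "type": "range",
        "minValue": 10,
        "maxValue": 30,
        "defaultValue": 20,
    }
}

resp = requests.post(
    f"{HOST}/api/2.0/policies/clusters/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"name": "enforce-autotermination", "definition": json.dumps(definition)},
)
resp.raise_for_status()
```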
Workloads in private subnets often access AWS services like S3 or DynamoDB. If this traffic is routed through a NAT Gateway, it incurs both hourly and data processing charges. However, AWS offers VPC Gateway Endpoints (for S3/DynamoDB) and Interface Endpoints (for other services), which provide private access paths that bypass the NAT Gateway entirely. When teams fail to use VPC endpoints — often due to default routing configurations or lack of awareness — they unnecessarily route internal service calls through a costlier, public-facing path. This leads to persistent and avoidable spend.
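A minimal sketch of the fix for S3 traffic (VPC ID, route table ID, and region are placeholders; Gateway Endpoints for S3 and DynamoDB carry no hourly charge):

```python
# Minimal sketch: add an S3 Gateway Endpoint so private-subnet traffic to
# S3 bypasses the NAT Gateway entirely.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

ec2.create_vpc_endpoint(
    VpcId="vpc-0123456789abcdef0",                    # placeholder
    ServiceName="com.amazonaws.us-east-1.s3",
    VpcEndpointType="Gateway",
    RouteTableIds=["rtb-0123456789abcdef0"],          # private route tables
)
```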
Elastic IPs are often provisioned but forgotten — left unassociated, or still attached to EC2 instances that have been stopped. In either case, AWS treats the EIP as idle and applies an hourly charge. Although the cost per hour is relatively small, these charges accumulate quietly, especially across environments with frequent provisioning, decommissioning, or ephemeral workloads. Many organizations overlook the fact that even a single EIP attached to a stopped instance is billable. Without periodic review, this creates persistent, low-visibility waste across AWS accounts.
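A minimal detection sketch for the unassociated case (addresses attached to stopped instances also bill, but require a cross-check of the instance state, which this sketch omits):

```python
# Minimal sketch: flag Elastic IPs with no association; these are idle
# and accruing hourly charges.
import boto3

ec2 = boto3.client("ec2")

for addr in ec2.describe_addresses()["Addresses"]:
    if "AssociationId" not in addr:
        print(f"Idle EIP: {addr['PublicIp']} (AllocationId {addr['AllocationId']})")
```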
Photon is optimized for SQL workloads, delivering significant speedups through vectorized execution and native C++ performance. However, Photon only accelerates workloads that use compatible operations and data patterns. If a workload includes unsupported functions, unoptimized joins, or falls back to interpreted execution, Photon may be silently bypassed — even on a Photon-enabled cluster. In this case, users are billed at a premium DBU rate while receiving no meaningful speed or efficiency gain. This inefficiency typically arises when teams enable Photon globally without validating workload compatibility or updating their pipelines to follow Photon best practices. The result is higher costs with no corresponding benefit — a classic case of configuration drift outpacing optimization discipline.
BigQuery incentivizes efficient data retention by cutting storage costs in half for tables or partitions that go 90 days without modification. However, many teams unintentionally forfeit this discount by performing broad or unnecessary updates to long-lived datasets — for example, touching an entire table when only a few rows need to change. Even small-scale or programmatic updates can trigger a full reset of the 90-day timer on affected data. This behavior is subtle but expensive: it silently doubles the storage cost of large datasets for another 90-day cycle, even when the data itself is mostly static. Without intentional safeguards, organizations may repeatedly reset their discounted storage window without realizing it.
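One safeguard is scoping DML to the affected partitions so the 90-day timer resets only where data actually changed. A hedged sketch with google-cloud-bigquery (the table, the _PARTITIONDATE pseudo-column for ingestion-time partitioned tables, and the parameter are illustrative; column-partitioned tables would filter on the partition column instead):

```python
# Minimal sketch: restrict an UPDATE to one partition so other partitions
# keep their long-term storage discount. Table/column names are placeholders.
from google.cloud import bigquery

client = bigquery.Client()
sql = """
UPDATE `my_project.analytics.events`
SET status = 'corrected'
WHERE _PARTITIONDATE = '2024-05-01'   -- touch only the affected partition
  AND user_id = @user_id
"""
job = client.query(
    sql,
    job_config=bigquery.QueryJobConfig(
        query_parameters=[bigquery.ScalarQueryParameter("user_id", "STRING", "u123")]
    ),
)
job.result()
```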
In Azure Databricks environments that rely on Private Link for secure networking, it’s common to route traffic through multi-tiered network architectures. This often includes multiple VNets, Private Link endpoints, or peered subscriptions between data sources (e.g., ADLS) and the Databricks compute plane. While these architectures may be designed for isolation or compliance, they frequently introduce redundant routing paths that add cost without improving performance. Each additional hop may result in duplicated Private Link ingress and egress charges. Without regular review, this can create persistent and unrecognized network inefficiencies tied to Databricks usage.
Databricks cost optimization begins with visibility. Unlike traditional IaaS services, Databricks operates as an orchestration layer spanning compute, storage, and execution — but its billing data often lacks granularity by workload, job, or team. This creates a visibility gap: costs fluctuate without clear root causes, ownership is unclear, and optimization efforts stall due to lack of actionable insight. When costs are not attributed functionally — for example, to orchestration (query/job DBUs), compute (cloud VMs), storage, or data transfer — it becomes difficult to pinpoint what’s driving spend or where improvements can be made. As a result, inefficiencies persist not due to a single misconfiguration, but because the system lacks the structure to surface them.
In dynamic environments — especially during autoscaling, testing, or infrastructure changes — it's common for load balancers to remain provisioned after their backend resources have been decommissioned. When this happens, the load balancer continues to incur hourly charges despite serving no functional purpose. These inactive resources often go unnoticed, particularly in dev/test environments or when deployment pipelines fail to include proper cleanup logic. Over time, the accumulation of unused load balancers contributes to unnecessary recurring costs with no operational benefit.
Each Lambda function must be configured with a memory setting, which indirectly controls the amount of CPU and networking performance allocated. In many environments, memory settings are defined arbitrarily or left unchanged as functions evolve. Over time, this leads to overprovisioning — with functions running well below their allocated memory and incurring unnecessary compute costs. Systematic right-sizing using performance benchmarks can significantly reduce spend without sacrificing performance or reliability. This is especially important for frequently invoked functions or those with long execution times.
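A minimal benchmarking sketch: CloudWatch Logs Insights over a function's REPORT lines compares peak memory used against the allocation (the function name and 7-day window are assumptions; tools like AWS Lambda Power Tuning give a fuller picture):

```python
# Minimal sketch: compare allocated vs. peak memory for one Lambda function
# over the last 7 days using a Logs Insights query.
import time
import boto3

logs = boto3.client("logs")

query = logs.start_query(
    logGroupName="/aws/lambda/my-function",   # hypothetical function
    startTime=int(time.time()) - 7 * 24 * 3600,
    endTime=int(time.time()),
    queryString=(
        'filter @type = "REPORT" '
        "| stats max(@maxMemoryUsed / 1048576) as peak_mb, "
        "avg(@memorySize / 1048576) as allocated_mb"
    ),
)

while True:
    result = logs.get_query_results(queryId=query["queryId"])
    if result["status"] in ("Complete", "Failed", "Cancelled"):
        break
    time.sleep(1)
print(result["results"])
```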
While stopping an RDS instance reduces runtime cost, AWS enforces a 7-day limit on the stopped state. After this period, the instance is automatically restarted and resumes incurring compute charges — even if the database is still not needed. This creates waste in cases where teams intended to pause the environment but failed to manage its lifecycle beyond the 7-day window. Without proper automation or teardown workflows, stopped RDS instances silently become active and billable again. The best practice for long-term inactivity is to snapshot the database and delete the instance entirely. If the instance must remain available for fast recovery, automation should be in place to re-stop it upon restart.
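A minimal sketch of that re-stop automation: a Lambda handler, invoked on a schedule or from RDS instance-started events via EventBridge, that stops a long-paused instance whenever it comes back up (the instance identifier is a placeholder):

```python
# Minimal sketch: re-stop an RDS instance after AWS auto-restarts it at
# the 7-day mark.
import boto3

rds = boto3.client("rds")

def handler(event, context):
    db_id = "paused-dev-db"   # hypothetical instance kept stopped long-term
    status = rds.describe_db_instances(DBInstanceIdentifier=db_id)[
        "DBInstances"
    ][0]["DBInstanceStatus"]
    if status == "available":
        rds.stop_db_instance(DBInstanceIdentifier=db_id)
```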
While many AWS customers have migrated EC2 workloads to Graviton to reduce costs, Lambda functions often remain on the default x86 architecture. AWS Graviton2 (ARM) offers lower pricing and equal or better performance for most supported runtimes — yet adoption remains uneven due to legacy defaults or lack of awareness. Continuing to run eligible Lambda functions on x86 leads to unnecessary spending. The migration requires minimal configuration changes and can be verified through benchmarking and workload testing.
EFS file systems that are no longer attached to any running services — such as EC2 instances or Lambda functions — continue to incur storage charges. This often occurs after workloads are decommissioned but the file system is left behind. A quick indicator of this state is when the EFS file system has no mount targets configured. Without active usage or connection, these orphaned file systems represent pure cost with no functional value. Unlike block storage, EFS does not require an attached instance to incur billing, making it easy for unused resources to go unnoticed.
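A minimal detection sketch using the no-mount-targets indicator (review before deleting, since data is unrecoverable once the file system is removed):

```python
# Minimal sketch: list EFS file systems with zero mount targets — a quick
# signal of orphaned storage worth reviewing.
import boto3

efs = boto3.client("efs")

for fs in efs.describe_file_systems()["FileSystems"]:
    if fs["NumberOfMountTargets"] == 0:
        size_gb = fs["SizeInBytes"]["Value"] / 1024**3
        print(f"Orphaned file system {fs['FileSystemId']}: {size_gb:.1f} GiB")
```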
For Premium SSD and Standard SSD disks 513 GiB or larger, Azure now offers the option to enable Performance Plus — unlocking higher IOPS and MBps at no extra cost. Many environments that previously required custom performance settings continue to pay for additional throughput unnecessarily. By not enabling Performance Plus on eligible disks, organizations miss a straightforward opportunity to reduce disk spend while maintaining or improving performance. The feature is opt-in and must be explicitly enabled on each qualifying disk.
Each Azure VM size has a defined limit for total disk IOPS and throughput. When high-performance disks (e.g., Premium SSDs with high IOPS capacity) are attached to low-tier VMs, the disk’s performance capabilities may exceed what the VM can consume. This results in paying for performance that the VM cannot access. For example, attaching a large Premium SSD to a B-series VM will not provide the expected performance because the VM cannot deliver that level of throughput. Without aligning disk selection with VM limits, organizations incur unnecessary storage costs with no corresponding performance benefit.
Azure WAF configurations attached to Application Gateways can persist after their backend pool resources have been removed — often during environment reconfiguration or application decommissioning. In these cases, the WAF is no longer serving any functional purpose but continues to incur fixed hourly costs. Because no traffic is routed and no applications are protected, the WAF is effectively inactive. These orphaned WAFs are easy to overlook without regular cleanup processes and can quietly accumulate unnecessary charges over time.
Many EC2 workloads—such as development environments, test jobs, stateless services, and data processing pipelines—can tolerate interruptions and do not require the reliability of On-Demand pricing. Using On-Demand instances in these scenarios drives up cost without adding value. Spot Instances offer significantly lower pricing and are well-suited to workloads that can handle restarts, retries, or fluctuations in capacity. Without evaluating workload tolerance and adjusting pricing models accordingly, organizations risk consistently overpaying for compute.
Databricks supports AWS Graviton-based instances for most workloads, including Spark jobs, data engineering pipelines, and interactive notebooks. These instances offer significant cost advantages over traditional x86-based VMs, with comparable or better performance in many cases. When teams default to legacy instance types, they miss an easy opportunity to reduce compute spend. Unless workloads have known compatibility issues or specialized requirements, Graviton should be the default instance family used in Databricks clusters.
In Databricks, on-demand instances provide reliable performance but come at a premium cost. For non-production workloads—such as development, testing, or exploratory analysis—high availability is often unnecessary. Spot instances provide equivalent performance at a lower price, with the tradeoff of occasional interruptions. If teams default to on-demand usage in lower environments, they may be incurring unnecessary compute costs. Using compute policies to limit on-demand usage ensures greater consistency and efficiency across environments.
Databricks users can select from a wide range of instance types for cluster driver and worker nodes. Without guardrails, teams may choose high-cost configurations (e.g., 16xlarge nodes) that exceed workload requirements. This results in inflated costs with little performance benefit. To reduce this risk, administrators can use compute policies to define acceptable node types and enforce size limits across the workspace.
Workloads using legacy Premium SSD managed disks may be eligible for migration to Premium SSD v2, which delivers equivalent or improved performance characteristics at a lower cost. Premium SSD v2 decouples disk size from performance metrics like IOPS and throughput, enabling more granular cost optimization. Additionally, Premium SSD disks are often overprovisioned in size—for example, a P40 disk with more IOPS and capacity than the workload requires—resulting in inflated storage costs. Rightsizing includes both transitioning to v2 and resizing to smaller SKUs (e.g., P40 → P20) based on observed utilization. Failure to address either form of overprovisioning leads to persistent waste.
Interactive clusters are often left running between periods of active use. To mitigate idle charges, Databricks provides an “autotermination” setting that shuts down clusters after a period of inactivity. However, if the termination period is set too high, or if policies do not enforce reasonable thresholds, idle clusters can persist for long durations without performing any work—resulting in wasted compute spend. Lowering the termination window reduces exposure to idle time while preserving user flexibility.
Interactive clusters are intended for development and ad-hoc analysis, remaining active until manually terminated. When used to run scheduled jobs or production workflows, they often stay idle between executions—leading to unnecessary infrastructure and DBU costs. Job clusters are designed for ephemeral, single-job execution and automatically terminate upon completion, reducing runtime and isolating workloads. Using interactive clusters for production jobs leads to cost inefficiencies and weaker workload boundaries.
Standard SSD disks can often be replaced with Premium SSD v2 disks, offering enhanced IOPS, throughput, and durability at competitive or lower pricing. For workloads that require moderate to high performance but are currently constrained by Standard SSD capabilities, migrating to Premium SSD v2 improves both performance and cost efficiency without significant operational overhead.
Disks attached to VMs that have been stopped for an extended period, particularly when showing no read or write activity, may indicate abandoned infrastructure or obsolete resources. Retaining these disks without validation leads to unnecessary monthly storage costs. Reviewing and cleaning up inactive disks helps optimize spend and maintain storage hygiene.
Tables with no read or write activity often represent deprecated applications, obsolete telemetry, or abandoned development artifacts. Retaining inactive tables increases storage costs and operational complexity. Regularly auditing and cleaning up unused tables helps maintain a streamlined, cost-effective storage environment.
Files that show no read or write activity over an extended period often indicate redundant or abandoned data. Keeping inactive files in higher-cost storage classes unnecessarily increases monthly spend. Implementing proactive archiving, deletion workflows, and safety features like Blob Soft Delete or Versioning improves cost efficiency while protecting against accidental data loss.
Non-production environments such as development, testing, or staging often do not require the high availability, failover capabilities, and premium storage performance offered by the Business Critical tier. Running these workloads on Business Critical unnecessarily inflates costs. Choosing a lower-cost tier like General Purpose typically provides sufficient performance and availability for non-production use cases, significantly reducing ongoing database expenses.
Workloads with consistently low CPU and memory usage may no longer serve active traffic or scheduled tasks, but continue reserving resources within the cluster. These idle deployments often remain after project migrations, feature deprecations, or experimentation. Removing inactive workloads allows node groups to scale down, reducing infrastructure costs without impacting active services.
Buckets without Autoclass enabled can accumulate infrequently accessed data in more expensive storage classes, inflating monthly costs. Enabling Autoclass allows GCS to automatically move objects to lower-cost tiers based on observed access behavior, optimizing storage costs without manual lifecycle policy management. Activating Autoclass reduces operational overhead while maintaining seamless access to objects across storage classes.
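A minimal sketch, assuming a recent google-cloud-storage release that exposes the autoclass_enabled property (the bucket name is a placeholder):

```python
# Minimal sketch: turn on Autoclass for an existing bucket so GCS manages
# storage-class transitions automatically.
from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket("my-bucket")   # placeholder bucket name
bucket.autoclass_enabled = True
bucket.patch()
```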
Clusters that no longer run active workloads but remain provisioned continue incurring hourly control plane costs and may also maintain associated infrastructure like node groups or VPC components. Inactive clusters often persist after environment decommissioning, project shutdowns, or migrations. Decommissioning unused clusters eliminates unnecessary operational costs and simplifies infrastructure management.
When an EC2 instance is dedicated primarily to internet-facing traffic, regional differences in data transfer pricing can drive a substantial portion of total costs. Hosting such workloads in a region with higher egress rates leads to elevated expenses without improving performance. Migrating the workload to a lower-cost region can yield significant savings while maintaining equivalent service quality, especially when no strict latency or compliance requirements dictate the original location.
Running non-production clusters solely on On-Demand Instances results in unnecessarily high compute costs. Development, testing, and QA environments typically tolerate interruptions and do not require the continuous availability guaranteed by On-Demand capacity. Introducing Spot-backed node groups in non-production environments can significantly reduce infrastructure expenses without compromising business requirements.
Classic Load Balancers that no longer serve active workloads will persist if they are not properly decommissioned. This often happens after application migrations, architecture changes, or testing activities. Even if no connections or traffic are passing through the CLB, it continues to incur baseline charges until manually deleted. Identifying and removing unused load balancers helps eliminate waste without impacting operations.
Network Load Balancers that are no longer needed often persist after architecture changes, service decommissioning, or migration projects. When no active TCP connections or traffic flow through the NLB, it still generates hourly operational costs. Identifying and removing these idle resources helps reduce unnecessary networking expenses without affecting service availability.
Application Load Balancers that no longer serve active workloads may persist after application migrations, architecture changes, or testing activities. When no incoming requests are processed through the ALB, it continues to generate baseline hourly and LCU charges. Identifying and decommissioning unused ALBs helps reduce networking expenses without impacting operational environments.
Gateway Load Balancers that no longer have active traffic flows can continue to exist indefinitely unless proactively decommissioned. This often happens after network topology changes, security architecture updates, or environment deprecations. Without active packet forwarding, the GLB provides no functional benefit but still incurs hourly and data transfer costs.
Oversized instances within Auto Scaling Groups lead to inflated baseline costs, even when scaling adjusts the number of instances dynamically. When workloads consistently use only a fraction of the available CPU, memory, or network capacity, there is an opportunity to downsize to smaller, less expensive instance types without sacrificing performance. Right-sizing helps balance capacity and efficiency, reducing compute spend while preserving workload stability.
This inefficiency arises when snapshots are retained long after they’ve served their purpose. Snapshots may have been created for backups, migrations, or disaster recovery plans but were never deleted—even after the related workload or volume was decommissioned. Over time, these unused snapshots accumulate, continuing to incur storage costs without providing operational value.
This inefficiency occurs when small files are stored in S3 storage classes that impose a minimum object size charge, resulting in unnecessary costs. Small files under 128 KB stored in Glacier Instant Retrieval, Standard-IA, or One Zone-IA are billed as if they were 128 KB. If these small files are accessed frequently, S3 Standard may be a better fit. For infrequently accessed small files, transitioning them to archival storage classes like Glacier Flexible Retrieval or Deep Archive can optimize storage spend. Poorly tuned lifecycle policies often allow small files to remain in suboptimal storage classes indefinitely.
This inefficiency occurs when a table remains in the default Standard storage class despite having minimal or infrequent access. In these cases, switching to Standard-IA can significantly reduce monthly storage costs, especially for archival tables, compliance data, or legacy systems that are still retained but rarely queried.
This inefficiency occurs when an RDS instance uses a high-cost storage type such as io1 or io2 but does not require the performance benefits it provides. In many cases, provisioned IOPS are set at or below the free baseline included with gp3 (3,000 IOPS and 125 MB/s). In such scenarios, continuing to use provisioned IOPS storage results in elevated cost with no functional advantage. These misconfigurations often persist due to legacy templates, default settings, or a lack of periodic review.
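A minimal remediation sketch for instances whose needs fit within the gp3 baseline (the instance identifier is a placeholder; validate in non-production first):

```python
# Minimal sketch: move an RDS instance from provisioned-IOPS storage to gp3
# during the next maintenance window.
import boto3

rds = boto3.client("rds")

rds.modify_db_instance(
    DBInstanceIdentifier="legacy-io1-db",   # placeholder
    StorageType="gp3",
    ApplyImmediately=False,
)
```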
This inefficiency occurs when a DynamoDB table is no longer accessed by any active workload but continues to accumulate storage charges. These tables often remain after a project ends, a feature is retired, or data is migrated elsewhere. Without any read or write activity, the table provides no functional value and becomes a cost liability.
This inefficiency occurs when legacy volume types such as gp2 or io1 remain in use, even though AWS has released newer types—like gp3 and io2—that offer better performance at lower cost. With gp3, users can configure IOPS and throughput independently of volume size, while io2 provides higher durability and more predictable performance than io1. These newer volumes are generally more cost-effective and can be adopted without re-architecting workloads. Many teams continue using outdated types due to default AMIs, automation templates, or simple oversight.
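A minimal migration sketch (the volume ID is a placeholder; modifying a gp2 volume to gp3 is non-disruptive, and omitting Iops/Throughput keeps the gp3 baseline of 3,000 IOPS and 125 MB/s):

```python
# Minimal sketch: migrate a gp2 volume to gp3 in place.
import boto3

ec2 = boto3.client("ec2")

ec2.modify_volume(VolumeId="vol-0123456789abcdef0", VolumeType="gp3")
```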
This inefficiency occurs when an RDS instance remains in the running state but is no longer actively serving application traffic. These instances may be remnants of retired applications, paused development environments, or workloads that were migrated elsewhere. If an instance shows no active connections and sustained inactivity across CPU and memory metrics, it is likely idle and generating unnecessary costs.
This inefficiency occurs when an RDS instance is consistently operating below its provisioned capacity—for example, showing low CPU or memory utilization over an extended period. This often results from conservative initial sizing, decreased workload demand, or failure to review and adjust after deployment. Running oversized RDS instances leads to unnecessary compute and licensing costs without delivering additional value.
This inefficiency occurs when an RDS instance is allocated significantly more storage than it consumes. For example, a 2 TB volume might contain only 150 GB of actual data. Since RDS does not allow reducing allocated storage on existing instances, these volumes continue to incur charges based on total provisioned size—not actual usage. This often goes unnoticed in long-running databases that no longer require their original allocation.
When an RDS cluster is not upgraded in time, it can fall out of standard support and incur Extended Support charges. This often happens when upgrade cycles are delayed, blocked by compatibility issues, or deprioritized due to competing initiatives. Over time, these fees can add up significantly. Staying on an outdated version also increases operational risk and reduces access to engine improvements, performance enhancements, and security patches.
This inefficiency occurs when compression is either disabled or not functioning effectively on a CloudFront distribution. Static assets such as text, JSON, JavaScript, and CSS files are compressible and benefit significantly from compression. Without compression, CloudFront transfers larger objects, leading to increased data transfer charges and slower delivery performance—without improving user experience.
This inefficiency occurs when an EBS volume has provisioned IOPS levels that consistently exceed the actual I/O requirements of the workload it supports. This can happen when performance buffers are estimated too high, usage patterns change over time, or default settings are left unadjusted. Provisioned IOPS above the included baseline generate ongoing charges that may not reflect actual utilization, resulting in avoidable cost.
Multipart upload allows large files to be uploaded in segments. Each part is stored individually until the upload is finalized by a “CompleteMultipartUpload” request. If this final request is never issued—due to a timeout, crash, failed job, or misconfiguration—the parts remain stored but are effectively useless: they do not form a valid object and cannot be retrieved. Without a lifecycle policy in place to clean up these incomplete uploads, the orphaned parts persist and continue to incur storage charges indefinitely.
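A minimal sketch of the cleanup rule (the bucket name and 7-day threshold are placeholders; note that this call replaces any existing lifecycle configuration on the bucket, so merge rules if one already exists):

```python
# Minimal sketch: bucket-wide lifecycle rule that aborts multipart uploads
# left incomplete for 7 days, deleting the orphaned parts.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="my-bucket",   # placeholder
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "abort-incomplete-multipart",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},   # apply to all objects
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
            }
        ]
    },
)
```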
This inefficiency occurs when an EC2 instance remains in a running state but is not actively utilized. These instances may be remnants of past projects, forgotten development environments, or temporarily created for testing and never decommissioned. If an instance shows consistently low or no CPU, network, or disk activity—and no active connections—it likely serves no operational purpose but continues to generate ongoing compute and storage charges.
This inefficiency occurs when a VM is deallocated but its attached managed disks are still active and incurring storage charges. While compute billing stops for deallocated VMs, the disks remain provisioned and billable. These disks often persist unintentionally after a VM is paused, retired, or left unused in dev/test environments, resulting in waste if not explicitly cleaned up.
ListBucket requests are commonly used to enumerate objects in a bucket, such as by backup systems, scheduled sync jobs, data catalogs, or monitoring tools. When these operations are frequent or target buckets with large object counts, they can generate disproportionately high request charges. In many cases, real-time enumeration is not necessary and can be replaced with more efficient alternatives like S3 Inventory, which provides object metadata on a scheduled basis at lower cost.
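A minimal sketch of the alternative: a daily S3 Inventory report delivered to a separate bucket (bucket names and the destination ARN are placeholders):

```python
# Minimal sketch: replace frequent ListBucket polling with a scheduled
# S3 Inventory report.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_inventory_configuration(
    Bucket="my-data-bucket",          # placeholder source bucket
    Id="daily-inventory",
    InventoryConfiguration={
        "Id": "daily-inventory",
        "IsEnabled": True,
        "IncludedObjectVersions": "Current",
        "Schedule": {"Frequency": "Daily"},
        "Destination": {
            "S3BucketDestination": {
                "Bucket": "arn:aws:s3:::my-inventory-bucket",   # placeholder
                "Format": "CSV",
            }
        },
        "OptionalFields": ["Size", "LastModifiedDate", "StorageClass"],
    },
)
```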
This inefficiency occurs when a blob container intended for long-term or infrequently accessed data continues to store objects in higher-cost tiers like Hot or Cool, instead of using the Archive tier. This often happens when containers are created without lifecycle policies or default tier settings. Over time, storing archival data in non-archival tiers results in avoidable cost without any performance benefit, especially for compliance data, backups, or historical logs that rarely need to be accessed.
This inefficiency arises when a virtual machine is left in a stopped (deallocated) state for an extended period but continues to incur costs through attached storage and associated resources. These idle VMs are often remnants of retired workloads, temporary environments, or paused projects that were never fully cleaned up. Without clear ownership or automated cleanup, they can persist unnoticed and accumulate avoidable charges.
When an EKS cluster remains on a Kubernetes version that has reached the end of standard support, AWS begins charging an additional Extended Support fee. These charges often arise from delays in upgrade cycles, uncertainty about workload compatibility, or overlooked legacy clusters. If the workload does not require the older version, continuing to run the cluster in this state results in unnecessary cost and technical risk.
This inefficiency occurs when an EC2 instance is stopped but still has one or more attached EBS volumes. Although the compute resource is not generating charges while stopped, the attached volumes continue to incur full storage and performance-related costs. These volumes are often overlooked in cost reviews, especially if the instance is temporarily paused or has been left in a stopped state long-term. Without regular validation, these volumes may represent unused capacity that delivers no value.
This inefficiency occurs when an RDS cluster remains provisioned but is no longer serving any workloads and has no active database connections. Unlike underutilized resources, these clusters are completely idle—showing no query activity, background processing, or usage over time. They often persist in dev, staging, or legacy environments where the workload has been retired or moved, yet the cluster remains active and continues to generate ongoing compute and storage costs.
When the EC2 instance types used for EKS node groups have a memory-to-CPU ratio that doesn’t match the workload profile, the result is poor bin-packing efficiency. For example, if memory-intensive containers are scheduled on compute-optimized nodes, memory may run out first while CPU remains unused. This forces new nodes to be provisioned earlier than necessary. Over time, this mismatch can lead to higher compute costs even if the cluster appears fully utilized.
S3 buckets often persist after projects complete or when the associated workloads have been retired. If a bucket is no longer being read from or written to—and its contents are not required for compliance, backup, or retention purposes—it represents ongoing cost without delivering value. Many organizations overlook these idle buckets, especially in shared or legacy accounts, leading to unnecessary storage costs over time.
Some ElastiCache clusters continue to run on older-generation node types that have since been replaced by newer, more cost-effective options. This can happen due to legacy templates, lack of version validation, or infrastructure that has not been reviewed in years. Newer instance families often deliver better performance at a lower hourly rate. Modernizing to newer node types can reduce compute spend without sacrificing performance, and in many cases, improve it.
Workloads are sometimes deployed in specific AWS regions based on legacy decisions, developer convenience, or perceived performance requirements. However, regional EC2 pricing can vary significantly, and placing instances in a suboptimal region can lead to higher compute costs, increased data transfer charges, or both. In particular, workloads that frequently communicate with resources in other regions—or that serve a user base concentrated elsewhere—can incur unnecessary costs. Re-evaluating regional placement can reduce these costs without compromising performance or availability when done strategically.
Some architectures unintentionally route large volumes of traffic between resources that reside in different Availability Zones—such as database queries, service calls, replication, or logging. While these patterns may be functionally correct, they can lead to unnecessary data transfer charges when the traffic could be contained within a single AZ. Over time, this can become a silent cost driver, especially for chatty microservices, replicated storage layers, or high-throughput pipelines. Re-architecting for AZ-locality—when possible—can reduce these charges without affecting availability in environments where high resilience isn’t required.
Some S3 lifecycle policies are configured to transition objects from Standard storage to Intelligent-Tiering after a fixed number of days (e.g., 30 days). This creates a delay where objects reside in S3 Standard, incurring higher storage costs without benefit. Since Intelligent-Tiering does not require prior access history and can be used immediately, it is often more efficient to place objects directly into Intelligent-Tiering at the time of upload. Lifecycle transitions introduce unnecessary intermediate costs that can be avoided entirely through configuration changes.
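A minimal sketch of the configuration change: set the storage class at upload time rather than relying on a delayed lifecycle transition (bucket and key are placeholders):

```python
# Minimal sketch: write objects straight into Intelligent-Tiering at upload.
import boto3

s3 = boto3.client("s3")

s3.put_object(
    Bucket="my-bucket",
    Key="exports/2024/report.parquet",   # hypothetical object key
    Body=b"...",
    StorageClass="INTELLIGENT_TIERING",
)
```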
VPC Interface Endpoints are commonly deployed to meet network security or compliance requirements by enabling private access to AWS services. However, these endpoints often remain provisioned even after the original use case is deprecated. In some cases, the applications have been decommissioned; in others, traffic routing has changed and the endpoint is no longer used. Since interface endpoints generate hourly charges whether or not they are used, identifying and removing inactive ones can eliminate unnecessary costs.
Manual snapshots are often created for operational tasks like upgrades, migrations, or point-in-time backups. Unlike automated backups, which are automatically deleted after a set retention period, manual snapshots remain in place until explicitly deleted. Over time, this can lead to an accumulation of snapshots that are no longer needed but still incur monthly storage charges. This is particularly common in environments where snapshots are taken frequently but not consistently reviewed. If left unmanaged, manual snapshots can become a source of ongoing cost, especially for large databases or when snapshots are copied across regions.
GCP VM instances are often provisioned with more CPU or memory than needed, especially when using custom machine types or legacy templates. If an instance consistently consumes only a small portion of its allocated resources, it likely represents an opportunity to reduce costs through rightsizing. Without proactive reviews, these oversized instances can remain unnoticed and continue to incur unnecessary charges.
NAT Gateways are convenient for enabling outbound access from private subnets, but in data-intensive environments, they can quietly become a major cost driver. When large volumes of traffic flow through the gateway—particularly during batch processing, frequent software updates, or hybrid cloud integrations—the per-GB charges accumulate rapidly. In some cases, replacing a managed NAT Gateway with a self-managed NAT instance can substantially reduce costs, provided that the organization is prepared to operate and maintain the alternative solution.
Storage accounts can accumulate blob data that is no longer actively accessed—such as legacy logs, expired backups, outdated exports, or orphaned files. When these blobs remain in the Hot tier, they continue to incur the highest storage cost, even if they have not been read or modified for an extended period. Without lifecycle management in place, these inactive blobs often go unnoticed and accumulate cost. In many cases, the data could be safely transitioned to a lower-cost tier or deleted altogether, depending on retention needs. Additionally, misconfigured default tier settings at the account or container level can cause even new uploads to be stored in the Hot tier unnecessarily. Azure's lifecycle management feature itself is free (standard operation charges still apply to the tier changes it performs), making automation a low-risk optimization method.
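A minimal policy sketch with azure-mgmt-storage (resource group, account name, and the 30/180-day thresholds are placeholders to adapt to retention requirements):

```python
# Minimal sketch: lifecycle rule that cools, then archives, block blobs as
# they age without modification.
from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient

client = StorageManagementClient(DefaultAzureCredential(), "<subscription-id>")

client.management_policies.create_or_update(
    resource_group_name="my-rg",            # placeholder
    account_name="mystorageacct",           # placeholder
    management_policy_name="default",
    properties={
        "policy": {
            "rules": [
                {
                    "name": "tier-down-stale-blobs",
                    "enabled": True,
                    "type": "Lifecycle",
                    "definition": {
                        "filters": {"blobTypes": ["blockBlob"]},
                        "actions": {
                            "baseBlob": {
                                "tierToCool": {"daysAfterModificationGreaterThan": 30},
                                "tierToArchive": {"daysAfterModificationGreaterThan": 180},
                            }
                        },
                    },
                }
            ]
        }
    },
)
```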
NAT Gateways are frequently left running after environments are re-architected, workloads are shut down, or connectivity patterns change. In many cases, they continue to incur hourly charges despite no active traffic flowing through them. Because hourly fees are not tied to whether the gateway is needed—just whether it exists—these resources can quietly drive recurring costs without delivering ongoing value. Identifying and removing unused gateways is a simple way to reduce waste.
Azure VMs are frequently provisioned with more vCPU and memory than needed, often based on template defaults or peak demand assumptions. When a VM operates well below its capacity for an extended period, it presents an opportunity to reduce costs through rightsizing. Without regular usage reviews, these inefficiencies can persist indefinitely.
EC2 instances are often overprovisioned based on rough estimates, legacy patterns, or performance buffer assumptions. If an instance consistently uses only a small fraction of its provisioned CPU or memory, it likely represents an opportunity for rightsizing. These inefficiencies persist unless usage is periodically reviewed and instance types are adjusted to align with actual workload requirements.
GCS buckets often persist after applications are retired or data is no longer in active use. Without access activity, these buckets generate storage charges without providing ongoing value. Leaving stale data in Standard storage—designed for frequent access—results in unnecessary cost. If the data must be retained for compliance or future reference, colder tiers offer substantial savings. If it is no longer needed, the data should be deleted.
Managed Disks frequently remain detached after Azure virtual machines are deleted, reimaged, or reconfigured. Some may be intentionally retained for reattachment, backup, or migration purposes, but many persist unintentionally due to the lack of automated cleanup processes. When these detached disks are also inactive—showing no read or write activity—they represent unnecessary ongoing costs. Identifying and removing these orphaned disks can produce meaningful savings without affecting any active workloads.
EBS volumes frequently remain detached after EC2 instances are terminated, replaced, or reconfigured. Some may be intentionally retained for reattachment or backup purposes, but many persist unintentionally due to the lack of automated cleanup. When these detached volumes are also inactive—showing no read or write activity—they represent unnecessary ongoing costs. Identifying and removing these orphaned volumes can produce meaningful savings without affecting running workloads.
Read replicas are intended to improve performance for read-heavy workloads or support cross-region redundancy. However, it's common for replicas to remain in place even after their intended purpose has passed. In some cases, they were provisioned for scaling or analytics workloads that no longer exist; in others, they are tied to live environments but not actively receiving queries. Since each replica incurs full RDS costs, retaining one that is no longer used leads to unnecessary ongoing expenses.
S3 Standard is the default storage class and is often used by default even for data that is rarely accessed. Keeping large volumes of infrequently accessed data in S3 Standard leads to unnecessary costs. Data such as backups, logs, archives, or historical snapshots are often strong candidates for migration to colder tiers like S3 Glacier or Deep Archive. If access patterns are unknown or variable, S3 Intelligent-Tiering can reduce costs without requiring manual transitions.
CloudWatch log groups often persist long after their usefulness has expired. In some cases, they are associated with applications or resources that are no longer active. In other cases, the systems may still be running, but the log data is no longer being reviewed, analyzed, or used by any team. Regardless of the reason, retaining logs that no one is monitoring or using results in unnecessary storage costs. If log data is not needed for operational visibility, debugging, compliance, or auditing purposes, it should either be deleted or managed with a shorter retention policy.
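A minimal sketch that applies a finite retention period to log groups currently set to never expire (the 30-day value and the name prefix are assumptions; deletion of truly dead groups is a separate, more careful step):

```python
# Minimal sketch: set 30-day retention on Lambda log groups that currently
# retain data forever.
import boto3

logs = boto3.client("logs")

paginator = logs.get_paginator("describe_log_groups")
for page in paginator.paginate(logGroupNamePrefix="/aws/lambda/"):
    for group in page["logGroups"]:
        if "retentionInDays" not in group:   # "Never expire" today
            logs.put_retention_policy(
                logGroupName=group["logGroupName"], retentionInDays=30
            )
```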
Many Aurora clusters default to using the Standard configuration, which charges separately for I/O operations. For workloads with frequent read and write activity, this can lead to unnecessarily high costs. Aurora I/O-Optimized eliminates I/O charges entirely and simplifies cost predictability. In environments with consistently high I/O usage, switching to I/O-Optimized often results in lower total spend.
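A minimal switch-over sketch (the cluster identifier is a placeholder; AWS limits switching a cluster to I/O-Optimized to once every 30 days, so validate the I/O-spend tradeoff first):

```python
# Minimal sketch: move an Aurora cluster to I/O-Optimized storage.
import boto3

rds = boto3.client("rds")

rds.modify_db_cluster(
    DBClusterIdentifier="my-aurora-cluster",   # placeholder
    StorageType="aurora-iopt1",
)
```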