Workloads in private subnets often access AWS services like S3 or DynamoDB. If this traffic is routed through a NAT Gateway, it incurs both hourly and data processing charges. However, AWS offers VPC Gateway Endpoints (for S3/DynamoDB) and Interface Endpoints (for other services), which provide private access paths that bypass the NAT Gateway entirely. When teams fail to use VPC endpoints — often due to default routing configurations or lack of awareness — they unnecessarily route internal service calls through a costlier, public-facing path. This leads to persistent and avoidable spend.
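As a quick way to surface this, the following is a minimal boto3 sketch (assuming credentials and region are already configured) that flags VPCs with no S3 gateway endpoint, where S3 calls from private subnets are likely being billed through a NAT Gateway:

```python
import boto3

# Flag VPCs that lack an S3 gateway endpoint; the same pattern applies to DynamoDB.
ec2 = boto3.client("ec2")
region = ec2.meta.region_name
s3_service_name = f"com.amazonaws.{region}.s3"

vpcs = {v["VpcId"] for v in ec2.describe_vpcs()["Vpcs"]}
endpoints = ec2.describe_vpc_endpoints(
    Filters=[
        {"Name": "service-name", "Values": [s3_service_name]},
        {"Name": "vpc-endpoint-type", "Values": ["Gateway"]},
    ]
)["VpcEndpoints"]
vpcs_with_s3_endpoint = {e["VpcId"] for e in endpoints}

for vpc_id in sorted(vpcs - vpcs_with_s3_endpoint):
    print(f"{vpc_id}: no S3 gateway endpoint, S3 traffic may be routed via NAT")
```

Gateway endpoints for S3 and DynamoDB carry no hourly or data processing charge, so adding them to route tables is typically a pure cost reduction for this traffic.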
ElastiCache clusters are often sized for peak performance or reliability assumptions that no longer reflect current workload needs. When memory and CPU usage remain consistently low, the node is likely overprovisioned. For Redis, memory is typically the primary sizing constraint, while Memcached workloads may be more CPU-sensitive. In dev, staging, or lightly used production environments, some nodes may be entirely idle. It's important to evaluate usage patterns in context — for example, replica nodes in Redis Multi-AZ configurations may show low utilization by design, but still serve a high-availability purpose. However, in non-critical environments or where HA is not required, those nodes can often be downsized or removed. Additionally, older ElastiCache instance types (e.g., r4, m3) are frequently less cost-efficient than newer generations like r6g or r7g, offering further savings through modernization.
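A rough sketch of this check using boto3 and CloudWatch is shown below; the 14-day window and the 10% CPU / 20% memory thresholds are illustrative assumptions, not fixed rules:

```python
import boto3
from datetime import datetime, timedelta, timezone

# Flag ElastiCache nodes whose recent average CPU and (for Redis) memory usage are both very low.
elasticache = boto3.client("elasticache")
cloudwatch = boto3.client("cloudwatch")
end = datetime.now(timezone.utc)
start = end - timedelta(days=14)

def avg_metric(cluster_id, metric):
    # Note: multi-node clusters may also need the CacheNodeId dimension for per-node values.
    datapoints = cloudwatch.get_metric_statistics(
        Namespace="AWS/ElastiCache",
        MetricName=metric,
        Dimensions=[{"Name": "CacheClusterId", "Value": cluster_id}],
        StartTime=start, EndTime=end, Period=86400, Statistics=["Average"],
    )["Datapoints"]
    return sum(d["Average"] for d in datapoints) / len(datapoints) if datapoints else None

for cluster in elasticache.describe_cache_clusters()["CacheClusters"]:
    cid = cluster["CacheClusterId"]
    cpu = avg_metric(cid, "CPUUtilization")
    mem = avg_metric(cid, "DatabaseMemoryUsagePercentage")  # Redis-only metric
    if cpu is not None and cpu < 10 and (mem is None or mem < 20):
        print(f"{cid} ({cluster['CacheNodeType']}): avg CPU {cpu:.1f}%, candidate for downsizing or removal")
```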
In many environments, users launch Databricks clusters for development or analysis and forget to shut them down after use. When no auto-termination policy is configured, these clusters remain active indefinitely, incurring unnecessary charges for both Databricks and cloud infrastructure usage. This inefficiency is especially common in interactive clusters that are user-managed, ephemeral, or exploratory in nature. While Databricks provides built-in support for cluster auto-termination, teams often overlook it unless it's enforced through workspace policies. Without this safeguard in place, idle clusters can persist unnoticed for hours or days, leading to avoidable cost.
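A minimal sketch of such an audit, assuming the databricks-sdk package and workspace authentication are already configured, is to list clusters and flag those with auto-termination disabled (an autotermination_minutes value of 0 means the cluster never terminates on its own):

```python
from databricks.sdk import WorkspaceClient

# Flag clusters that will run indefinitely unless someone remembers to stop them.
w = WorkspaceClient()
for cluster in w.clusters.list():
    if not cluster.autotermination_minutes:
        print(f"{cluster.cluster_name}: auto-termination disabled")
```

Enforcing a maximum auto-termination value through cluster policies, rather than relying on individual users, is generally the more durable fix.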
Elastic IPs are often provisioned but forgotten — left unassociated, or still attached to EC2 instances that have been stopped. In either case, AWS treats the EIP as idle and applies an hourly charge. Although the cost per hour is relatively small, these charges accumulate quietly, especially across environments with frequent provisioning, decommissioning, or ephemeral workloads. Many organizations overlook the fact that even a single EIP attached to a stopped instance is billable. Without periodic review, this creates persistent, low-visibility waste across AWS accounts.
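A minimal boto3 sketch for a periodic review could report both billable cases, unassociated EIPs and EIPs attached to stopped instances:

```python
import boto3

# Report Elastic IPs that are currently accruing the idle hourly charge.
ec2 = boto3.client("ec2")
for addr in ec2.describe_addresses()["Addresses"]:
    instance_id = addr.get("InstanceId")
    if not addr.get("AssociationId"):
        print(f"{addr['PublicIp']}: unassociated")
    elif instance_id:
        state = ec2.describe_instances(InstanceIds=[instance_id])[
            "Reservations"][0]["Instances"][0]["State"]["Name"]
        if state == "stopped":
            print(f"{addr['PublicIp']}: attached to stopped instance {instance_id}")
```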
EFS file systems that are no longer attached to any running services — such as EC2 instances or Lambda functions — continue to incur storage charges. This often occurs after workloads are decommissioned but the file system is left behind. A quick indicator of this state is when the EFS file system has no mount targets configured. Without active usage or connection, these orphaned file systems represent pure cost with no functional value. Unlike block storage, EFS does not require an attached instance to incur billing, making it easy for unused resources to go unnoticed.
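A minimal sketch of the "no mount targets" check with boto3:

```python
import boto3

# List EFS file systems with zero mount targets: nothing can currently mount them,
# yet the stored data continues to be billed.
efs = boto3.client("efs")
for fs in efs.describe_file_systems()["FileSystems"]:
    if fs["NumberOfMountTargets"] == 0:
        size_gb = fs["SizeInBytes"]["Value"] / 1024**3
        print(f"{fs['FileSystemId']}: no mount targets, {size_gb:.1f} GiB stored")
```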
Photon is optimized for SQL workloads, delivering significant speedups through vectorized execution and native C++ performance. However, Photon only accelerates workloads that use compatible operations and data patterns. If a workload includes unsupported functions, unoptimized joins, or falls back to interpreted execution, Photon may be silently bypassed — even on a Photon-enabled cluster. In this case, users are billed at a premium DBU rate while receiving no meaningful speed or efficiency gain. This inefficiency typically arises when teams enable Photon globally without validating workload compatibility or updating their pipelines to follow Photon best practices. The result is higher costs with no corresponding benefit — a classic case of configuration drift outpacing optimization discipline.
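One rough heuristic for spotting fallback, assuming a Photon-enabled Databricks cluster and a notebook-provided spark session, is to inspect the physical plan of a query and check whether any Photon operators appear. Operator naming varies by Databricks Runtime version, and _jdf is an internal PySpark handle, so treat this as a quick signal rather than a definitive compatibility check:

```python
# Hypothetical query used only for illustration.
df = spark.sql("SELECT col_a, count(*) FROM my_table GROUP BY col_a")

# Render the physical plan and look for Photon operators in it.
plan = df._jdf.queryExecution().executedPlan().toString()
if "Photon" not in plan:
    print("No Photon operators in the plan: this query likely fell back to the non-Photon engine")
```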
BigQuery incentivizes efficient data retention by cutting storage costs in half for tables or partitions that go 90 days without modification. However, many teams unintentionally forfeit this discount by performing broad or unnecessary updates to long-lived datasets — for example, touching an entire table when only a few rows need to change. Even small-scale or programmatic updates can trigger a full reset of the 90-day timer on affected data. This behavior is subtle but expensive: it silently doubles the storage cost of large datasets for another 90-day cycle, even when the data itself is mostly static. Without intentional safeguards, organizations may repeatedly reset their discounted storage window without realizing it.
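A minimal sketch of a monitoring pass, assuming the google-cloud-bigquery package and a hypothetical dataset name, reports how recently each table was modified, since any modification restarts the 90-day clock:

```python
from datetime import datetime, timezone
from google.cloud import bigquery

client = bigquery.Client()
dataset_id = "my_project.my_dataset"  # hypothetical dataset

for table_item in client.list_tables(dataset_id):
    table = client.get_table(table_item.reference)
    age_days = (datetime.now(timezone.utc) - table.modified).days
    status = "long-term (discounted)" if age_days >= 90 else f"active ({90 - age_days} days until discount)"
    print(f"{table.table_id}: last modified {age_days} days ago, {status}")
```

For partitioned tables the timer is tracked per partition, so a table-level modification timestamp is only a coarse view; scoping updates to the specific partitions that actually change is what preserves the discount on the rest of the data.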
In dynamic environments — especially during autoscaling, testing, or infrastructure changes — it's common for load balancers to remain provisioned after their backend resources have been decommissioned. When this happens, the load balancer continues to incur hourly charges despite serving no functional purpose. These inactive resources often go unnoticed, particularly in dev/test environments or when deployment pipelines fail to include proper cleanup logic. Over time, the accumulation of unused load balancers contributes to unnecessary recurring costs with no operational benefit.
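A minimal boto3 sketch covering ALBs and NLBs flags load balancers whose target groups have no registered targets, a common signature of a load balancer left behind after its backends were decommissioned:

```python
import boto3

# Flag ALBs/NLBs with zero registered targets across all of their target groups.
elbv2 = boto3.client("elbv2")
for lb in elbv2.describe_load_balancers()["LoadBalancers"]:
    arn = lb["LoadBalancerArn"]
    target_groups = elbv2.describe_target_groups(LoadBalancerArn=arn)["TargetGroups"]
    registered = sum(
        len(elbv2.describe_target_health(TargetGroupArn=tg["TargetGroupArn"])["TargetHealthDescriptions"])
        for tg in target_groups
    )
    if registered == 0:
        print(f"{lb['LoadBalancerName']} ({lb['Type']}): no registered targets")
```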
While stopping an RDS instance reduces runtime cost, AWS limits how long an instance can remain stopped to 7 days. After this period, the instance is automatically restarted and resumes incurring compute charges — even if the database is still not needed. This creates waste in cases where teams intended to pause the environment but failed to manage its lifecycle beyond the 7-day window. Without proper automation or teardown workflows, stopped RDS instances silently become active and billable again. The best practice for long-term inactivity is to snapshot the database and delete the instance entirely. If the instance must remain available for fast recovery, automation should be in place to re-stop it upon restart.
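A minimal sketch of that re-stop automation, assuming a hypothetical autostop=true tag on instances meant to stay paused and a scheduled trigger such as Lambda or cron:

```python
import boto3

# Re-stop RDS instances that AWS auto-restarted after the 7-day stopped-state limit.
rds = boto3.client("rds")
for db in rds.describe_db_instances()["DBInstances"]:
    if db["DBInstanceStatus"] != "available":
        continue
    tags = rds.list_tags_for_resource(ResourceName=db["DBInstanceArn"])["TagList"]
    if any(t["Key"] == "autostop" and t["Value"] == "true" for t in tags):
        rds.stop_db_instance(DBInstanceIdentifier=db["DBInstanceIdentifier"])
        print(f"Re-stopped {db['DBInstanceIdentifier']}")
```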
In Azure Databricks environments that rely on Private Link for secure networking, it’s common to route traffic through multi-tiered network architectures. This often includes multiple VNets, Private Link endpoints, or peered subscriptions between data sources (e.g., ADLS) and the Databricks compute plane. While these architectures may be designed for isolation or compliance, they frequently introduce redundant routing paths that add cost without improving performance. Each additional hop may result in duplicated Private Link ingress and egress charges. Without regular review, this can create persistent and unrecognized network inefficiencies tied to Databricks usage.
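One way to start such a review is to inventory private endpoints and group them by the resource they target, so that multiple endpoints reaching the same data source from different subnets or VNets stand out. The sketch below assumes the azure-identity and azure-mgmt-network packages and a placeholder subscription ID:

```python
from collections import defaultdict
from azure.identity import DefaultAzureCredential
from azure.mgmt.network import NetworkManagementClient

# Group private endpoints by target resource to highlight potentially redundant paths.
credential = DefaultAzureCredential()
client = NetworkManagementClient(credential, "<subscription-id>")  # placeholder

endpoints_by_target = defaultdict(list)
for pe in client.private_endpoints.list_by_subscription():
    for conn in pe.private_link_service_connections or []:
        subnet_id = pe.subnet.id if pe.subnet else "unknown-subnet"
        endpoints_by_target[conn.private_link_service_id].append(subnet_id)

for target, subnets in endpoints_by_target.items():
    if len(subnets) > 1:
        print(f"{target}: reached via {len(subnets)} private endpoints")
```

Multiple endpoints for the same target are not always waste (isolation or compliance may require them), but each one that sits on the Databricks data path adds its own Private Link processing charges and deserves a deliberate justification.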