ElastiCache clusters are often sized for peak performance or reliability assumptions that no longer reflect current workload needs. When memory and CPU usage remain consistently low, the node is likely overprovisioned. For Redis, memory is typically the primary sizing constraint, while Memcached workloads may be more CPU-sensitive. In dev, staging, or lightly used production environments, some nodes may be entirely idle. It's important to evaluate usage patterns in context — for example, replica nodes in Redis Multi-AZ configurations may show low utilization by design, but still serve a high-availability purpose. However, in non-critical environments or where HA is not required, those nodes can often be downsized or removed. Additionally, older ElastiCache instance types (e.g., r4, m3) are frequently less cost-efficient than newer generations like r6g or r7g, offering further savings through modernization.
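As a starting point, utilization can be checked programmatically. The sketch below assumes boto3 credentials and a configured region; it flags clusters whose average CPU over the last two weeks is low, and for Redis the DatabaseMemoryUsagePercentage metric can be queried the same way. The 10% threshold and 14-day window are illustrative choices, not prescriptions.

```python
# Sketch: flag ElastiCache nodes with consistently low CPU over the last 14 days.
# Assumes boto3 credentials/region are configured; the 10% threshold is illustrative.
from datetime import datetime, timedelta, timezone

import boto3

elasticache = boto3.client("elasticache")
cloudwatch = boto3.client("cloudwatch")

end = datetime.now(timezone.utc)
start = end - timedelta(days=14)

for cluster in elasticache.describe_cache_clusters()["CacheClusters"]:
    cluster_id = cluster["CacheClusterId"]
    datapoints = cloudwatch.get_metric_statistics(
        Namespace="AWS/ElastiCache",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "CacheClusterId", "Value": cluster_id}],
        StartTime=start,
        EndTime=end,
        Period=3600,          # hourly datapoints
        Statistics=["Average"],
    )["Datapoints"]
    if not datapoints:
        print(f"{cluster_id}: no datapoints -- possibly idle")
        continue
    avg_cpu = sum(p["Average"] for p in datapoints) / len(datapoints)
    if avg_cpu < 10:
        print(f"{cluster_id} ({cluster['CacheNodeType']}): avg CPU {avg_cpu:.1f}% -- "
              "candidate for downsizing or a newer-generation node type")
```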
In many environments, users launch Databricks clusters for development or analysis and forget to shut them down after use. When no auto-termination policy is configured, these clusters remain active indefinitely, incurring unnecessary charges for both Databricks and cloud infrastructure usage. This inefficiency is especially common in interactive clusters that are user-managed, ephemeral, or exploratory in nature. While Databricks provides built-in support for cluster auto-termination, teams often overlook it unless it's enforced through workspace policies. Without this safeguard in place, idle clusters can persist unnoticed for hours or days, leading to avoidable cost.
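One way to surface the gap is to list clusters that have no auto-termination configured. The sketch below uses the Databricks clusters REST API; the DATABRICKS_HOST and DATABRICKS_TOKEN environment variables are assumptions about your workspace setup, and a value of 0 for autotermination_minutes is treated as "never terminates".

```python
# Sketch: list clusters with no auto-termination configured, via the Databricks
# REST API. DATABRICKS_HOST and DATABRICKS_TOKEN are assumed to be set.
import os

import requests

host = os.environ["DATABRICKS_HOST"]      # e.g. https://<workspace>.cloud.databricks.com
token = os.environ["DATABRICKS_TOKEN"]

resp = requests.get(
    f"{host}/api/2.0/clusters/list",
    headers={"Authorization": f"Bearer {token}"},
    timeout=30,
)
resp.raise_for_status()

for cluster in resp.json().get("clusters", []):
    # autotermination_minutes == 0 means the cluster never shuts itself down
    if cluster.get("autotermination_minutes", 0) == 0:
        print(f"{cluster['cluster_name']} ({cluster['cluster_id']}): "
              "no auto-termination configured")
```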
Workloads in private subnets often access AWS services like S3 or DynamoDB. If this traffic is routed through a NAT Gateway, it incurs both hourly and data processing charges. However, AWS offers VPC Gateway Endpoints (for S3/DynamoDB) and Interface Endpoints (for other services), which provide private access paths that bypass the NAT Gateway entirely. When teams fail to use VPC endpoints — often due to default routing configurations or lack of awareness — they unnecessarily route internal service calls through a costlier, public-facing path. This leads to persistent and avoidable spend.
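For S3 and DynamoDB, the fix is usually a Gateway Endpoint attached to the private subnets' route tables. A minimal boto3 sketch, with placeholder VPC, route table, and region values:

```python
# Sketch: add a Gateway VPC Endpoint for S3 so private-subnet traffic to S3 no
# longer traverses the NAT Gateway. VPC ID, route table ID, and region are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.create_vpc_endpoint(
    VpcEndpointType="Gateway",
    VpcId="vpc-0123456789abcdef0",              # placeholder VPC
    ServiceName="com.amazonaws.us-east-1.s3",   # S3 gateway service for this region
    RouteTableIds=["rtb-0123456789abcdef0"],    # route table(s) of the private subnets
)
print(response["VpcEndpoint"]["VpcEndpointId"])
```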
Elastic IPs are often provisioned but forgotten — left unassociated, or still attached to EC2 instances that have been stopped. In either case, AWS treats the EIP as idle and applies an hourly charge. Although the cost per hour is relatively small, these charges accumulate quietly, especially across environments with frequent provisioning, decommissioning, or ephemeral workloads. Many organizations overlook the fact that even a single EIP attached to a stopped instance is billable. Without periodic review, this creates persistent, low-visibility waste across AWS accounts.
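A periodic sweep can catch both cases. The sketch below, which assumes boto3 credentials and a configured region, reports addresses that are unassociated or attached to stopped instances.

```python
# Sketch: report Elastic IPs that are unassociated or attached to stopped instances.
import boto3

ec2 = boto3.client("ec2")

for addr in ec2.describe_addresses()["Addresses"]:
    public_ip = addr["PublicIp"]
    if "AssociationId" not in addr:
        print(f"{public_ip}: unassociated -- billable while idle")
        continue
    instance_id = addr.get("InstanceId")
    if instance_id:
        reservations = ec2.describe_instances(InstanceIds=[instance_id])["Reservations"]
        state = reservations[0]["Instances"][0]["State"]["Name"]
        if state == "stopped":
            print(f"{public_ip}: attached to stopped instance {instance_id} -- still billable")
```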
Photon is optimized for SQL workloads, delivering significant speedups through vectorized execution and native C++ performance. However, Photon only accelerates workloads that use compatible operations and data patterns. If a workload includes unsupported functions, unoptimized joins, or falls back to interpreted execution, Photon may be silently bypassed — even on a Photon-enabled cluster. In this case, users are billed at a premium DBU rate while receiving no meaningful speed or efficiency gain. This inefficiency typically arises when teams enable Photon globally without validating workload compatibility or updating their pipelines to follow Photon best practices. The result is higher costs with no corresponding benefit — a classic case of configuration drift outpacing optimization discipline.
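One rough way to validate compatibility is to inspect the physical plan of a representative query and look for Photon operators. The sketch below assumes a Databricks notebook where `spark` is predefined; the table name is a placeholder, and operator naming and plan formatting vary across runtime versions, so treat this as a heuristic rather than a definitive check.

```python
# Rough heuristic sketch, assuming a Databricks notebook with Photon enabled:
# check whether any Photon operators appear in the physical plan of a query.
query = "SELECT region, COUNT(*) FROM sales.transactions GROUP BY region"  # placeholder

plan_rows = spark.sql(f"EXPLAIN FORMATTED {query}").collect()
plan_text = "\n".join(row[0] for row in plan_rows)

if "Photon" in plan_text:
    print("Photon operators found in the plan -- the query should benefit from Photon.")
else:
    print("No Photon operators in the plan -- this query likely falls back to "
          "standard execution while still billing at Photon DBU rates.")
```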
BigQuery incentivizes efficient data retention by cutting storage costs in half for tables or partitions that go 90 days without modification. However, many teams unintentionally forfeit this discount by performing broad or unnecessary updates to long-lived datasets — for example, touching an entire table when only a few rows need to change. Even small-scale or programmatic updates can trigger a full reset of the 90-day timer on affected data. This behavior is subtle but expensive: it silently doubles the storage cost of large datasets for another 90-day cycle, even when the data itself is mostly static. Without intentional safeguards, organizations may repeatedly reset their discounted storage window without realizing it.
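The safeguard is to scope writes to only the partitions that actually change, since long-term storage pricing is tracked per table or per partition. A hedged sketch using the google-cloud-bigquery client, with placeholder project, dataset, table, and filter values:

```python
# Sketch: scope DML to the partitions that actually need to change so untouched
# partitions keep accruing toward the 90-day long-term storage discount.
# `myproject.mydataset.events` (partitioned by event_date) is a placeholder.
from google.cloud import bigquery

client = bigquery.Client()

sql = """
UPDATE `myproject.mydataset.events`
SET status = 'corrected'
WHERE event_date = DATE '2024-01-15'   -- restrict the write to one partition
  AND user_id = 'abc123'
"""

client.query(sql).result()  # only the modified partition's long-term timer resets
```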
In Azure Databricks environments that rely on Private Link for secure networking, it’s common to route traffic through multi-tiered network architectures. This often includes multiple VNets, Private Link endpoints, or peered subscriptions between data sources (e.g., ADLS) and the Databricks compute plane. While these architectures may be designed for isolation or compliance, they frequently introduce redundant routing paths that add cost without improving performance. Each additional hop may result in duplicated Private Link ingress and egress charges. Without regular review, this can create persistent and unrecognized network inefficiencies tied to Databricks usage.
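A periodic inventory of Private Link endpoints can help spot duplicated paths. The sketch below uses the azure-mgmt-network SDK to list private endpoints across a subscription and show what each one targets; the environment variable is an assumption about your setup, and attribute names may differ slightly between SDK versions.

```python
# Sketch: inventory Private Link endpoints across a subscription to spot
# duplicated or redundant paths between data sources and the Databricks
# compute plane. Requires azure-identity and azure-mgmt-network.
import os

from azure.identity import DefaultAzureCredential
from azure.mgmt.network import NetworkManagementClient

subscription_id = os.environ["AZURE_SUBSCRIPTION_ID"]   # assumption: set in the environment
client = NetworkManagementClient(DefaultAzureCredential(), subscription_id)

for pe in client.private_endpoints.list_by_subscription():
    targets = [
        conn.private_link_service_id
        for conn in (pe.private_link_service_connections or [])
    ]
    subnet = pe.subnet.id if pe.subnet else "n/a"
    print(f"{pe.name}  subnet={subnet}  targets={targets}")
```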
Databricks cost optimization begins with visibility. Unlike traditional IaaS services, Databricks operates as an orchestration layer spanning compute, storage, and execution — but its billing data often lacks granularity by workload, job, or team. This creates a visibility gap: costs fluctuate without clear root causes, ownership is unclear, and optimization efforts stall due to lack of actionable insight. When costs are not attributed functionally — for example, to orchestration (query/job DBUs), compute (cloud VMs), storage, or data transfer — it becomes difficult to pinpoint what’s driving spend or where improvements can be made. As a result, inefficiencies persist not due to a single misconfiguration, but because the system lacks the structure to surface them.
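Where system tables are enabled, the system.billing.usage table offers a starting point for functional attribution. The sketch below groups recent DBU consumption by SKU and a custom 'team' tag; it assumes a notebook context, that workloads carry such a tag, and that the current system table schema applies (column names may evolve).

```python
# Sketch: attribute recent Databricks DBU consumption by SKU and a 'team' tag
# using the system.billing.usage system table (assumes it is enabled and that
# clusters/jobs are tagged). Run from a notebook where `spark` is predefined.
usage_by_team = spark.sql("""
    SELECT
        usage_date,
        sku_name,
        custom_tags['team']   AS team,
        SUM(usage_quantity)   AS dbus
    FROM system.billing.usage
    WHERE usage_date >= date_sub(current_date(), 30)
    GROUP BY usage_date, sku_name, custom_tags['team']
    ORDER BY dbus DESC
""")
usage_by_team.show(20, truncate=False)
```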
In dynamic environments — especially during autoscaling, testing, or infrastructure changes — it's common for load balancers to remain provisioned after their backend resources have been decommissioned. When this happens, the load balancer continues to incur hourly charges despite serving no functional purpose. These inactive resources often go unnoticed, particularly in dev/test environments or when deployment pipelines fail to include proper cleanup logic. Over time, the accumulation of unused load balancers contributes to unnecessary recurring costs with no operational benefit.
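Detecting these is straightforward with the elbv2 API: a load balancer whose target groups have no registered targets is a strong candidate for removal. A sketch, assuming boto3 credentials and a configured region (Classic ELBs would need the separate elb client):

```python
# Sketch: flag Application/Network Load Balancers whose target groups have no
# registered targets.
import boto3

elbv2 = boto3.client("elbv2")

for lb in elbv2.describe_load_balancers()["LoadBalancers"]:
    arn = lb["LoadBalancerArn"]
    target_groups = elbv2.describe_target_groups(LoadBalancerArn=arn)["TargetGroups"]
    registered = 0
    for tg in target_groups:
        health = elbv2.describe_target_health(TargetGroupArn=tg["TargetGroupArn"])
        registered += len(health["TargetHealthDescriptions"])
    if registered == 0:
        print(f"{lb['LoadBalancerName']}: no registered targets -- "
              "likely unused but still billing hourly")
```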
Each Lambda function must be configured with a memory setting, which indirectly controls the amount of CPU and networking performance allocated. In many environments, memory settings are defined arbitrarily or left unchanged as functions evolve. Over time, this leads to overprovisioning — with functions running well below their allocated memory and incurring unnecessary compute costs. Systematic right-sizing using performance benchmarks can significantly reduce spend without sacrificing performance or reliability. This is especially important for frequently invoked functions or those with long execution times.
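A lightweight benchmark is to compare each function's configured memory with the peak usage reported in its REPORT log lines. The sketch below does this with CloudWatch Logs Insights via boto3; the function name is a placeholder and the default /aws/lambda/<name> log group is assumed.

```python
# Sketch: compare a Lambda function's configured memory with actual peak usage
# from its REPORT lines, via CloudWatch Logs Insights. Function name is a placeholder.
import time
from datetime import datetime, timedelta, timezone

import boto3

function_name = "my-function"                     # placeholder
logs = boto3.client("logs")
lambda_client = boto3.client("lambda")

configured_mb = lambda_client.get_function_configuration(
    FunctionName=function_name)["MemorySize"]

end = datetime.now(timezone.utc)
start = end - timedelta(days=14)
query_id = logs.start_query(
    logGroupName=f"/aws/lambda/{function_name}",
    startTime=int(start.timestamp()),
    endTime=int(end.timestamp()),
    queryString=(
        "filter @type = 'REPORT' "
        "| stats max(@maxMemoryUsed / 1024 / 1024) as peakMB, "
        "avg(@maxMemoryUsed / 1024 / 1024) as avgMB"
    ),
)["queryId"]

results = logs.get_query_results(queryId=query_id)
while results["status"] in ("Running", "Scheduled"):
    time.sleep(1)
    results = logs.get_query_results(queryId=query_id)

for row in results.get("results", []):
    fields = {f["field"]: f["value"] for f in row}
    print(f"{function_name}: configured {configured_mb} MB, "
          f"peak {fields.get('peakMB')} MB, avg {fields.get('avgMB')} MB")
```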