Overprovisioned Memory Allocation in Cloud Run Services

Service Category

Compute

Cloud Provider

GCP

Service Name

GCP Cloud Run

Inefficiency Type

Overprovisioned Resource Allocation

Explanation

In Cloud Run, each revision is deployed with a fixed memory allocation (e.g., 512MiB, 1GiB, 2GiB, etc.). These settings are often overestimated during initial development or copied from templates. Unlike auto-scaling platforms that adapt instance size based on workload, Cloud Run continues to bill per the allocated amount regardless of actual memory used during execution. If a service consistently uses significantly less memory than allocated, it results in avoidable overpayment per request — especially for high-throughput or long-running services. Since memory and CPU are billed together based on configured values, this inefficiency compounds quickly at scale.

Relevant Billing Model

Billed based on: * Allocated vCPU and memory (GiB) per request * Duration of request execution (per 100ms increment) * Number of requests * Additional charges for egress and requests beyond free tier

Detection

Review actual memory usage per request over a representative time window
Identify services with consistently low memory utilization relative to their configured limits
Evaluate whether higher memory tiers were chosen to solve startup latency or cold start issues that no longer apply
Cross-reference high-throughput services where per-request efficiency has significant cost impact

Remediation

Reconfigure services with right-sized memory allocations aligned to observed usage patterns
Test progressively smaller memory configurations to find a stable baseline without introducing latency or OOM errors
Implement monitoring for memory pressure or failures to validate new settings
Use performance benchmarks and load tests in lower environments before promoting configuration changes to production

Relevant Documentation

Cloud Run Pricing
Cloud Run Resource Allocation

Submit Feedback