Excessive Model Logging Enabled in Production Environments
CER: GCP-AI-2074
Service Category: AI
Cloud Provider: GCP
Service Name: GCP Vertex AI
Inefficiency Type: Excessive Logging Configuration
Explanation

Verbose logging is useful during development, but many teams forget to disable it before deploying to production. Generative AI workloads often include long prompts, multi-paragraph outputs, embedding vectors, and structured metadata. When these full payloads are logged on high-throughput production endpoints, Cloud Logging ingestion costs can quickly exceed the cost of the model inference itself. This inefficiency commonly arises when development-phase logging settings carry over into production without review.

Relevant Billing Model

Cloud Logging charges per ingested GiB. Generative AI requests often contain large prompts and outputs, so logging full payloads at production scale can generate substantial ingestion cost that has nothing to do with model inference, as the rough estimate below illustrates.
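A back-of-envelope estimate makes the scale concrete. The Python sketch below uses illustrative traffic figures and assumes Cloud Logging's list price of roughly $0.50 per ingested GiB; verify current pricing and the free-tier allotment for your project before relying on the numbers:

```python
# Rough monthly cost of ingesting full request/response payloads.
# All figures are illustrative assumptions, not measured values.
REQUESTS_PER_DAY = 1_000_000
AVG_PAYLOAD_KIB = 20        # prompt + response + metadata per request
PRICE_PER_GIB_USD = 0.50    # assumed Cloud Logging list price; verify

gib_per_month = REQUESTS_PER_DAY * 30 * AVG_PAYLOAD_KIB / (1024 * 1024)
print(f"~{gib_per_month:,.0f} GiB ingested per month")
print(f"~${gib_per_month * PRICE_PER_GIB_USD:,.0f}/month in logging ingestion")
# ~572 GiB/month, roughly $286/month, for a single endpoint's payloads.
```

At these assumed volumes, payload logging alone costs hundreds of dollars per month per endpoint before a single inference is billed.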

Detection
  • Identify production Vertex AI endpoints generating unusually high Cloud Logging ingestion
  • Review logging settings to confirm whether full request and response bodies are captured
  • Inspect logs for large text prompts, long responses, or embedding payloads (see the sampling sketch after this list)
  • Look for services where logging was never reduced after testing or model iteration phases
  • Compare log volume vs. inference volume to detect disproportionate ingestion
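To support the payload inspection steps above, the following sketch samples recent Vertex AI endpoint log entries with the google-cloud-logging Python client and reports their average serialized size. The project ID is a placeholder, and the resource type and one-hour window are assumptions to adjust for your environment:

```python
# Hedged sketch: estimate average payload size of recent Vertex AI
# endpoint log entries. Assumes google-cloud-logging is installed and
# application default credentials are configured.
import json
from datetime import datetime, timedelta, timezone

from google.cloud import logging as cloud_logging

PROJECT_ID = "my-project"  # placeholder project ID

client = cloud_logging.Client(project=PROJECT_ID)
cutoff = (datetime.now(timezone.utc) - timedelta(hours=1)).isoformat()
entry_filter = (
    'resource.type="aiplatform.googleapis.com/Endpoint" '  # assumed resource type
    f'AND timestamp>="{cutoff}"'
)

sizes = []
for entry in client.list_entries(filter_=entry_filter, page_size=100):
    # Serializing the payload approximates its ingested size in bytes.
    sizes.append(len(json.dumps(entry.payload, default=str)))
    if len(sizes) >= 100:
        break

if sizes:
    avg_kib = sum(sizes) / len(sizes) / 1024
    print(f"Sampled {len(sizes)} entries; average payload ~{avg_kib:.1f} KiB")
else:
    print("No matching entries in the last hour.")
```

Multi-KiB averages on a high-throughput endpoint are a strong hint that full request and response bodies, rather than minimal metadata, are being ingested.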
Remediation
  • Disable full payload logging for production endpoints unless explicitly required
  • Log only minimal request metadata (e.g., status, latency) rather than full bodies
  • Use sampling or partial logging strategies to reduce ingestion volume (a minimal sketch follows this list)
  • Limit verbose logging to short-term debugging in dev/test environments
  • Periodically audit Cloud Logging ingestion tied to AI endpoints
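As a minimal illustration of the metadata-only and sampling recommendations above, the wrapper below logs fixed-size metadata for every request and full bodies for only a small sampled fraction. The function name and sample rate are illustrative choices for your own serving code, not part of any Vertex AI API:

```python
# Sketch: lean production logging with payload sampling (stdlib only).
import logging
import random
import time

logger = logging.getLogger("inference")
PAYLOAD_SAMPLE_RATE = 0.01  # full bodies for ~1% of requests (assumed rate)

def predict_with_lean_logging(model_call, request: dict) -> dict:
    start = time.monotonic()
    response = model_call(request)
    latency_ms = (time.monotonic() - start) * 1000

    # Always log small, fixed-size metadata: cheap to ingest, easy to query.
    logger.info("prediction ok latency_ms=%.1f", latency_ms)

    # Only a sampled fraction of requests carries the full payload.
    if random.random() < PAYLOAD_SAMPLE_RATE:
        logger.debug("sampled payload request=%s response=%s", request, response)

    return response
```

Keeping the sampled payloads at DEBUG level also lets a log-level threshold or exclusion filter drop them entirely in production while retaining them in dev and test.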