Excessive Model Logging Enabled in Production Environments
CER: GCP-AI-2074
Service Category: AI
Cloud Provider: GCP
Service Name: GCP Vertex AI
Inefficiency Type: Excessive Logging Configuration
Explanation

Verbose logging is useful during development, but many teams forget to disable it before deploying to production. Generative AI workloads often include long prompts, multi-paragraph outputs, embedding vectors, and structured metadata. When these full payloads are logged on high-throughput production endpoints, Cloud Logging ingestion costs can quickly exceed the cost of the model inference itself. This inefficiency commonly arises when development-phase logging settings carry over into production without review.

Relevant Billing Model

Cloud Logging charges per ingested GiB. Generative AI requests often contain large prompts and outputs, so logging full payloads at production scale can generate substantial ingestion cost that has nothing to do with model inference, as the rough estimate below illustrates.
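A back-of-envelope estimate makes the scale concrete. The Python sketch below uses illustrative traffic figures and assumes Cloud Logging's list price of roughly $0.50 per ingested GiB; verify current pricing and the free-tier allotment for your project before relying on the numbers:

```python
# Rough monthly cost of ingesting full request/response payloads.
# All figures are illustrative assumptions, not measured values.
REQUESTS_PER_DAY = 1_000_000
AVG_PAYLOAD_KIB = 20        # prompt + response + metadata per request
PRICE_PER_GIB_USD = 0.50    # assumed Cloud Logging list price; verify

gib_per_month = REQUESTS_PER_DAY * 30 * AVG_PAYLOAD_KIB / (1024 * 1024)
print(f"~{gib_per_month:,.0f} GiB ingested per month")
print(f"~${gib_per_month * PRICE_PER_GIB_USD:,.0f}/month in logging ingestion")
# ~572 GiB/month, roughly $286/month, for a single endpoint's payloads.
```

At these assumed volumes, payload logging alone costs hundreds of dollars per month per endpoint before a single inference is billed.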

Detection
  • Identify production Vertex AI endpoints generating unusually high Cloud Logging ingestion
  • Review logging settings to confirm whether full request and response bodies are captured
  • Inspect logs for large text prompts, long responses, or embedding payloads (see the sampling sketch after this list)
  • Look for services where logging was never reduced after testing or model iteration phases
  • Compare log volume vs. inference volume to detect disproportionate ingestion
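To support the payload inspection steps above, the following sketch samples recent Vertex AI endpoint log entries with the google-cloud-logging Python client and reports their average serialized size. The project ID is a placeholder, and the resource type and one-hour window are assumptions to adjust for your environment:

```python
# Hedged sketch: estimate average payload size of recent Vertex AI
# endpoint log entries. Assumes google-cloud-logging is installed and
# application default credentials are configured.
import json
from datetime import datetime, timedelta, timezone

from google.cloud import logging as cloud_logging

PROJECT_ID = "my-project"  # placeholder project ID

client = cloud_logging.Client(project=PROJECT_ID)
cutoff = (datetime.now(timezone.utc) - timedelta(hours=1)).isoformat()
entry_filter = (
    'resource.type="aiplatform.googleapis.com/Endpoint" '  # assumed resource type
    f'AND timestamp>="{cutoff}"'
)

sizes = []
for entry in client.list_entries(filter_=entry_filter, page_size=100):
    # Serializing the payload approximates its ingested size in bytes.
    sizes.append(len(json.dumps(entry.payload, default=str)))
    if len(sizes) >= 100:
        break

if sizes:
    avg_kib = sum(sizes) / len(sizes) / 1024
    print(f"Sampled {len(sizes)} entries; average payload ~{avg_kib:.1f} KiB")
else:
    print("No matching entries in the last hour.")
```

Multi-KiB averages on a high-throughput endpoint are a strong hint that full request and response bodies, rather than minimal metadata, are being ingested.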
Remediation
  • Disable full payload logging for production endpoints unless explicitly required
  • Log only minimal request metadata (e.g., status, latency) rather than full bodies
  • Use sampling or partial logging strategies to reduce ingestion volume (a minimal sketch follows this list)
  • Limit verbose logging to short-term debugging in dev/test environments
  • Periodically audit Cloud Logging ingestion tied to AI endpoints
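As a minimal illustration of the metadata-only and sampling recommendations above, the wrapper below logs fixed-size metadata for every request and full bodies for only a small sampled fraction. The function name and sample rate are illustrative choices for your own serving code, not part of any Vertex AI API:

```python
# Sketch: lean production logging with payload sampling (stdlib only).
import logging
import random
import time

logger = logging.getLogger("inference")
PAYLOAD_SAMPLE_RATE = 0.01  # full bodies for ~1% of requests (assumed rate)

def predict_with_lean_logging(model_call, request: dict) -> dict:
    start = time.monotonic()
    response = model_call(request)
    latency_ms = (time.monotonic() - start) * 1000

    # Always log small, fixed-size metadata: cheap to ingest, easy to query.
    logger.info("prediction ok latency_ms=%.1f", latency_ms)

    # Only a sampled fraction of requests carries the full payload.
    if random.random() < PAYLOAD_SAMPLE_RATE:
        logger.debug("sampled payload request=%s response=%s", request, response)

    return response
```

Keeping the sampled payloads at DEBUG level also lets a log-level threshold or exclusion filter drop them entirely in production while retaining them in dev and test.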