
ELI5: Google Cloud Logging for Debugging

How cloud logs help trace issues across backend services and transactions.

January 3, 2026 · 11 min read

What problem does it solve?

When something fails in production, logs help answer what happened, where, and when. In a distributed system with multiple services, a single request might pass through an API gateway, a backend service, a payment processor, and a database. Without centralized logging, you’d need to SSH into each server and search files manually.

Google Cloud Logging solves this by aggregating logs from all services into one searchable interface. Instead of checking five different servers, you run a single query and see the complete picture of what happened across your entire system.

How GCP Logging works

Google Cloud Logging collects logs from all your services into one centralized location. Here’s how logs flow through the system:

  1. Log sources (your applications, Google Cloud services, third-party tools) emit log entries
  2. Google Cloud Logging API receives and processes these entries
  3. Cloud Logging console stores and indexes logs (retained for 30 days by default, longer with custom buckets)
  4. Logs Explorer provides search and filtering across all indexed logs

Log flow architecture

When you deploy an application on Cloud Run, App Engine, Compute Engine, or GKE, Google automatically collects:

  • Standard output (stdout) and standard error (stderr)
  • Application logs sent via the Cloud Logging client library
  • Cloud audit logs (who did what and when)
  • Service metrics and events

You can filter by severity, timestamp, service name, resource type, or any custom structured fields you add to your logs.

Structured vs unstructured logging

Unstructured logging (traditional):

logger.info("Payment processed for user " + userId + " with amount " + amount);

Problems: Hard to parse, impossible to query by specific fields, error-prone string concatenation.

Structured logging (using SLF4J + Logback with JSON output):

// Spring Boot service with structured logging
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.slf4j.MDC;

Logger logger = LoggerFactory.getLogger(PaymentService.class);

// Using MDC (Mapped Diagnostic Context) for structured fields.
// Note: MDC values are always strings; numeric typing comes from
// StructuredArguments or the JSON encoder, not from MDC itself.
MDC.put("transactionId", txnId);
MDC.put("amount", String.valueOf(amount));
MDC.put("status", "SUCCESS");
MDC.put("processingTimeMs", "245");
logger.info("Payment processed");
MDC.clear();

Advantages:

  • Query by exact field value: jsonPayload.amount > 1000
  • Aggregations: group logs by jsonPayload.status and count
  • Type-safe: numeric fields sort numerically, not alphabetically
  • Better performance: Cloud Logging indexes structured fields
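
The "numeric vs alphabetic" point is easy to see with plain stdlib Java: sorted as strings, "9" lands after "100"; sorted as numbers, it lands before. This is why an amount logged as free text can't be range-queried sensibly.

```java
import java.util.Arrays;

public class SortDemo {
    public static void main(String[] args) {
        String[] asStrings = {"9", "100", "25"};
        Arrays.sort(asStrings);                          // lexicographic order
        System.out.println(Arrays.toString(asStrings));  // [100, 25, 9]

        int[] asNumbers = {9, 100, 25};
        Arrays.sort(asNumbers);                          // numeric order
        System.out.println(Arrays.toString(asNumbers));  // [9, 25, 100]
    }
}
```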

Spring Boot logging configuration

To configure structured logging in Spring Boot with Google Cloud, use the Spring Cloud GCP Logging starter:

<!-- pom.xml dependencies -->
<dependency>
    <groupId>com.google.cloud</groupId>
    <artifactId>spring-cloud-gcp-starter-logging</artifactId>
</dependency>
<dependency>
    <groupId>net.logstash.logback</groupId>
    <artifactId>logstash-logback-encoder</artifactId>
    <version>7.4</version>
</dependency>

The logstash-logback-encoder provides StructuredArguments.keyValue() which attaches structured key-value pairs to log entries. When combined with the GCP logging appender, these fields become queryable jsonPayload entries in Cloud Logging.

# application.properties
spring.application.name=payment-service
logging.level.root=INFO
logging.level.com.mycompany=DEBUG

<!-- logback-spring.xml - configure JSON structured output -->
<configuration>
    <include resource="com/google/cloud/logging/logback/base.xml" />
    <root level="INFO">
        <appender-ref ref="CLOUD" />
    </root>
</configuration>

When deployed on Cloud Run or GKE, the spring-cloud-gcp-starter-logging library formats logs as JSON with the correct fields that Cloud Logging indexes automatically. MDC fields become part of jsonPayload, making them queryable in the Logs Explorer.

Log severity levels

Understanding when to use each severity level is critical for effective alerting and debugging:

DEBUG

  • Use when: Detailed diagnostic information for developers
  • Examples: Variable values, function entry/exit, cache hits/misses
  • Noise level: High (generates lots of logs)
  • Example:
logger.debug("User cache lookup",
    StructuredArguments.keyValue("userId", userId),
    StructuredArguments.keyValue("cacheHit", true));

INFO

  • Use when: Important events in normal application flow
  • Examples: Service startup, request completion, data import finished
  • Noise level: Medium (usually a few per second in normal operation)
  • Example:
logger.info("Payment processed",
    StructuredArguments.keyValue("transactionId", txnId),
    StructuredArguments.keyValue("amount", amount),
    StructuredArguments.keyValue("status", "SUCCESS"));

WARNING

  • Use when: Something unexpected but recoverable happened
  • Examples: Retry attempt, deprecated API used, slow query detected
  • Noise level: Low (shouldn’t happen frequently)
  • Example:
logger.warn("Database query slow",
    StructuredArguments.keyValue("query", "SELECT..."),
    StructuredArguments.keyValue("durationMs", 5432));

ERROR

  • Use when: Something failed and caused an operation to abort
  • Examples: Failed database write, payment processor returned error, validation failed
  • Noise level: Very low (should trigger investigation)
  • Example:
// 'e' here is the caught exception; passing it as the last argument
// logs the full stack trace (avoid constructing a new Exception just to log)
logger.error("Payment processing failed",
    StructuredArguments.keyValue("transactionId", txnId),
    StructuredArguments.keyValue("reason", "insufficient_funds"),
    StructuredArguments.keyValue("errorCode", "FUND_ERROR_001"),
    e);

CRITICAL

  • Use when: System is in dangerous state or likely to fail
  • Examples: Database unreachable, out of memory, all servers down
  • Noise level: Extremely low (rare and requires immediate attention)

Note: SLF4J has no CRITICAL level — its highest is ERROR. In Cloud Logging, CRITICAL is a separate severity. To emit CRITICAL-level logs, use the Cloud Logging API directly or configure a custom severity mapping in your Logback appender. In most Spring Boot applications, ERROR is sufficient.

  • Example:
logger.error("Database connection pool exhausted",
    StructuredArguments.keyValue("activeConnections", 100),
    StructuredArguments.keyValue("maxConnections", 100),
    StructuredArguments.keyValue("waitingRequests", 25));
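
For reference, a sketch of how Logback levels typically map onto Cloud Logging severities (the exact mapping depends on your appender; this helper is hypothetical, not part of any library, and shows why CRITICAL and above have no Logback counterpart):

```java
// Hypothetical helper mirroring the usual Logback-to-Cloud-Logging mapping.
// Cloud Logging's CRITICAL, ALERT, and EMERGENCY have no Logback equivalents.
public class SeverityMapper {
    public static String toCloudSeverity(String logbackLevel) {
        switch (logbackLevel) {
            case "TRACE":
            case "DEBUG": return "DEBUG";
            case "INFO":  return "INFO";
            case "WARN":  return "WARNING";
            case "ERROR": return "ERROR";
            default:      return "DEFAULT";
        }
    }

    public static void main(String[] args) {
        System.out.println(toCloudSeverity("WARN"));  // WARNING
    }
}
```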

Filtering logs in practice

The real power is in filtering. When a user reports a failed payment, you can trace the entire flow using the Logs Explorer query syntax:

Basic filtering by field and value

resource.type="cloud_run_revision"
jsonPayload.transactionId="TXN-20260103-00421"

Filtering by severity

severity>=WARNING

This includes WARNING and every level above it: ERROR, CRITICAL, ALERT, and EMERGENCY.

Time-based queries

timestamp>="2026-01-03T10:00:00Z"
timestamp<"2026-01-03T11:00:00Z"

Combined query for tracing a specific transaction

resource.type="cloud_run_revision"
jsonPayload.transactionId="TXN-20260103-00421"
severity>=WARNING
timestamp>="2026-01-03T10:00:00Z"

This query finds all warnings and errors for a specific transaction across every service that logged it. The results show:

  • Exact timestamp when each event occurred
  • Which service logged it (resource.labels.service_name)
  • The complete message and all structured fields
  • Stack traces for errors

Advanced filtering with regex

jsonPayload.email=~"^[a-z]+@example\.com$"

Filtering by resource type and labels

resource.type="cloud_run_revision"
resource.labels.service_name="payment-service"
resource.labels.revision_name="payment-service-abc123"

Excluding logs (NOT queries)

severity=ERROR
-jsonPayload.errorCode="EXPECTED_TIMEOUT"

This finds all errors except expected timeouts, useful for filtering out known false alarms.

Correlating across services

The key technique is correlation IDs (also called request IDs or trace IDs). Every request gets a unique ID that travels through all services. When something fails, you search for that ID and see the complete journey:

  1. API Gateway received request (200ms)
  2. Auth service validated token (50ms)
  3. Payment service called Razorpay (800ms)
  4. Database write failed (timeout after 5000ms)

Without correlation IDs, finding this chain manually would take hours. With them, it takes seconds.

Implementing correlation IDs

import java.util.UUID;

// In your API gateway or entry point
@RestController
public class PaymentController {

    @PostMapping("/payment")
    public ResponseEntity<?> processPayment(@RequestBody PaymentRequest req) {
        // Generate unique correlation ID for this request
        String correlationId = UUID.randomUUID().toString();

        // Add to request context so all downstream services can access it
        MDC.put("correlationId", correlationId);

        logger.info("Payment request received",
            StructuredArguments.keyValue("correlationId", correlationId),
            StructuredArguments.keyValue("userId", req.userId),
            StructuredArguments.keyValue("amount", req.amount));

        // Pass correlationId through service calls
        paymentService.process(req, correlationId);

        return ResponseEntity.ok().build();
    }
}

// In downstream services
public class PaymentService {
    public void process(PaymentRequest req, String correlationId) {
        logger.info("Payment processing started",
            StructuredArguments.keyValue("correlationId", correlationId),
            StructuredArguments.keyValue("status", "INITIATED"));

        try {
            // Make external call
            razorpayClient.charge(req.amount, correlationId);

            logger.info("Payment completed",
                StructuredArguments.keyValue("correlationId", correlationId),
                StructuredArguments.keyValue("status", "SUCCESS"));
        } catch (Exception e) {
            logger.error("Payment failed",
                StructuredArguments.keyValue("correlationId", correlationId),
                StructuredArguments.keyValue("status", "FAILED"),
                StructuredArguments.keyValue("error", e.getMessage()),
                e);
        }
    }
}

Querying with correlation ID

Once correlation IDs are in all logs:

jsonPayload.correlationId="550e8400-e29b-41d4-a716-446655440000"

This single query returns logs from API gateway, auth service, payment service, database—everything that touched this request, in chronological order.

Log-based metrics and alerting

Google Cloud Logging can automatically create metrics from your logs and trigger alerts. This is more flexible than application metrics because you can define metrics after deployment without code changes.

Creating a log-based metric

From the Logs Explorer, you can convert any filter into a metric:

severity=ERROR
resource.type="cloud_run_revision"

This creates a metric that counts the number of errors from Cloud Run. You can then:

  • Graph error rates over time
  • Set an alert: “If errors > 10 per minute, send Slack notification”
  • Create a dashboard showing error trends

Example: Alert on failed payments

Query:

jsonPayload.status="FAILED"
resource.type="cloud_run_revision"
resource.labels.service_name="payment-service"

Alert policy:

  • Metric: Count of logs matching above query
  • Threshold: > 5 errors in 5 minutes
  • Action: Send Slack notification to #payments-team

This lets on-call engineers know within minutes when something is broken, instead of waiting for customer reports.

Combining with Cloud Monitoring

Log-based metrics integrate with Google Cloud’s monitoring and alerting:

# Cloud Monitoring alert policy (Terraform sketch; assumes a log-based
# metric named "payment_failures" was created from the query above --
# condition_threshold filters on a metric, not on raw log entries)
resource "google_monitoring_alert_policy" "payment_errors" {
  display_name = "High payment error rate"
  combiner     = "OR"
  conditions {
    display_name = "Error rate > 5/min"
    condition_threshold {
      filter          = "metric.type=\"logging.googleapis.com/user/payment_failures\" AND resource.type=\"cloud_run_revision\""
      comparison      = "COMPARISON_GT"
      threshold_value = 5
      duration        = "300s"
    }
  }
  notification_channels = [google_monitoring_notification_channel.slack.id]
}

Common mistakes and how to avoid them

1. Logging sensitive data

Mistake:

// DON'T DO THIS
logger.info("User payment info",
    StructuredArguments.keyValue("cardNumber", "4532-1111-2222-3333"),
    StructuredArguments.keyValue("cvv", "123"));

Logs are stored for 30 days and may be accessed by support staff. Logging credit cards, passwords, or API keys violates PCI-DSS and exposes credentials.

Better approach:

// DO THIS INSTEAD
logger.info("User payment processed",
    StructuredArguments.keyValue("paymentMethod", "credit_card"),
    StructuredArguments.keyValue("cardLastFour", "3333"),
    StructuredArguments.keyValue("amount", amount),
    StructuredArguments.keyValue("status", "SUCCESS"));

Log only the minimum identifying information (last 4 digits, payment method type, hashed user ID if needed).
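
One way to enforce this is a small sanitizing helper, so raw values never reach the logger. A sketch (maskCard and hashUserId are hypothetical helpers, not part of any library):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class LogSanitizer {
    // Keep only the last four digits of a card number
    public static String maskCard(String cardNumber) {
        String digits = cardNumber.replaceAll("[^0-9]", "");
        return digits.length() >= 4 ? digits.substring(digits.length() - 4) : "****";
    }

    // One-way hash so a user can be correlated across logs without exposing the raw ID
    public static String hashUserId(String userId) throws Exception {
        byte[] hash = MessageDigest.getInstance("SHA-256")
                .digest(userId.getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder();
        for (byte b : hash) hex.append(String.format("%02x", b));
        return hex.substring(0, 12);  // a short prefix is enough for correlation
    }

    public static void main(String[] args) throws Exception {
        System.out.println(maskCard("4532-1111-2222-3333"));  // 3333
        System.out.println(hashUserId("user-42"));
    }
}
```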

2. Over-logging (log noise)

Mistake:

// DON'T DO THIS - logs inside a loop with high frequency
for (User user : users) {
    logger.debug("Processing user",
        StructuredArguments.keyValue("userId", user.id));
}

If you have 1 million users, this creates 1 million log entries for a single batch job, costing money and making queries slow.

Better approach:

// DO THIS INSTEAD - log summaries, not every iteration
long startTime = System.currentTimeMillis();
int processedCount = 0;

for (User user : users) {
    processUser(user);  // the actual per-user work
    processedCount++;
}

logger.info("Batch processing completed",
    StructuredArguments.keyValue("totalUsers", processedCount),
    StructuredArguments.keyValue("durationMs", System.currentTimeMillis() - startTime));

For high-frequency events, log aggregated summaries instead of individual entries.
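
When you do need some visibility inside a long loop, another option is to log progress at a fixed interval rather than per item. A sketch (the interval of 100,000 is an arbitrary choice):

```java
public class BatchProgress {
    // Counts how many progress lines a loop would emit at a given interval,
    // so one log entry covers many items instead of one per item.
    public static int countProgressLogs(int totalItems, int interval) {
        int logs = 0;
        for (int i = 1; i <= totalItems; i++) {
            if (i % interval == 0) {
                logs++;  // here you would call logger.info("Batch progress", ...)
            }
        }
        return logs;
    }

    public static void main(String[] args) {
        // 1,000,000 items at an interval of 100,000 -> 10 log entries, not 1,000,000
        System.out.println(countProgressLogs(1_000_000, 100_000));  // 10
    }
}
```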

3. Not using structured fields for queryable data

Mistake:

// DON'T DO THIS - queryable data in free text
logger.info("Payment " + status + " for amount " + amount);

You can’t query “show me all payments over $1000” because the amount is text in a message.

Better approach:

// DO THIS INSTEAD - use structured fields
logger.info("Payment processed",
    StructuredArguments.keyValue("status", status),
    StructuredArguments.keyValue("amount", amount));  // numeric field

Now you can query: jsonPayload.amount > 1000

4. Ignoring time zones

Mistake:

logger.info("Transaction at " + System.currentTimeMillis());  // milliseconds since epoch

This is hard to read when viewing logs manually. Timestamps should be in ISO 8601 format.

Better approach:

// Spring/SLF4J automatically includes timestamp in ISO 8601 format
// 2026-01-03T14:32:45.123Z
logger.info("Transaction completed",
    StructuredArguments.keyValue("durationMs", duration));

Google Cloud Logging stores all timestamps in UTC, which makes time-range filtering consistent across services and avoids time-zone confusion.
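
If you are handed a raw epoch-millis value and want it human-readable, the conversion is one line of stdlib Java:

```java
import java.time.Instant;

public class TimestampDemo {
    public static void main(String[] args) {
        long epochMillis = 1767450765123L;  // a raw System.currentTimeMillis() value
        String iso = Instant.ofEpochMilli(epochMillis).toString();  // ISO 8601, UTC
        System.out.println(iso);  // 2026-01-03T14:32:45.123Z
    }
}
```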

5. Missing error context

Mistake:

// DON'T DO THIS - generic error message
try {
    razorpayClient.charge(amount);
} catch (Exception e) {
    logger.error("Payment failed", e);
}

No context about what was attempted or what values caused the failure.

Better approach:

// DO THIS INSTEAD - include context
try {
    razorpayClient.charge(amount);
} catch (Exception e) {
    logger.error("Payment charge failed",
        StructuredArguments.keyValue("correlationId", correlationId),
        StructuredArguments.keyValue("userId", userId),
        StructuredArguments.keyValue("amount", amount),
        StructuredArguments.keyValue("errorCode", extractErrorCode(e)),
        StructuredArguments.keyValue("errorMessage", e.getMessage()),
        e);  // Include exception for stack trace
}

With context, you can search jsonPayload.errorCode="INSUFFICIENT_FUNDS" AND jsonPayload.amount > 5000 to find a specific class of failures.

6. Not using correlation IDs consistently

Mistake: Some services log with correlation ID, others don’t.

Result: You can trace a request partway through the system, then it disappears. Impossible to debug full request flow.

Better approach: Make correlation ID mandatory in your logging framework:

// Middleware that enforces correlation ID on every request
@Component
public class CorrelationIdFilter extends OncePerRequestFilter {

    @Override
    protected void doFilterInternal(HttpServletRequest request,
            HttpServletResponse response, FilterChain filterChain)
            throws ServletException, IOException {

        String correlationId = request.getHeader("X-Correlation-ID");
        if (correlationId == null) {
            correlationId = UUID.randomUUID().toString();
        }

        // Make available to all downstream code
        MDC.put("correlationId", correlationId);
        response.setHeader("X-Correlation-ID", correlationId);

        try {
            filterChain.doFilter(request, response);
        } finally {
            MDC.clear();
        }
    }
}

Now every log automatically includes the correlation ID without explicit code.

Why it matters

This is how I traced payment issues in real incident work. When users reported missing transactions, structured logging and correlation IDs let me pinpoint the exact failure point in minutes instead of hours.

The difference between “something broke somewhere” and “Database write timed out at 2026-01-03T14:23:45Z in payment-service for transaction TXN-00421” is structured logging. It’s the difference between hours of debugging and minutes of resolution.

Good logging practices also compound over time—after a month of structured logs with correlation IDs, you have a treasure trove of production data to analyze performance bottlenecks, identify common failure patterns, and prove the impact of optimizations.