Production Observability with Spring Boot 4, OpenTelemetry, and Grafana
Twenty-eight chapters on instrumenting Spring Boot services with the full OpenTelemetry and Grafana stack. Covers the OTel Java agent, custom metrics, structured logging, distributed tracing, sampling strategies, PromQL, LogQL, TraceQL, SLOs, error budgets, and deploying the observability stack in Kubernetes. The examples are built on a two-service CineTrack setup.
Coming Soon
What you'll learn
- Instrumenting Spring Boot 4 services with the OpenTelemetry Java agent
- Custom metrics, cardinality traps, and histogram design for real workloads
- Structured logging at scale with correlation IDs across service boundaries
- Manual spans and trace context propagation for non-standard integration points
- Tail sampling and head sampling strategies for high-volume production services
- PromQL, LogQL, and TraceQL for real production queries
- Dashboards and alerts that reflect SLOs rather than infrastructure noise
- Error budgets, on-call readiness, and preventing alert fatigue
- Deploying, scaling, and tuning the Grafana observability stack in Kubernetes
Table of Contents
Getting Started
- 01 When the Lights Go Out
- 02 The Three Pillars
- 03 Meet the Stack
- 04 CineTrack Goes Live
- 05 The OTel Java Agent
- 06 Your First Metrics
- 07 Your First Logs
- 08 Your First Traces
- 09 Connecting the Dots
Going Deeper
- 10 Custom Metrics
- 11 Cardinality
- 12 Structured Logging at Scale
- 13 Manual Spans
- 14 Sampling
- 15 Instrumentation Patterns
Querying and Dashboards
- 16 PromQL in Practice
- 17 LogQL in Practice
- 18 TraceQL in Practice
- 19 Dashboards That Help
- 20 The Production Dashboard
Alerting and Reliability
- 21 Alertmanager
- 22 Writing Alerts That Don't Lie
- 23 SLOs and Error Budgets
- 24 On-Call Ready
Scale and Operations
- 25 Grafana Alloy
- 26 Kubernetes Deployment
- 27 Retention and Scaling
- 28 Performance Tuning the Stack