Lesson: Loki, Mimir & Tempo
What you'll learn
- What Loki does for logs, and how to query it with LogQL.
- What Mimir does for metrics, and the idea of querying with PromQL.
- What Tempo does for traces, and how spans form a trace.
- How Grafana ties all three together so you can pivot logs ↔ metrics ↔ traces.
- Where the lab Loki lives (10.100.100.5) and how data reaches it.
By the end you'll be able to read and write basic LogQL and PromQL and explain what each backend is for.
The lesson
1. The LGTM backends
Grafana Labs builds three open-source backends — one per pillar. Together with Grafana they're nicknamed the LGTM stack:
L oki -> Logs (text events)
G rafana -> the UI / glue
T empo -> Traces (request paths)
M imir -> Metrics (numbers over time)
All three are built to be cheap and scalable: they store the bulk of data in plain object storage (like MinIO / S3) and keep only small indexes hot. Grafana queries all three and lets you jump between them. The lab runs a central Loki at 10.100.100.5 with 365-day retention; the wider stack adds Mimir for metrics at scale and Tempo for traces.
2. Loki — logs, "like Prometheus but for logs"
Loki stores logs without indexing the full text. Instead it indexes only a small set of labels (job, host, namespace, level), and keeps the log lines themselves compressed in object storage. This makes it cheap. The trade-off: you select streams by label first, then filter the text.
A log stream is the unique combination of labels, e.g. {job="myapp", host="web-01"}. Querying is done with LogQL.
STREAM SELECTOR    LINE FILTER      PARSER + LABEL FILTER
{job="myapp"}      |= "error"       | logfmt | level="error"
^ pick streams     ^ keep lines     ^ extract fields, filter on them
                     with "error"
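To make the three stages concrete, here is a minimal Python sketch that mimics them on in-memory data. It is only an illustration of the select-then-filter order, not how Loki is implemented; the sample streams, the `query` helper, and the tiny logfmt parser are all assumptions made up for this example.

```python
import shlex

def parse_logfmt(line):
    """Tiny logfmt parser (illustrative): handles key=value and key="quoted value"."""
    fields = {}
    for token in shlex.split(line):
        if "=" in token:
            key, value = token.split("=", 1)
            fields[key] = value
    return fields

# Hypothetical data: each stream is (labels, log lines).
streams = [
    ({"job": "myapp", "host": "web-01"}, [
        'level=error msg="payment failed" trace_id=abc123',
        'level=info msg="healthcheck ok"',
    ]),
    ({"job": "myapp", "host": "web-02"}, [
        'level=warn msg="retrying after error"',
    ]),
    ({"job": "other", "host": "web-01"}, [
        'level=error msg="disk full error"',
    ]),
]

def query(streams, selector, needle, field, value):
    """Mimic: {selector} |= "needle" | logfmt | field="value"."""
    results = []
    for labels, lines in streams:
        # 1. Stream selector: every selector label must match exactly.
        if any(labels.get(k) != v for k, v in selector.items()):
            continue
        for line in lines:
            # 2. Line filter: keep only lines containing the substring.
            if needle not in line:
                continue
            # 3. Parser + label filter: extract fields, then compare one of them.
            if parse_logfmt(line).get(field) != value:
                continue
            results.append((labels["host"], line))
    return results

matches = query(streams, {"job": "myapp"}, "error", "level", "error")
for host, line in matches:
    print(host, line)
```

Note how the web-02 line passes the line filter (its message contains "error") but is dropped by the label filter because its level is warn. That is the difference between matching text and matching a parsed field.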
LogQL examples:
# All lines containing "error" from myapp (within the selected time range)
{job="myapp"} |= "error"
# Lines NOT containing "healthcheck"
{job="myapp"} != "healthcheck"
# Parse logfmt fields, keep level=error, from one host
{job="myapp", host="web-01"} | logfmt | level="error"
# A METRIC from logs: error lines per second per service (5m windows)
sum by (service) (rate({job="myapp"} |= "error" [5m]))
That last query is powerful: LogQL can turn logs into a numeric graph you can put on a dashboard or alert on — even without separate metrics.
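As a rough mental model of that metric query, the sketch below counts matching lines inside a 5-minute window and divides by the window length, grouped by service. The `error_rate_per_service` helper and the sample entries are hypothetical, and Loki's real evaluation works over stored streams rather than a Python list.

```python
from collections import Counter

def error_rate_per_service(entries, now, window_s=300):
    """entries: iterable of (timestamp_seconds, service, line).

    Returns matching lines per second for each service over the
    window (now - window_s, now].
    """
    counts = Counter()
    for ts, service, line in entries:
        in_window = now - window_s < ts <= now
        if in_window and "error" in line:
            counts[service] += 1
    return {service: n / window_s for service, n in counts.items()}

# Made-up log entries for illustration.
entries = [
    (1010, "checkout", "error: card declined"),
    (1100, "checkout", "error: timeout"),
    (1200, "checkout", "ok"),
    (1250, "search", "error: index missing"),
    (600, "search", "error: too old to count"),
]

rates = error_rate_per_service(entries, now=1300)
print(rates)
```

Two checkout errors in 300 seconds give a rate of 2/300 lines per second; the search entry at timestamp 600 falls outside the window and is ignored.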
3. Mimir — metrics at scale
Mimir is a long-term, horizontally scalable store for Prometheus-style metrics. Prometheus is the de-facto standard for metrics; Mimir speaks the same data model and the same query language, PromQL, but is built to hold years of data across many teams. (In the lab, metrics live in InfluxDB today; Mimir is the scale-out option in the wider stack — the concepts below apply to both.)
Metrics come in types: counters (only go up, e.g. total requests), gauges (go up and down, e.g. memory in use), and histograms (distributions, e.g. request durations bucketed for percentiles).
PromQL examples:
# Per-second request rate over 5m windows (rate of a counter)
rate(http_requests_total[5m])
# Total 5xx error rate across all instances
sum(rate(http_requests_total{status=~"5.."}[5m]))
# Error ratio (errors / total) as a fraction
sum(rate(http_requests_total{status=~"5.."}[5m]))
/ sum(rate(http_requests_total[5m]))
# 95th-percentile latency from a histogram
histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))
The key idea: rate(counter[window]) converts an ever-increasing counter into a per-second rate, and sum/by aggregate across instances and labels. These few patterns cover most day-to-day metric questions.
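The sketch below shows, in simplified form, what rate() and histogram_quantile() compute. It is an approximation for intuition only: Prometheus's real rate() also extrapolates to the edges of the window, and the helper names `simple_rate` and `simple_quantile` plus all sample values are assumptions for this example.

```python
def simple_rate(samples):
    """samples: list of (timestamp_seconds, counter_value), oldest first.

    Returns the average per-second increase, treating any drop in the
    counter as a reset to zero (for example after a process restart).
    """
    if len(samples) < 2:
        return None
    increase = 0.0
    prev = samples[0][1]
    for _, value in samples[1:]:
        increase += value if value < prev else value - prev
        prev = value
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed

def simple_quantile(q, buckets):
    """buckets: list of (upper_bound, cumulative_count), sorted by bound,
    ending with an infinite bound. Expects 0 < q <= 1.
    """
    total = buckets[-1][1]
    if total == 0:
        return None
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if upper_bound == float("inf"):
                # Cannot interpolate into infinity: report the highest finite bound.
                return lower_bound
            # Linear interpolation inside the bucket that contains the rank.
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count

# Counter samples with a reset between t=100 and t=200 (made-up values).
samples = [(0, 100), (100, 250), (200, 50), (300, 150)]
req_rate = simple_rate(samples)

# Cumulative latency buckets in seconds (made-up values).
buckets = [(0.1, 50), (0.5, 90), (1.0, 100), (float("inf"), 100)]
p95 = simple_quantile(0.95, buckets)
print(req_rate, p95)
```

The reset handling is the reason you query a counter through rate() rather than graphing it raw: a restart makes the raw value drop, while the rate stays meaningful. The quantile result is an estimate whose precision depends on how the bucket boundaries were chosen.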
4. Tempo — traces
Tempo stores distributed traces. A trace records one request's journey across services; it is a tree of spans, where each span is a single operation with a start time and duration. Spans share a trace ID and link to their parent span.
trace_id=abc123 (total 3.9s)
|
+- order-api [#####] 12 ms
| |
| +- payment-svc [#############......] 3800 ms <-- the bottleneck
| | |
| | +- db query [###] 150 ms
| +- email-svc [#] 8 ms
Reading that, you instantly see where the time went — payment-svc dominated. Tempo is optimized to fetch a trace by its ID very cheaply. You usually arrive at a trace from a metric (latency alert) or a log line (which carries the trace_id), then open it in Tempo to find the slow span. Then you read that service's logs in Loki for the same time window to learn why.
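The parent/child structure can be sketched in a few lines of Python. The `Span` class, the span IDs, and the helper functions are hypothetical names for illustration; the service names and durations copy the figures from the waterfall above. Real tracing data carries more fields (start time, attributes, status).

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]  # None marks the root span
    name: str
    duration_ms: int

# All spans below belong to one trace (they would share one trace ID).
spans = [
    Span("a", None, "order-api", 12),
    Span("b", "a", "payment-svc", 3800),
    Span("c", "b", "db query", 150),
    Span("d", "a", "email-svc", 8),
]

def children_by_parent(spans):
    """Index spans by their parent ID so the tree can be walked top-down."""
    index = {}
    for span in spans:
        index.setdefault(span.parent_id, []).append(span)
    return index

def render(spans):
    """Return the trace as indented text lines, one per span."""
    index = children_by_parent(spans)
    lines = []

    def walk(span, depth):
        lines.append(f"{'  ' * depth}{span.name} {span.duration_ms} ms")
        for child in index.get(span.span_id, []):
            walk(child, depth + 1)

    for root in index.get(None, []):
        walk(root, 0)
    return lines

tree_lines = render(spans)
slowest = max(spans, key=lambda s: s.duration_ms)
print("\n".join(tree_lines))
print("slowest span:", slowest.name)
```

Picking the span with the largest duration is the crudest possible bottleneck finder; it is enough here because one span dominates the trace.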
5. How Grafana queries them
In Grafana each backend is a data source with its own query editor:
- Point a panel at Loki → you get the LogQL editor.
- Point it at Mimir/Prometheus → the PromQL editor.
- Point it at Tempo → a trace search and the waterfall trace view.
The magic is the pivots Grafana wires between them:
- Logs → Traces: if a Loki line contains a trace_id, Grafana shows a button to open that trace in Tempo (a "derived field").
- Traces → Logs: from a Tempo span, jump to the logs for that service/time.
- Metrics → Logs/Traces: from a latency spike on a graph, jump to the matching logs or exemplar traces.
This is the observability workflow from Lesson 1 made literal: metric (Mimir) → trace (Tempo) → log (Loki), all without leaving Grafana.
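The Logs → Traces pivot rests on pulling a trace ID out of a log line with a regular expression. The sketch below shows that extraction step in Python; the pattern and the logfmt-style `trace_id=...` line format are assumptions about how your application logs, and in Grafana you would configure the equivalent pattern on the Loki data source rather than write code.

```python
import re

# Assumed log format: the trace ID appears as trace_id=<word characters>.
TRACE_ID_PATTERN = re.compile(r"trace_id=(\w+)")

def extract_trace_id(line):
    """Return the trace ID captured from a log line, or None if absent."""
    match = TRACE_ID_PATTERN.search(line)
    return match.group(1) if match else None

with_id = extract_trace_id('level=error msg="payment failed" trace_id=abc123')
without_id = extract_trace_id('level=info msg="healthcheck ok"')
print(with_id, without_id)
```

Lines without a trace ID simply get no link, which is why it pays to log the ID consistently on every request-scoped line.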
6. The shared design: cheap storage, label index
Notice the common pattern across all three: store the bulk in object storage, index only labels/IDs. This is why they scale and why labels matter so much. Choose labels that are useful for filtering (service, env, host) and low cardinality; never put unbounded values (user IDs, request IDs, raw URLs) in labels — those go in the log body or as trace attributes. Get this right and queries stay fast and storage stays cheap.
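A quick back-of-the-envelope calculation shows why unbounded label values hurt. The upper bound on the number of streams is the product of the number of distinct values per label. The label sets below are invented for illustration, and `max_streams` is a hypothetical helper.

```python
from math import prod

def max_streams(label_values):
    """Upper bound on distinct streams: product of distinct values per label."""
    return prod(len(values) for values in label_values.values())

# Bounded labels only (made-up values).
good_labels = {
    "service": ["checkout", "search", "auth"],
    "env": ["prod", "staging"],
    "host": ["web-01", "web-02", "web-03", "web-04"],
}

# The same labels plus an unbounded one: a user ID per active user.
bad_labels = dict(good_labels)
bad_labels["user_id"] = [f"user-{i}" for i in range(10_000)]

good_count = max_streams(good_labels)
bad_count = max_streams(bad_labels)
print(good_count, bad_count)
```

With 3 services, 2 environments, and 4 hosts there are at most 24 streams. Adding a user_id label with 10,000 values raises the ceiling to 240,000, each needing its own index entry, which is exactly the cost the label-index design is meant to avoid.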
7. Putting it together in the lab
Right now the lab ships logs from every host into Loki at 10.100.100.5 (via Promtail today, Alloy as the modern path) and metrics into InfluxDB, all viewed in Grafana at 10.100.100.4. In the capstone you'll add your app to this picture: its logs land in Loki, you query them with LogQL, you build a metrics dashboard, and you alert on it — the full LGTM loop on your own service.
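To see what "data reaches Loki" looks like on the wire, the sketch below builds the JSON body that Loki's HTTP push endpoint (/loki/api/v1/push) accepts: a list of streams, each with a label set and [nanosecond timestamp, line] pairs. In the lab this job is done by Promtail or Alloy, so treat this purely as an illustration. The port 3100 is Loki's default and is an assumption about the lab setup, and the helper names are hypothetical.

```python
import json
import time
import urllib.request

# Assumption: the lab Loki listens on Loki's default port 3100.
LOKI_PUSH_URL = "http://10.100.100.5:3100/loki/api/v1/push"

def build_push_payload(labels, lines, now_ns=None):
    """Build a push body: one stream, one [timestamp_ns, line] pair per line."""
    ts = now_ns if now_ns is not None else time.time_ns()
    values = [[str(ts + i), line] for i, line in enumerate(lines)]
    return {"streams": [{"stream": labels, "values": values}]}

def push(payload, url=LOKI_PUSH_URL):
    """Send the payload to Loki. Not called here; needs network access to the lab."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status

payload = build_push_payload(
    {"job": "myapp", "host": "web-01"},
    ['level=error msg="payment failed"'],
    now_ns=1_700_000_000_000_000_000,
)
print(json.dumps(payload, indent=2))
```

The labels in "stream" are what you later select with {job="myapp", host="web-01"}; everything else about the event lives in the line text.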
Dig deeper
- Grafana Loki — LogQL documentation
- Grafana Mimir documentation
- Prometheus — Querying basics (PromQL)
- Grafana Tempo documentation
- Grafana — Trace to logs / derived fields
Search terms
- logql tutorial line filter parser examples
- promql rate counter gauge histogram_quantile
- grafana tempo trace spans waterfall
- loki labels cardinality best practices
- LGTM stack loki mimir tempo grafana
- grafana trace to logs derived fields
Check yourself
- How does Loki keep log storage cheap, and what is the trade-off in how you query it?
- Write a LogQL query for all error lines from {job="myapp"} on host web-01.
- In PromQL, what does rate(http_requests_total[5m]) compute, and why do you wrap a counter in rate()?
- What is a span, and how do spans relate to a trace?
- Describe the metric → trace → log pivot and which backend serves each step.