Seeing what the cluster is doing: logs to Loki

The lab already had a central log store (Loki — see its book in Core Infrastructure), with a Promtail agent on every VM shipping system logs. When the cluster arrived, the job was to feed its logs in too.

On the Kubernetes nodes the node-level Promtail agent was extended to ship:

  • Pod logs — tailing /var/log/pods/* and parsing the path so each line is labelled with its namespace, pod, and container. That's the big one: every pod in the cluster, searchable in one place.
  • kubelet and containerd journald units — the node's own story.
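To make the path-parsing idea concrete, here is a minimal Python sketch of how namespace, pod, and container can be pulled out of a pod log path. It assumes the usual kubelet layout of /var/log/pods/<namespace>_<pod>_<uid>/<container>/<n>.log. The function name labels_from_path is made up for illustration, and this is not the lab's actual Promtail configuration (in Promtail this would normally be done in the scrape config, not in Python).

```python
import re
from typing import Dict, Optional

# Assumed kubelet layout:
#   /var/log/pods/<namespace>_<pod>_<uid>/<container>/<restart-count>.log
# Namespace and pod names cannot contain "_", so it is safe to split on it.
POD_LOG_PATH = re.compile(
    r"^/var/log/pods/"
    r"(?P<namespace>[^_/]+)_(?P<pod>[^_/]+)_(?P<uid>[^/]+)/"
    r"(?P<container>[^/]+)/"
    r"[^/]+\.log$"
)


def labels_from_path(path: str) -> Optional[Dict[str, str]]:
    """Return the labels for a pod log path, or None if it is not one.

    Hypothetical helper: it only illustrates the parsing step.
    """
    match = POD_LOG_PATH.match(path)
    if match is None:
        return None
    # The pod UID is deliberately left out of the labels.
    return {key: match.group(key) for key in ("namespace", "pod", "container")}
```

Called on a path such as /var/log/pods/monitoring_grafana-7d9c_0a1b2c3d/grafana/0.log (an invented example), it returns the three labels; anything outside /var/log/pods/ returns None.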

The three database VMs (PostgreSQL, MariaDB, MySQL) also ship their error and slow-query logs, so the slow-query logging turned on during tuning actually lands somewhere searchable.

k8s nodes       ---\
GIT-Runner      ----\
PostgreSQL      -----+
MariaDB         -----+--> Loki (10.100.100.5) --> Grafana (search & dashboards)
MySQL           -----+
everything else ----/

One wrinkle worth knowing: a slow-query entry is multi-line — a little block of timing metadata followed by the SQL. Promtail's default is one-line-per-entry, which would shred each slow query into meaningless fragments. So the database log jobs use a multiline stage that re-assembles each query into a single searchable entry. (And each multi-line source gets its own job — never globbed together with single-line logs like the error file, or the multiline rule would swallow those too.)
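The re-assembly logic can be sketched in a few lines of Python. Promtail's multiline stage is driven by a firstline expression that marks where a new entry begins, and the sketch below mimics that idea. The "# Time:" marker is an assumption based on the common MySQL slow-log layout; the real expression depends on each database's log format, and the function name reassemble is made up for illustration.

```python
import re
from typing import Iterable, Iterator, List

# Assumed first-line marker. MySQL slow-query blocks usually open with
# "# Time:"; other databases need a different expression.
FIRSTLINE = re.compile(r"^# Time: ")


def reassemble(lines: Iterable[str], firstline: "re.Pattern" = FIRSTLINE) -> Iterator[str]:
    """Group raw log lines into one entry per slow query.

    A new entry starts at every line matching `firstline`; every other
    line is appended to the entry currently being built.
    """
    block: List[str] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if firstline.match(line) and block:
            # A new query begins, so emit the one collected so far.
            yield "\n".join(block)
            block = []
        block.append(line)
    if block:
        # Flush the last entry at end of input.
        yield "\n".join(block)
```

Note what happens to a line that never matches the marker: it is glued onto whatever entry came before it. That is exactly why single-line sources such as the error file need their own job, away from the multiline rule.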

Why we use this: centralised logs turn "SSH into five boxes and grep" into one query. Once you have more than two machines, shipping logs to one searchable place stops being a nice-to-have. Labelling pod logs by namespace/pod/container, and re-assembling multi-line database logs, is what makes them usable rather than merely present.