The mistakes (the honest part)

The lab didn't go in a straight line. The detours taught the most, so here they are, plainly:

  • Worker joins timed out on :6443. I blamed the workers; it was the host firewall on the control plane, plus ufw's default-DROP forwarding policy silently breaking pod networking. Lesson: when a join can't reach the API server, look at the firewall on the server side — and check the FORWARD policy before blaming the CNI.
  • TLS in two places. I first gave Kong its own certificate, then put HAProxy in front of it, and got a redirect loop. Lesson: terminate TLS once; two layers aren't "more secure," they're just more moving parts.
  • A MySQL setting on MariaDB. innodb_redo_log_capacity is a MySQL setting; MariaDB refused to start with it. And on MySQL itself, a tuning file named 99-… lost to the packaged mysqld.cnf: the include directory is read in filename order with later files overriding earlier ones, and 99 sorts before mysqld, so the packaged file was read last and won. Renaming it z99-… fixed it. Lesson: MySQL and MariaDB are not the same product. Check settings against your actual engine, and check the config load order.
  • Slow-query logs arrived as confetti. I shipped the databases' slow logs to Loki, but a slow query is a multi-line block and Promtail ships one line per entry by default — so each query became a handful of meaningless fragments. (And before that, the agent couldn't even read MySQL's slow log: it's owned by the mysql group, which the agent wasn't in.) Lesson: multi-line logs need a multiline parser and their own job — never globbed with single-line files — and a log you can't read ships nothing, so check permissions first.
  • The Kong admin GUI across two hostnames — CORS pain and double logins, until I moved it to one origin. Lesson: keep a web UI and its API on the same origin.
  • A proxy that ignored its own config because of a stale server-state file. Lesson: when behaviour disagrees with config, suspect cached/persisted state.
  • "Host key changed!" panics after VM reboots regenerated their keys. Lesson: know your environment; a changed key after a reprovision is expected, not an attack.
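
To make the config load-order lesson concrete, here is a minimal Python sketch of "read in filename order, last one wins". It is a model of the behaviour, not MySQL's actual loader. The file names 99-tuning.cnf and z99-tuning.cnf and the key some_setting are hypothetical stand-ins; only mysqld.cnf comes from the story above.

```python
# Model of an include directory read in filename order, where a later file
# overrides an earlier one. File names (other than mysqld.cnf) and the
# setting name are hypothetical; only the sort order matters here.

def effective_settings(files: dict[str, dict[str, str]]) -> dict[str, str]:
    """Merge per-file settings in plain string order; later files override."""
    merged: dict[str, str] = {}
    for name in sorted(files):  # byte-wise sort: digits come before lowercase letters
        merged.update(files[name])
    return merged

packaged = {"mysqld.cnf": {"some_setting": "packaged"}}

# "99-..." sorts before "mysqld.cnf", so the packaged file is merged last and wins.
before = effective_settings({**packaged, "99-tuning.cnf": {"some_setting": "tuned"}})

# "z99-..." sorts after "mysqld.cnf", so the tuning file is merged last and wins.
after = effective_settings({**packaged, "z99-tuning.cnf": {"some_setting": "tuned"}})

print(before["some_setting"])  # packaged
print(after["some_setting"])   # tuned
```

The same check works on a real box without any code: list the include directory sorted by name and see which file comes last.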
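
The slow-query "confetti" is easier to see in code. The sketch below shows, in Python, the idea behind a multiline parser: a "first line" pattern marks where each entry starts, and every following line is folded into that entry. It assumes each entry begins with a "# Time:" line, which may not hold for every engine or version, and the sample log lines are made up for illustration.

```python
import re

# Assumption: every slow-log entry starts with a "# Time:" line.
# Adjust the pattern to whatever your engine actually writes.
FIRST_LINE = re.compile(r"^# Time: ")

def group_entries(lines: list[str]) -> list[str]:
    """Fold raw log lines into one string per slow-query entry."""
    entries: list[str] = []
    current: list[str] = []
    for line in lines:
        # A new first line closes the entry collected so far.
        if FIRST_LINE.match(line) and current:
            entries.append("\n".join(current))
            current = []
        current.append(line.rstrip("\n"))
    if current:
        entries.append("\n".join(current))
    return entries

# Made-up sample, shaped like a slow log; not real output.
sample = [
    "# Time: 2024-01-01T00:00:00.000000Z",
    "# User@Host: app[app] @ localhost []",
    "# Query_time: 3.2  Lock_time: 0.0 Rows_sent: 1  Rows_examined: 100000",
    "SELECT COUNT(*) FROM orders;",
    "# Time: 2024-01-01T00:00:05.000000Z",
    "# User@Host: app[app] @ localhost []",
    "# Query_time: 2.1  Lock_time: 0.0 Rows_sent: 10  Rows_examined: 50000",
    "SELECT * FROM customers WHERE email LIKE '%x%';",
]

entries = group_entries(sample)
print(len(sample), "lines ->", len(entries), "entries")
```

Without the grouping step, those eight lines would ship as eight separate log entries. Promtail's multiline stage does the equivalent job with its firstline pattern, which is why the slow logs need their own job with that stage configured.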

The mindset: keep a log of what bit you. Every one of these cost time once and zero time forever after, because it got written down. The senior engineer isn't the one who never hits these — it's the one who's hit them before and remembers.