Making room: right-sizing live VMs

The cluster needed real memory: a control-plane node, three workers at 24 GiB each, a build runner, and later three database VMs. Adding all of that naively would have blown past physical RAM.

So before adding anything, I went looking for slack in what already existed. The original "core" service VMs had each been handed 4 GiB out of habit. A quick look at what they actually used told a different story:

VM                 used (real)     allocated
Git / Docs / ...   ~0.5-0.7 GiB    4 GiB each   <- hypervisor "used" is mostly page cache, not need

Those services were sitting on ~0.5-0.7 GiB of genuine usage against 4 GiB allocated. The hypervisor's "used" number looked high only because Linux fills spare RAM with disk cache, and the kernel reclaims that cache as soon as applications need the memory, so it is not memory the VM needs. Reading the real figure (available memory as reported by free, not used-including-cache) showed each VM could drop to 2 GiB with room to spare. At 2 GiB reclaimed per VM, that freed ~12 GiB.
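The gap between the two numbers can be made concrete with a small parser for /proc/meminfo. This is a minimal sketch under two assumptions: the guest is Linux with a MemAvailable field (present since kernel 3.14), and the hypervisor's headline "used" is approximated as MemTotal minus MemFree, which may not match exactly how every hypervisor computes it. The real working set is taken as MemTotal minus MemAvailable. Function names are illustrative.

```python
def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value}, values in kB (KiB)."""
    fields: dict[str, int] = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        parts = rest.split()
        if parts:
            # First token is the number; the optional second token is the "kB" unit.
            fields[key.strip()] = int(parts[0])
    return fields


def memory_summary(fields: dict[str, int]) -> dict[str, float]:
    """Compare the cache-inflated 'used' figure with real usage, in GiB."""
    kib_per_gib = 1024 * 1024
    total = fields["MemTotal"]
    free = fields["MemFree"]
    available = fields["MemAvailable"]
    return {
        "total_gib": total / kib_per_gib,
        # Assumed headline view: everything that is not completely free,
        # which counts page cache as if it were in use.
        "used_incl_cache_gib": (total - free) / kib_per_gib,
        # Real usage: what the kernel could not hand out without swapping.
        "used_real_gib": (total - available) / kib_per_gib,
    }
```

On a guest you would pass it the contents of /proc/meminfo. For a hypothetical 4 GiB VM reporting 256 MiB free and about 3.4 GiB available, it reports 3.75 GiB "used" but only 0.625 GiB of real use, the same shape as the table above.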

Later, the three Kubernetes workers were trimmed from 24 to 22 GiB each (6 GiB in total) to fund the first two database servers, then trimmed again from 22 to 20 GiB (another 6 GiB) to fund a third, a dedicated MySQL box. At 16 vCPU / 20 GiB that's 1.25 GiB per core, which is about as lean as I'd take these workers. Because these VMs have ballooning disabled, "trimming" means a real stop/start. Each round was therefore done as a rolling operation (drain a node, resize it, bring it back, move to the next) to keep the cluster healthy throughout.
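The rolling round could be scripted roughly as below. This is a sketch, not the procedure actually used. The kubectl drain, wait and uncordon invocations are standard. However, resize_vm is a hypothetical hook standing in for whatever the hypervisor needs (stop the VM, change its memory, start it). It is assumed to return only once the guest has booted. The sketch also assumes each Kubernetes node name matches its VM name.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], None]


def run(cmd: Sequence[str]) -> None:
    """Run a command, raising CalledProcessError if it exits non-zero."""
    subprocess.run(list(cmd), check=True)


def rolling_resize(
    nodes: Sequence[str],
    new_mem_gib: int,
    resize_vm: Callable[[str, int], None],  # hypothetical hypervisor hook
    runner: Runner = run,
) -> None:
    """Resize worker VMs one at a time so only one node is out of service."""
    for node in nodes:
        # Evict pods (honouring PodDisruptionBudgets) and cordon the node.
        runner(["kubectl", "drain", node,
                "--ignore-daemonsets", "--delete-emptydir-data"])
        # Ballooning is off, so the new size needs a real stop/start.
        # Assumed to block until the guest is back up.
        resize_vm(node, new_mem_gib)
        # Wait for the node to report Ready before letting pods return.
        runner(["kubectl", "wait", "--for=condition=Ready",
                f"node/{node}", "--timeout=10m"])
        runner(["kubectl", "uncordon", node])
```

Processing nodes strictly in sequence is what keeps two of the three workers serving at every point. Because drain goes through the eviction API, it waits rather than evicting a pod whose disruption budget would be violated.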

Lesson learned: "used memory" lies. On Linux, free RAM is wasted RAM, so the kernel caches aggressively and the headline "used" figure looks scary. Size against available memory and the actual working set, not the headline number. Half of this lab's growth was paid for by reclaiming allocation that was never really in use.
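One way to turn "size against the working set" into a number is a three-step rule. Take the peak real usage (MemTotal minus MemAvailable, sampled over a representative period), multiply it by a headroom factor, and round up. The 2x headroom and 1 GiB step below are illustrative assumptions, not the rule used here. Applied to the ~0.7 GiB upper figure from the table, they happen to land on the same 2 GiB. The helper names are hypothetical.

```python
import math


def right_size_gib(peak_real_used_gib: float, headroom: float = 2.0,
                   step_gib: int = 1) -> int:
    """Suggest an allocation from peak real usage, not cache-inflated 'used'.

    headroom and step_gib are assumptions chosen for illustration.
    """
    target = peak_real_used_gib * headroom
    # Round up to the next whole step, and never go below one step.
    return max(step_gib, math.ceil(target / step_gib) * step_gib)


def reclaimed_gib(current: dict[str, int], proposed: dict[str, int]) -> int:
    """Total GiB freed by moving each VM from its current to proposed size."""
    return sum(current[vm] - proposed[vm] for vm in current)
```

Applied to the worker trim, reclaimed_gib returns 6 GiB per round: three workers giving up 2 GiB each.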