
Lessons from sizing this lab

The short version, for anyone doing the same on their own host:

  • Find your binding constraint first. Here it's RAM, every time. CPU overcommits fine; disk is cheap. Plan against the thing that actually runs out.
  • No swap means no second chances. On a hypervisor that's the right setting, but it turns "a bit over budget" into "the OOM killer picks a victim." Keep committed memory honestly under physical.
  • Cap ZFS ARC. Otherwise it's an invisible VM eating your budget.
  • Right-size from real usage, not allocation. Most VMs are over-provisioned out of habit. Reclaiming that is free capacity.
  • Resizing isn't always live. With memory ballooning off, changing a VM's RAM is a stop/start — plan a rolling, drain-aware procedure for anything that's part of a cluster.
  • Watch the second-tightest resource, too. RAM was always the binding constraint, but thick-provisioned 250 GiB database disks quietly pushed the ZFS pool past 90%. The resource that bites you is usually the one you stopped watching the moment you'd "solved" capacity.
  • Leave a margin. I aimed to keep a few GiB of headroom on the host at all times. The last few percent of RAM is not worth the risk of an OOM cascade.
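
The arithmetic behind the list above fits in a few lines. This is a minimal sketch, not the tooling used for this lab: the names (HostBudget, headroom_gib, fits, pool_fill) and every number in the example are hypothetical, and the host reserve is an assumed figure you would measure on your own machine. It counts the ARC cap as if it were a VM, refuses any plan that eats into the margin, and tracks pool fill from thick-provisioned disk sizes so the second-tightest resource stays visible.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostBudget:
    physical_gib: float      # installed RAM on the host
    arc_cap_gib: float       # ZFS ARC ceiling, budgeted like a VM
    host_reserve_gib: float  # hypervisor and host services (assumed figure)
    margin_gib: float        # headroom to keep free at all times


def headroom_gib(budget: HostBudget, vm_alloc_gib: dict[str, float]) -> float:
    """RAM left after every commitment; negative means over budget."""
    committed = (
        sum(vm_alloc_gib.values())
        + budget.arc_cap_gib
        + budget.host_reserve_gib
    )
    return budget.physical_gib - committed


def fits(budget: HostBudget, vm_alloc_gib: dict[str, float],
         new_vm_gib: float = 0.0) -> bool:
    """True if the plan (plus an optional new VM) still leaves the margin."""
    return headroom_gib(budget, vm_alloc_gib) - new_vm_gib >= budget.margin_gib


def pool_fill(thick_disks_gib: list[float], pool_size_gib: float) -> float:
    """Fraction of the pool claimed by thick-provisioned disks."""
    return sum(thick_disks_gib) / pool_size_gib


if __name__ == "__main__":
    # Hypothetical host and VM sizes, for illustration only.
    budget = HostBudget(physical_gib=64.0, arc_cap_gib=8.0,
                        host_reserve_gib=4.0, margin_gib=4.0)
    vms = {"db": 16.0, "app": 12.0, "ci": 8.0}

    print("headroom GiB:", headroom_gib(budget, vms))
    print("12 GiB VM fits:", fits(budget, vms, new_vm_gib=12.0))
    print("13 GiB VM fits:", fits(budget, vms, new_vm_gib=13.0))
    print("pool fill:", pool_fill([250.0, 250.0, 250.0], 1000.0))
```

Running the check before creating a VM, rather than after the host starts struggling, is the whole point: with no swap, a plan that fails fits() is one the OOM killer will eventually reject for you.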

None of this is exotic. It's just arithmetic done before provisioning instead of after — which is the entire discipline.