Further Optimization
Overview
This chapter applies a set of low-level host optimisations that improve performance, stability, and resource efficiency on a Proxmox node. All changes are persistent across reboots. A single reboot at the end is sufficient to activate everything.
1 — sysctl Parameters
These values aren't magic numbers — they're defaults I've settled on after running Proxmox in production for a while. The swappiness value of 10 is a judgment call; some people go lower (2–5) on dedicated hypervisors with no swap at all. I keep it at 10 because it gives the kernel enough flexibility without making swap the first resort under normal load spikes.
All kernel parameter tuning is consolidated into a single file at /etc/sysctl.d/99-proxmox-optimize.conf. This keeps the optimisations isolated from system defaults and easy to review or revert.
Create the file with the following content:
vm.swappiness = 10
vm.dirty_ratio = 15
vm.dirty_background_ratio = 5
vm.max_map_count = 262144
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
| Parameter | Value | Reason |
|---|---|---|
| vm.swappiness | 10 | Default is 60; a lower value makes the kernel far less eager to swap, so RAM is used before swap is touched |
| vm.dirty_ratio | 15 | Caps dirty (unwritten) pages at 15% of available memory; prevents large write stalls under burst I/O |
| vm.dirty_background_ratio | 5 | Starts background flushing at 5%; proactive writeout keeps the dirty buffer from building up |
| vm.max_map_count | 262144 | Raises the limit on memory-mapped regions; required by some containerised workloads such as Elasticsearch and certain databases |
| net.core.rmem_max | 16777216 | Sets the maximum socket receive buffer to 16 MiB; important for high-throughput inter-VM and backup traffic |
| net.core.wmem_max | 16777216 | Sets the maximum socket send buffer to 16 MiB |
| net.ipv4.tcp_rmem | 4096 87380 16777216 | TCP receive buffer min/default/max in bytes; allows connections to ramp up to 16 MiB under load |
| net.ipv4.tcp_wmem | 4096 65536 16777216 | TCP send buffer min/default/max in bytes |
Apply immediately without rebooting:
sysctl -p /etc/sysctl.d/99-proxmox-optimize.conf
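To confirm that the kernel accepted every value, the file can be compared against the live settings. The following is a minimal sketch, not part of the original procedure: `check_sysctl_file` is a hypothetical helper name, and it assumes simple `key = value` lines whose keys map onto paths under /proc/sys by replacing dots with slashes. The optional second argument exists only so the function can be pointed at a different directory tree.

```shell
# Hypothetical helper: compare "key = value" lines in a sysctl file
# against the live values under /proc/sys (or another root directory).
check_sysctl_file() {
    sc_conf="$1"
    sc_root="${2:-/proc/sys}"
    sc_status=0
    while IFS='=' read -r sc_key sc_value; do
        # Strip whitespace from the key; skip blank lines and comments.
        sc_key=$(printf '%s' "$sc_key" | tr -d '[:space:]')
        case "$sc_key" in ''|'#'*|';'*) continue ;; esac
        # Normalise runs of spaces and tabs so "4096<TAB>87380" equals "4096 87380".
        sc_want=$(printf '%s' "$sc_value" | tr -s '[:space:]' ' ' | sed 's/^ //; s/ $//')
        sc_path="$sc_root/$(printf '%s' "$sc_key" | tr '.' '/')"
        if [ ! -r "$sc_path" ]; then
            echo "MISSING $sc_key"
            sc_status=1
            continue
        fi
        sc_have=$(tr -s '[:space:]' ' ' < "$sc_path" | sed 's/^ //; s/ $//')
        if [ "$sc_have" = "$sc_want" ]; then
            echo "OK $sc_key = $sc_have"
        else
            echo "DIFF $sc_key: want '$sc_want', have '$sc_have'"
            sc_status=1
        fi
    done < "$sc_conf"
    return "$sc_status"
}
```

Running `check_sysctl_file /etc/sysctl.d/99-proxmox-optimize.conf` prints one line per parameter and returns non-zero if any value differs or is missing.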
2 — CPU Governor
The CPU frequency governor controls how the processor scales its clock speed. Depending on the scaling driver, the default is usually powersave, ondemand, or schedutil, all of which lower the frequency when the CPU appears idle. On a hypervisor this can cause latency spikes the moment a VM wakes up and needs full CPU speed. Setting the governor to performance asks the driver to keep every core at or near its maximum frequency.
Install the CPU power management tools:
apt-get install -y linux-cpupower
Set the governor immediately:
cpupower frequency-set -g performance
Create /etc/systemd/system/cpu-governor.service to persist it across reboots:
[Unit]
Description=Set CPU governor to performance
After=multi-user.target
[Service]
Type=oneshot
ExecStart=/usr/bin/cpupower frequency-set -g performance
RemainAfterExit=yes
[Install]
WantedBy=multi-user.target
Enable the service:
systemctl daemon-reload
systemctl enable cpu-governor.service
Verify:
cpupower frequency-info | grep "The governor"
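cpupower reports on the CPU it analyses, so a per-core check can be useful on hosts with many cores. The sketch below reads the standard sysfs file `scaling_governor` for each core and lists any that differ from the expected governor. `check_governors` is a hypothetical helper name, and the optional second argument (the sysfs root) is only there so the function can be run against another directory.

```shell
# Hypothetical helper: list cores whose governor differs from the expected one.
check_governors() {
    gov_want="${1:-performance}"
    gov_root="${2:-/sys/devices/system/cpu}"
    gov_bad=0
    for gov_file in "$gov_root"/cpu[0-9]*/cpufreq/scaling_governor; do
        # Skip when cpufreq is not exposed (for example inside a VM).
        [ -r "$gov_file" ] || continue
        gov_have=$(cat "$gov_file")
        if [ "$gov_have" != "$gov_want" ]; then
            echo "${gov_file%/cpufreq/scaling_governor}: $gov_have"
            gov_bad=1
        fi
    done
    return "$gov_bad"
}
```

`check_governors` prints nothing and returns 0 when every core that exposes cpufreq is already on performance.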
3 — Transparent Huge Pages
Disabling THP is one of those recommendations that sounds optional until you run a database or a JVM workload inside a VM and start seeing random latency spikes you can't explain. The compaction background thread is the culprit — it runs at the worst times. This is a zero-cost change that removes an entire class of hard-to-diagnose performance problems.
Transparent Huge Pages (THP) automatically promotes standard 4 KB memory pages into 2 MB huge pages to reduce TLB pressure. On a hypervisor it can have the opposite effect: latency spikes when the kernel compacts memory in the background, unpredictable guest performance, and memory allocation failures under load. Disabling THP is a common recommendation for hosts that run latency-sensitive guests.
Open /etc/default/grub and update GRUB_CMDLINE_LINUX_DEFAULT:
GRUB_CMDLINE_LINUX_DEFAULT="quiet transparent_hugepage=never"
If you are also adding IOMMU parameters (section 5), combine them on the same line; do not duplicate the directive.
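When the same line is edited in several steps, an idempotent helper avoids duplicated flags. This is a minimal sketch under stated assumptions: `add_grub_param` is a hypothetical name, the file contains a single double-quoted GRUB_CMDLINE_LINUX_DEFAULT line, and the parameter contains no regular-expression metacharacters. It leaves a `.bak` copy next to the file.

```shell
# Hypothetical helper: append a kernel parameter to GRUB_CMDLINE_LINUX_DEFAULT
# in the given file unless that exact parameter is already present.
add_grub_param() {
    grub_file="$1"
    grub_param="$2"
    # Match the parameter only as a whole word inside the quoted value,
    # so "iommu=pt" is not confused with "intel_iommu=on".
    if grep -Eq "^GRUB_CMDLINE_LINUX_DEFAULT=\"([^\"]* )?${grub_param}( [^\"]*)?\"" "$grub_file"; then
        return 0
    fi
    # Insert the parameter just before the closing quote.
    sed -i.bak -E "s|^(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*)\"|\1 ${grub_param}\"|" "$grub_file"
}
```

For example, `add_grub_param /etc/default/grub transparent_hugepage=never` can be run repeatedly without changing the line after the first call.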
Apply:
update-grub
4 — I/O Scheduler
The I/O scheduler controls how the kernel queues and reorders disk requests. NVMe drives have their own internal queuing hardware and perform best with the none scheduler — bypassing kernel-level queuing entirely. SATA SSDs benefit from mq-deadline, which adds minimal reordering to ensure requests are served within a time deadline.
Create /etc/udev/rules.d/60-io-scheduler.rules:
ACTION=="add|change", KERNEL=="nvme[0-9]*n[0-9]*", ATTR{queue/scheduler}="none"
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="mq-deadline"
Apply without rebooting:
udevadm control --reload-rules
udevadm trigger
Verify the active scheduler:
cat /sys/block/nvme0n1/queue/scheduler
The active scheduler is shown in brackets — e.g. [none] mq-deadline kyber bfq.
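To check every disk at once instead of one device at a time, the bracketed entry can be extracted with a small parser. This is a sketch rather than part of the original steps: `active_choice` and `list_schedulers` are hypothetical helper names, and the optional argument to `list_schedulers` only lets it run against a directory other than /sys/block.

```shell
# Extract the bracketed entry from a sysfs "choice" line,
# e.g. "[none] mq-deadline kyber bfq" yields "none".
active_choice() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# Hypothetical helper: print "device scheduler" for each block device.
list_schedulers() {
    sched_root="${1:-/sys/block}"
    for sched_file in "$sched_root"/*/queue/scheduler; do
        [ -r "$sched_file" ] || continue
        # Reduce ".../nvme0n1/queue/scheduler" to "nvme0n1".
        sched_dev=${sched_file#"$sched_root"/}
        sched_dev=${sched_dev%%/*}
        echo "$sched_dev $(active_choice "$(cat "$sched_file")")"
    done
}
```

The same `active_choice` parser works for any sysfs file that marks the active option with brackets, including the THP setting checked in section 7.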
5 — IOMMU
IOMMU enables the CPU to isolate and control DMA access from PCIe devices. On a hypervisor it is the foundation for PCI passthrough — assigning physical devices (GPUs, NICs, NVMe controllers) directly to VMs with no driver layer in between. Enabling it now costs nothing and makes the option available when needed without another GRUB change and reboot later.
The iommu=pt flag enables passthrough mode: devices that stay with the host use an identity (1:1) DMA mapping and skip address translation, so only devices assigned to a VM pay the IOMMU overhead.
Check your CPU vendor:
lscpu | grep "Vendor ID"
Open /etc/default/grub and update GRUB_CMDLINE_LINUX_DEFAULT with the THP flag from section 3 combined with the appropriate IOMMU flag:
Intel:
GRUB_CMDLINE_LINUX_DEFAULT="quiet transparent_hugepage=never intel_iommu=on iommu=pt"
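AMD (the kernel enables the AMD IOMMU by default when it is turned on in firmware, and amd_iommu does not accept on as a value, so only iommu=pt is added):
GRUB_CMDLINE_LINUX_DEFAULT="quiet transparent_hugepage=never iommu=pt"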
Apply:
update-grub
Verify after reboot:
dmesg | grep -e IOMMU -e DMAR -e AMD-Vi | head -10
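Once the IOMMU is active, the kernel exposes its device groups under /sys/kernel/iommu_groups, which is what determines which devices can be passed through together. The following sketch lists them; `list_iommu_groups` is a hypothetical helper name, and the optional argument only lets it run against another directory.

```shell
# Hypothetical helper: print "group device" pairs from the IOMMU group tree,
# sorted by group number. No output means no groups exist (IOMMU inactive).
list_iommu_groups() {
    iommu_root="${1:-/sys/kernel/iommu_groups}"
    for iommu_dev in "$iommu_root"/*/devices/*; do
        [ -e "$iommu_dev" ] || continue
        # Reduce ".../<group>/devices/<pci address>" to the group number.
        iommu_group=${iommu_dev#"$iommu_root"/}
        iommu_group=${iommu_group%%/*}
        echo "$iommu_group ${iommu_dev##*/}"
    done | sort -n
}
```

Devices that share a group number cannot be assigned to different VMs independently, so this listing is worth checking before planning passthrough.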
6 — Journal Size
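By default, systemd’s journal limits persistent logs to 10% of the filesystem that holds them, capped at 4 GB on current systemd versions. On a long-running server with a small root partition that can still be more space than you want to give up. An explicit 1 GB cap is sufficient for debugging while keeping the filesystem healthy.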
Edit /etc/systemd/journald.conf and add under [Journal]:
[Journal]
SystemMaxUse=1G
Apply immediately:
systemctl restart systemd-journald
journalctl --disk-usage
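An alternative to editing the main file is a drop-in under /etc/systemd/journald.conf.d, which journald reads in addition to journald.conf and which keeps the change separate from the packaged defaults. The sketch below writes such a file; `write_journal_cap` and the file name `90-size-cap.conf` are hypothetical choices, and the optional second argument only lets the function write somewhere other than the real directory.

```shell
# Hypothetical helper: write a journald drop-in that caps the journal size.
write_journal_cap() {
    jc_cap="${1:-1G}"
    jc_dir="${2:-/etc/systemd/journald.conf.d}"
    mkdir -p "$jc_dir" &&
        printf '[Journal]\nSystemMaxUse=%s\n' "$jc_cap" > "$jc_dir/90-size-cap.conf"
}
```

After `write_journal_cap 1G`, restart systemd-journald as shown above for the cap to take effect.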
The journal size cap is the change that surprises people the most — it's not a performance optimization, it's maintenance hygiene. I've inherited servers where the journal had consumed 40GB on a root partition that had 50GB total. The cap won't help if you're already there, but it prevents the problem from developing on a fresh node.
7 — Reboot and Verify
The sysctl parameters, CPU governor, I/O scheduler, and journal cap are already active. The THP and IOMMU kernel parameters take effect on the next boot.
reboot
After reconnecting, verify each change:
Transparent Huge Pages disabled:
cat /sys/kernel/mm/transparent_hugepage/enabled
Should output: always madvise [never]
CPU governor set to performance:
cpupower frequency-info | grep "The governor"
Should output: The governor "performance" ...
I/O scheduler active:
cat /sys/block/nvme0n1/queue/scheduler
Should output: [none] ...
IOMMU enabled:
dmesg | grep -e IOMMU -e DMAR -e AMD-Vi | head -5
Should show IOMMU initialisation messages for your CPU vendor (DMAR on Intel, AMD-Vi on AMD).
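If one of the boot-time checks fails, it helps to confirm that the flags actually reached the running kernel. The sketch below tests /proc/cmdline for an exact parameter; `cmdline_has` is a hypothetical helper name, and the optional second argument only lets it read a different file.

```shell
# Hypothetical helper: succeed if the booted kernel command line
# contains the given parameter as an exact word.
cmdline_has() {
    cmdline_file="${2:-/proc/cmdline}"
    # Put each parameter on its own line, then match the whole line literally.
    tr -s '[:space:]' '\n' < "$cmdline_file" | grep -Fxq -- "$1"
}
```

For example, `cmdline_has transparent_hugepage=never && echo present` prints `present` only when that flag was passed at boot. A missing flag after update-grub and a reboot suggests the node is not booting through the GRUB configuration that was edited.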