
Further Optimization

Overview

This chapter applies a set of low-level host optimisations that improve performance, stability, and resource efficiency on a Proxmox node. All changes are persistent across reboots. A single reboot at the end is sufficient to activate everything.

1 — sysctl Parameters

These values aren't magic numbers — they're defaults I've settled on after running Proxmox in production for a while. The swappiness value of 10 is a judgment call; some people go lower (2–5) on dedicated hypervisors with no swap at all. I keep it at 10 because it gives the kernel enough flexibility without making swap the first resort under normal load spikes.

All kernel parameter tuning is consolidated into a single file at /etc/sysctl.d/99-proxmox-optimize.conf. This keeps the optimisations isolated from system defaults and easy to review or revert.

Create the file with the following content:

vm.swappiness = 10
vm.dirty_ratio = 15
vm.dirty_background_ratio = 5
vm.max_map_count = 262144
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
| Parameter | Value | Reason |
|---|---|---|
| vm.swappiness | 10 | Default is 60. Biases memory reclaim strongly toward dropping page cache rather than swapping out process memory, so swap becomes close to a last resort |
| vm.dirty_ratio | 15 | Default is 20. Caps dirty (unwritten) pages at 15% of available memory before writing processes are throttled; prevents large write stalls under burst I/O |
| vm.dirty_background_ratio | 5 | Default is 10. Starts background flushing at 5%, so proactive writeout keeps the dirty buffer from building up |
| vm.max_map_count | 262144 | Raises the per-process limit on memory-mapped regions (default 65530); required by some containerised workloads such as Elasticsearch and certain databases |
| net.core.rmem_max | 16777216 | Maximum socket receive buffer (16 MiB); important for high-throughput host traffic such as backups and live migration |
| net.core.wmem_max | 16777216 | Maximum socket send buffer (16 MiB) |
| net.ipv4.tcp_rmem | 4096 87380 16777216 | TCP receive buffer min/default/max in bytes; autotuning lets busy connections grow up to 16 MiB |
| net.ipv4.tcp_wmem | 4096 65536 16777216 | TCP send buffer min/default/max in bytes |

Apply immediately without rebooting:

sysctl -p /etc/sysctl.d/99-proxmox-optimize.conf
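To confirm later that the live kernel values still match the file (for example after another package drops its own file into /etc/sysctl.d), a small check can help. This is a sketch, not part of any Proxmox tooling: check_sysctl is a hypothetical helper, it assumes the standard mapping of sysctl keys to paths under /proc/sys, and its optional second argument exists only so it can be tried against a fake directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare each "key = value" line of a sysctl.d file
# with the live value under /proc/sys (dots in the key become slashes).
# The optional second argument replaces /proc/sys, e.g. for testing.
check_sysctl() {
    local conf="$1" root="${2:-/proc/sys}"
    local key want have path rc=0
    local -a parts
    while IFS='=' read -r key want || [[ -n "$key" ]]; do
        key="${key//[[:space:]]/}"              # keys never contain spaces
        [[ -z "$key" || "$key" == \#* || "$key" == \;* ]] && continue
        read -r -a parts <<< "$want"            # collapse runs of spaces/tabs
        want="${parts[*]}"
        path="$root/${key//.//}"
        have=""
        if [[ -r "$path" ]]; then
            read -r -a parts < "$path" || true  # kernel separates with tabs
            have="${parts[*]}"
        fi
        if [[ "$have" == "$want" ]]; then
            echo "OK   $key = $have"
        else
            echo "DIFF $key: expected '$want', got '${have:-<missing>}'"
            rc=1
        fi
    done < "$conf"
    return "$rc"
}
```

Running check_sysctl /etc/sysctl.d/99-proxmox-optimize.conf prints one OK or DIFF line per parameter and returns non-zero if anything differs.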

2 — CPU Governor

The CPU frequency governor controls how the processor scales its clock speed. Power-saving governors such as powersave or ondemand (often the default, depending on the frequency driver) lower the clock when the CPU appears idle. On a hypervisor this causes latency spikes the moment a VM wakes up and needs full CPU speed. Setting the governor to performance requests the maximum frequency on every core; idle cores can still enter sleep states, but power draw goes up.

Install the CPU power management tools:

apt-get install -y linux-cpupower

Set the governor immediately:

cpupower frequency-set -g performance

Create /etc/systemd/system/cpu-governor.service to persist it across reboots:

[Unit]
Description=Set CPU governor to performance

[Service]
Type=oneshot
ExecStart=/usr/bin/cpupower frequency-set -g performance
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target

Enable the service:

systemctl daemon-reload
systemctl enable cpu-governor.service

Verify:

cpupower frequency-info | grep "The governor"
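By default cpupower frequency-info only describes CPU 0. A sketch of a per-core check reads every scaling_governor file in sysfs instead; check_governor is a hypothetical helper, and its optional arguments (sysfs directory, expected governor) are assumptions added so it can be tested against a fake tree.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check the scaling governor of every CPU, because
# "cpupower frequency-info" by default only reports CPU 0.
# Optional args: sysfs CPU directory (for testing) and the expected governor.
check_governor() {
    local root="${1:-/sys/devices/system/cpu}" want="${2:-performance}"
    local f cpu gov seen=0 bad=0
    for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
        [[ -r "$f" ]] || continue          # unmatched glob or no cpufreq driver
        seen=$((seen + 1))
        read -r gov < "$f" || true
        cpu="${f#"$root"/}"
        cpu="${cpu%%/*}"                   # e.g. "cpu3"
        if [[ "$gov" != "$want" ]]; then
            echo "$cpu: $gov"
            bad=$((bad + 1))
        fi
    done
    echo "$seen CPU(s) checked, $bad not using '$want'"
    # Succeed only if at least one CPU was found and every one matches.
    [[ "$seen" -gt 0 && "$bad" -eq 0 ]]
}
```

Calling check_governor with no arguments checks the real host against performance and lists any core that differs.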

3 — Transparent Huge Pages

Disabling THP is one of those recommendations that sounds optional until you run a database or a JVM workload inside a VM and start seeing random latency spikes you can't explain. The compaction background thread is the culprit — it runs at the worst times. This is a zero-cost change that removes an entire class of hard-to-diagnose performance problems.

Transparent Huge Pages (THP) automatically promotes standard 4KB memory pages into 2MB huge pages to reduce TLB pressure. On a hypervisor the side effects can outweigh that benefit: latency spikes when the kernel compacts memory in the background, unpredictable guest performance, and memory allocation failures under load. For this reason, disabling THP is common practice on hypervisors.

Open /etc/default/grub and update GRUB_CMDLINE_LINUX_DEFAULT:

GRUB_CMDLINE_LINUX_DEFAULT="quiet transparent_hugepage=never"

If you are also adding the IOMMU parameters from section 5, combine them on the same line rather than adding a second GRUB_CMDLINE_LINUX_DEFAULT line.

Apply:

update-grub
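Because /etc/default/grub is read as a shell script, a second GRUB_CMDLINE_LINUX_DEFAULT line silently overrides the first. A hedged sketch of a pre-flight check follows; check_grub_cmdline is a hypothetical helper, and it only inspects /etc/default/grub itself, not any snippets under /etc/default/grub.d/.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sanity-check /etc/default/grub before update-grub.
# Fails if GRUB_CMDLINE_LINUX_DEFAULT is assigned more than once or if any
# of the flags passed after the file name is missing from its value.
check_grub_cmdline() {
    local file="$1"; shift
    local pattern='^[[:space:]]*GRUB_CMDLINE_LINUX_DEFAULT='
    local count value flag rc=0
    count="$(grep -cE "$pattern" "$file" || true)"  # commented lines don't match
    if [[ "$count" -ne 1 ]]; then
        echo "expected one GRUB_CMDLINE_LINUX_DEFAULT line, found $count"
        return 1
    fi
    value="$(grep -E "$pattern" "$file")"
    value="${value#*=}"                # drop the variable name
    value="${value//\"/}"              # drop the double quotes
    for flag in "$@"; do
        # Match whole words only, so "iommu=pt" is not satisfied by a longer flag.
        if [[ " $value " != *" $flag "* ]]; then
            echo "missing: $flag"
            rc=1
        fi
    done
    return "$rc"
}
```

For this section, check_grub_cmdline /etc/default/grub transparent_hugepage=never should print nothing and succeed.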

4 — I/O Scheduler

The I/O scheduler controls how the kernel queues and reorders disk requests. NVMe drives have their own internal queuing hardware and perform best with the none scheduler — bypassing kernel-level queuing entirely. SATA SSDs benefit from mq-deadline, which adds minimal reordering to ensure requests are served within a time deadline.

Create /etc/udev/rules.d/60-io-scheduler.rules:

ACTION=="add|change", KERNEL=="nvme[0-9]*n[0-9]*", ENV{DEVTYPE}=="disk", ATTR{queue/scheduler}="none"
ACTION=="add|change", KERNEL=="sd[a-z]*", ENV{DEVTYPE}=="disk", ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="mq-deadline"

Apply without rebooting:

udevadm control --reload-rules
udevadm trigger

Verify the active scheduler:

cat /sys/block/nvme0n1/queue/scheduler

The active scheduler is shown in brackets — e.g. [none] mq-deadline kyber bfq.
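Checking nvme0n1 alone misses SATA SSDs and additional NVMe devices. A sketch that walks every device in /sys/block and extracts the bracketed entry follows; active_choice and list_schedulers are hypothetical names, and the same bracket parsing also works on the THP file (always madvise [never]).

```shell
#!/usr/bin/env bash
# Hypothetical helpers: print the active (bracketed) I/O scheduler of every
# block device rather than nvme0n1 alone.

# Extract the bracketed entry: "[none] mq-deadline kyber bfq" -> "none".
active_choice() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# Optional argument: sysfs block directory (defaults to /sys/block).
list_schedulers() {
    local root="${1:-/sys/block}" f dev sched
    for f in "$root"/*/queue/scheduler; do
        [[ -r "$f" ]] || continue
        dev="${f#"$root"/}"
        dev="${dev%%/*}"
        sched="$(active_choice < "$f")"
        # Devices without a selectable scheduler may print a bare "none".
        printf '%s %s\n' "$dev" "${sched:-$(<"$f")}"
    done
}
```

Running list_schedulers prints one "device scheduler" pair per line, e.g. nvme0n1 none.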

5 — IOMMU

The IOMMU lets the host isolate and control DMA from PCIe devices. On a hypervisor it is the foundation for PCI passthrough: assigning physical devices (GPUs, NICs, NVMe controllers) directly to VMs, so the guest's own driver talks to the hardware without host-side emulation in between. Enabling it now costs nothing and makes the option available when needed without another GRUB change and reboot later.

The iommu=pt flag enables pass-through mode: devices that stay with the host use an identity mapping, so their DMA skips IOMMU translation and its overhead, while devices assigned to VMs are still isolated.

Check your CPU vendor:

lscpu | grep "Vendor ID"

Open /etc/default/grub and update GRUB_CMDLINE_LINUX_DEFAULT, combining the THP flag from section 3 with the appropriate IOMMU flag:

Intel:

GRUB_CMDLINE_LINUX_DEFAULT="quiet transparent_hugepage=never intel_iommu=on iommu=pt"

AMD:

GRUB_CMDLINE_LINUX_DEFAULT="quiet transparent_hugepage=never amd_iommu=on iommu=pt"
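The vendor check and the choice between the two lines above can also be scripted. This sketch assumes /proc/cpuinfo exposes a vendor_id field with the same GenuineIntel or AuthenticAMD strings that lscpu reports; iommu_flags and suggest_grub_line are hypothetical helpers that only print a suggested line and do not edit any file.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: pick the IOMMU flags from the CPU vendor string.
# Reads /proc/cpuinfo by default; the optional argument is for testing.
iommu_flags() {
    local cpuinfo="${1:-/proc/cpuinfo}" vendor
    # Lines look like "vendor_id<TAB>: GenuineIntel"; take the first one.
    vendor="$(awk -F': *' '/^vendor_id/ { print $2; exit }' "$cpuinfo")"
    case "$vendor" in
        GenuineIntel) echo "intel_iommu=on iommu=pt" ;;
        AuthenticAMD) echo "amd_iommu=on iommu=pt" ;;
        *) echo "unknown CPU vendor: '$vendor'" >&2; return 1 ;;
    esac
}

# Print the combined GRUB line (THP flag plus IOMMU flags) for this host.
suggest_grub_line() {
    local flags
    flags="$(iommu_flags "$@")" || return 1
    printf 'GRUB_CMDLINE_LINUX_DEFAULT="quiet transparent_hugepage=never %s"\n' "$flags"
}
```

Compare the output of suggest_grub_line with the line you put in /etc/default/grub; it assumes you have no other custom kernel parameters.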

Apply:

update-grub

Verify after reboot:

dmesg | grep -e IOMMU -e DMAR | head -10
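Once the IOMMU is active, the kernel lists devices by IOMMU group under /sys/kernel/iommu_groups, and devices that share a group generally have to be passed through together. A sketch of listing them follows; list_iommu_groups is a hypothetical helper, and its directory argument exists only for testing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "group pci-address" pairs, sorted by group.
# An empty result usually means the IOMMU is not active.
list_iommu_groups() {
    local root="${1:-/sys/kernel/iommu_groups}" d group out
    out="$(
        for d in "$root"/*/devices/*; do
            [[ -e "$d" ]] || continue      # glob matched nothing
            group="${d#"$root"/}"
            printf '%s %s\n' "${group%%/*}" "${d##*/}"
        done | sort -n -k1,1
    )"
    if [[ -z "$out" ]]; then
        echo "no IOMMU groups found under $root" >&2
        return 1
    fi
    printf '%s\n' "$out"
}
```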

6 — Journal Size

By default, systemd's journal may grow to 10% of the filesystem it lives on, capped at 4GB. On a small root partition or a long-running server that is more space than logs deserve. An explicit 1GB cap is sufficient for debugging while keeping the filesystem healthy.

Edit /etc/systemd/journald.conf and add under [Journal]:

[Journal]
SystemMaxUse=1G

Apply immediately:

systemctl restart systemd-journald
journalctl --disk-usage

The journal size cap is the change that surprises people the most — it's not a performance optimization, it's maintenance hygiene. I've inherited servers where the journal had consumed 40GB on a root partition that had 50GB total. The cap won't help if you're already there, but it prevents the problem from developing on a fresh node.
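Drop-in files under /etc/systemd/journald.conf.d/ override journald.conf, so the value you added is not necessarily the one in effect. This sketch, with effective_max_use as a hypothetical name, reads the merged configuration printed by systemd-analyze cat-config and reports the last SystemMaxUse= assignment in a [Journal] section, on the assumption that later assignments win.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report the effective SystemMaxUse= from merged config.
# Usage: systemd-analyze cat-config systemd/journald.conf | effective_max_use
effective_max_use() {
    awk '
        # Track whether we are inside a [Journal] section.
        /^[ \t]*\[/ { in_journal = ($0 ~ /^[ \t]*\[Journal\]/) }
        # Uncommented assignments only; the last one wins.
        in_journal && /^[ \t]*SystemMaxUse[ \t]*=/ {
            sub(/^[^=]*=[ \t]*/, ""); value = $0
        }
        # An empty value means the built-in default applies.
        END { if (value == "") print "(default)"; else print value }
    '
}
```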

7 — Reboot and Verify

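The sysctl parameters, CPU governor, I/O scheduler rules, and journal cap are already active. The kernel command-line changes (THP and IOMMU) take effect on the next boot, and the reboot also confirms that everything else persists.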

reboot

After reconnecting, verify each change:

Transparent Huge Pages disabled:

cat /sys/kernel/mm/transparent_hugepage/enabled

Should output: always madvise [never]

CPU governor set to performance:

cpupower frequency-info | grep "The governor"

Should output: The governor "performance" ...

I/O scheduler active:

cat /sys/block/nvme0n1/queue/scheduler

Should output: [none] ...

IOMMU enabled:

dmesg | grep -e IOMMU -e DMAR | head -5

Should show the IOMMU enabled for your CPU vendor: on Intel a line such as "DMAR: IOMMU enabled", on AMD lines mentioning AMD-Vi.
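/proc/cmdline shows the parameters the running kernel actually booted with, so checking it also catches a forgotten update-grub. A minimal sketch, with check_cmdline as a hypothetical helper; pass the file and the flags you expect.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which expected kernel flags are present in a
# cmdline file (normally /proc/cmdline). Returns non-zero if any is missing.
check_cmdline() {
    local file="$1"; shift
    local cmdline flag rc=0
    read -r cmdline < "$file" || true
    for flag in "$@"; do
        # Whole-word match against the space-separated parameter list.
        if [[ " $cmdline " == *" $flag "* ]]; then
            echo "present: $flag"
        else
            echo "MISSING: $flag"
            rc=1
        fi
    done
    return "$rc"
}
```

For example: check_cmdline /proc/cmdline transparent_hugepage=never intel_iommu=on iommu=pt (use amd_iommu=on on AMD).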