Cluster Storage with NFS (CSI)
Dynamic ReadWriteMany volumes for the cluster: a dedicated NFS server plus the NFS CSI driver wired up as the default StorageClass.
- Why NFS for cluster storage
- The NFS server
- The CSI driver and the default StorageClass
- Proving it actually works
- Lessons on lab storage
Why NFS for cluster storage
Pods are disposable; their data shouldn't be. Kubernetes solves that with PersistentVolumes — storage that outlives the pod using it. The question is what backs those volumes.
For a lab, NFS is a pragmatic, honest choice:
- It's ReadWriteMany — the same volume can be mounted by several pods at once, on different nodes. A lot of simpler options (hostPath, local disks) are tied to one node; NFS isn't.
- It's dead simple to reason about: it's just a directory on a server, exported over the network. When something looks wrong, you SSH to the NFS box and run ls on the directory.
- It needs no special hardware: no Ceph cluster, no cloud block-storage driver.
The tradeoff: NFS is not the fastest, and it's a single server (a single point of failure). For databases I deliberately don't use it — those get their own VMs with local disk (see the Data book). But for general application volumes in a learning environment, NFS is exactly enough.
Why we use this: match the storage to the job. NFS for shared, general-purpose volumes; local disk for databases that care about latency and fsync semantics. Reaching for one storage system for everything is a common mistake — different workloads genuinely want different backends.
The NFS server
A dedicated VM (K8s-NFS, 10.100.100.12) does one job: export a directory.
K8s-NFS (10.100.100.12)
nfs-kernel-server
export: /srv/nfs/k8s -> 10.100.100.0/24 (rw, no_root_squash)
firewall: allow tcp/2049 from 10.100.100.0/24 (NFSv4)
The export is scoped to the private subnet, so only lab machines can mount it. no_root_squash is enabled because the CSI driver (next section) needs to manage ownership on the subdirectories it creates. That's a reasonable concession on a trusted network, though it's exactly the kind of thing you'd tighten in production.
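The setup above can be sketched as a small script for the K8s-NFS VM. It assumes Debian/Ubuntu packaging and ufw as the firewall, neither of which is named above. rw and no_root_squash come from the export description; sync and no_subtree_check are assumed additions. Functions only run when named on the command line, so sourcing or running the file bare does nothing.

```shell
#!/usr/bin/env bash
# Sketch: create and export /srv/nfs/k8s on K8s-NFS.
# Assumes Debian/Ubuntu (apt, nfs-kernel-server) and ufw as the firewall.
set -euo pipefail

EXPORT_DIR=/srv/nfs/k8s
LAB_SUBNET=10.100.100.0/24

# The /etc/exports entry. rw and no_root_squash are from the setup above;
# sync and no_subtree_check are assumed, commonly used options.
exports_line() {
  printf '%s %s(rw,sync,no_subtree_check,no_root_squash)\n' "$EXPORT_DIR" "$LAB_SUBNET"
}

setup_nfs_server() {
  apt-get install -y nfs-kernel-server
  mkdir -p "$EXPORT_DIR"
  # Append the export only once, so re-running the script is harmless.
  grep -qxF "$(exports_line)" /etc/exports || exports_line >> /etc/exports
  exportfs -ra   # re-read /etc/exports
  exportfs -v    # show what is actually exported
  # NFSv4 needs only TCP 2049, allowed from the lab subnet only.
  ufw allow from "$LAB_SUBNET" to any port 2049 proto tcp
}

# Run a function by name, e.g.: sudo bash nfs-server.sh setup_nfs_server
if (( $# > 0 )); then "$@"; fi
```

The file name nfs-server.sh is hypothetical; run it as root on K8s-NFS with the function name as the argument.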
Two deliberate decisions:
- NFSv4 only, one port. NFSv4 needs just TCP port 2049, which keeps the firewall rule to a single line. (v3 drags in a portmapper and a fistful of random ports: more surface, more to open.)
- A whole dedicated VM for it. It would be tempting to fold NFS onto an existing box, but giving storage its own VM means its disk, its load, and its failure domain are cleanly separated.
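Allowing only TCP 2049 through the firewall keeps v3 clients out, but the server itself may still offer v3. A minimal sketch for switching it off and checking the result, assuming an nfs-utils release that reads /etc/nfs.conf (recent Debian/Ubuntu ones do), an existing [nfsd] section with no vers3 line yet, and GNU sed:

```shell
#!/usr/bin/env bash
# Sketch: turn off NFSv3 in the kernel NFS server and confirm what is served.
# Assumes /etc/nfs.conf has an [nfsd] header and no vers3 setting yet (GNU sed).
set -euo pipefail

NFS_CONF=/etc/nfs.conf
V4_ONLY_SETTING='vers3=n'

make_v4_only() {
  # GNU sed one-line "a": insert the setting right after the [nfsd] header.
  sed -i "/^\[nfsd\]/a $V4_ONLY_SETTING" "$NFS_CONF"
  systemctl restart nfs-server
  # Expect v3 listed as disabled (-3) and the v4 variants enabled (+4 ...).
  cat /proc/fs/nfsd/versions
}

# Run a function by name, e.g.: sudo bash nfs-v4only.sh make_v4_only
if (( $# > 0 )); then "$@"; fi
```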
The Kubernetes nodes just need the nfs-common client package installed so the kubelet can mount NFS volumes. That's part of their baseline.
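On a node, a quick hand mount is a cheap way to confirm the client package and the firewall rule before Kubernetes gets involved. A sketch, assuming Debian/Ubuntu nodes and root access; the options mirror the nfsvers=4.1 mount option used by the StorageClass.

```shell
#!/usr/bin/env bash
# Sketch: check by hand that a Kubernetes node can mount the export.
# Assumes Debian/Ubuntu nodes; run as root on the node.
set -euo pipefail

NFS_SOURCE=10.100.100.12:/srv/nfs/k8s

install_client() {
  apt-get install -y nfs-common   # provides mount.nfs
}

test_mount() {
  local mnt
  mnt=$(mktemp -d)
  mount -t nfs -o nfsvers=4.1 "$NFS_SOURCE" "$mnt"
  ls -la "$mnt"   # per-volume subdirectories appear here once the CSI driver is in use
  umount "$mnt"
  rmdir "$mnt"
}

# Run functions by name, e.g.: sudo bash node-check.sh install_client
if (( $# > 0 )); then "$@"; fi
```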
The CSI driver and the default StorageClass
A bare NFS export is static — you'd have to hand-create a PersistentVolume for every claim. The CSI driver for NFS automates that: when a pod asks for storage, the driver creates a subdirectory on the export and wires up the volume on the fly. That's dynamic provisioning.
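For contrast, here is roughly what hand-creating a PersistentVolume looks like without the driver: a static PV using Kubernetes' built-in nfs volume source. The app-a name and its directory are hypothetical, and that directory would have to be created on K8s-NFS before any pod could mount it.

```shell
#!/usr/bin/env bash
# Sketch: the static alternative, one hand-written PV per claim.
# The name passed in (e.g. "app-a") and its server directory are hypothetical.
set -euo pipefail

static_pv_yaml() {
  local name=$1
  cat <<YAML
apiVersion: v1
kind: PersistentVolume
metadata:
  name: ${name}-pv
spec:
  capacity:
    storage: 1Gi
  accessModes: [ReadWriteMany]
  persistentVolumeReclaimPolicy: Retain
  storageClassName: ""   # no class: only claims that also ask for "" bind here
  nfs:
    server: 10.100.100.12
    path: /srv/nfs/k8s/${name}
YAML
}

apply_static_pv() {
  static_pv_yaml "$1" | kubectl apply -f -
}

# Run a function by name, e.g.: bash static-pv.sh apply_static_pv app-a
if (( $# > 0 )); then "$@"; fi
```

A claim only binds to this PV if it also sets storageClassName: ""; otherwise the default class provisions a fresh volume instead. Repeat that, plus the mkdir on the server, for every claim, and the appeal of dynamic provisioning is clear.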
Installed with Helm into its own namespace, it adds:
- csi-nfs-controller (Deployment): watches for PersistentVolumeClaims
- csi-nfs-node (DaemonSet): runs on every node, does the mounting
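The Helm install itself is short. A sketch, using the chart repository published by the kubernetes-csi/csi-driver-nfs project; the csi-nfs namespace name is hypothetical, and a real install should pin a chart version with --version.

```shell
#!/usr/bin/env bash
# Sketch: install the NFS CSI driver with Helm into its own namespace.
# The namespace name is hypothetical; pin --version for a repeatable install.
set -euo pipefail

CSI_NS=csi-nfs
CHART_REPO=https://raw.githubusercontent.com/kubernetes-csi/csi-driver-nfs/master/charts

install_csi_nfs() {
  helm repo add csi-driver-nfs "$CHART_REPO"
  helm repo update
  # upgrade --install makes re-runs safe.
  helm upgrade --install csi-driver-nfs csi-driver-nfs/csi-driver-nfs \
    --namespace "$CSI_NS" --create-namespace
  # Wait for the two workloads listed above.
  kubectl -n "$CSI_NS" rollout status deployment/csi-nfs-controller
  kubectl -n "$CSI_NS" rollout status daemonset/csi-nfs-node
}

# Run a function by name, e.g.: bash csi-install.sh install_csi_nfs
if (( $# > 0 )); then "$@"; fi
```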
Then a StorageClass ties claims to the NFS server and is marked as the cluster default:
provisioner: nfs.csi.k8s.io
parameters:
  server: 10.100.100.12
  share: /srv/nfs/k8s
mountOptions: [ nfsvers=4.1 ]
reclaimPolicy: Delete
+ annotation: storageclass.kubernetes.io/is-default-class = true
"Default" means any PVC that doesn't name a class gets this one. So an app author writes a five-line PVC, mentions no storage details at all, and gets a working ReadWriteMany volume. reclaimPolicy: Delete means deleting the claim also removes its subdirectory on the server — tidy for a lab.
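Written out as a complete manifest, the StorageClass above might look like the following sketch. The metadata name nfs-csi is hypothetical, volumeBindingMode: Immediate is spelled out as an assumption that matches claims binding straight away, and the annotation value is quoted because Kubernetes annotation values must be strings.

```shell
#!/usr/bin/env bash
# Sketch: the default StorageClass as a full manifest.
# SC_NAME is hypothetical; volumeBindingMode: Immediate is an assumption.
set -euo pipefail

SC_NAME=nfs-csi

storageclass_yaml() {
  cat <<YAML
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ${SC_NAME}
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: nfs.csi.k8s.io
parameters:
  server: 10.100.100.12
  share: /srv/nfs/k8s
reclaimPolicy: Delete
volumeBindingMode: Immediate
mountOptions:
  - nfsvers=4.1
YAML
}

apply_storageclass() {
  storageclass_yaml | kubectl apply -f -
  kubectl get storageclass   # the default class is marked "(default)"
}

# Run a function by name, e.g.: bash storageclass.sh apply_storageclass
if (( $# > 0 )); then "$@"; fi
```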
Why we use this: the StorageClass is the contract between "I need storage" and "here's how this cluster provides it." Making one the default means application manifests stay portable — they ask for storage generically, and the cluster decides how to satisfy it. That separation is the whole point of the CSI abstraction.
Proving it actually works
Storage you haven't tested is a rumour. The check is a throwaway claim:
kubectl apply -f - <<'YAML'
apiVersion: v1
kind: PersistentVolumeClaim
metadata: { name: nfs-dyn-test }
spec:
  accessModes: [ReadWriteMany]
  resources: { requests: { storage: 1Gi } }
YAML
No storageClassName — so it should use the default. Within a second it goes Bound, and a matching PersistentVolume appears pointing at 10.100.100.12:/srv/nfs/k8s with a freshly created subdirectory named after the volume. SSH to the NFS box and the directory is right there. Delete the claim and — because of reclaimPolicy: Delete — the subdirectory disappears again.
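That round trip (claim → directory created → claim deleted → directory removed) proves the provisioning chain end to end: the controller saw the claim, the driver talked to the server, and cleanup works. What it doesn't prove is the node side, because nothing mounts a bare claim; that only happens once a pod actually uses the volume.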
Lesson learned: always test dynamic provisioning with a real PVC, and watch the server side too. A claim that goes Bound only proves Kubernetes is happy; SSHing in to see the directory appear and vanish proves the storage is actually doing what you think. Two different layers, both worth confirming.
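A sketch that extends the check to the node and server sides: a one-shot pod writes a file through the claim, the file is read back over SSH on K8s-NFS, and the script then confirms cleanup. It assumes the nfs-dyn-test claim above has already been applied, kubectl 1.23 or newer (for --for=jsonpath), a hypothetical admin SSH user who can read the export, a pullable busybox image, and the driver's default of naming each subdirectory after its PV.

```shell
#!/usr/bin/env bash
# Sketch: bound -> mounted by a pod -> visible on the server -> removed.
# Assumes nfs-dyn-test is applied, kubectl >= 1.23, and SSH as a hypothetical "admin" user.
set -euo pipefail

NFS_HOST=10.100.100.12
SSH_USER=admin
EXPORT_DIR=/srv/nfs/k8s

# A one-shot pod that mounts the claim, writes a file, and exits.
writer_pod_yaml() {
  cat <<'YAML'
apiVersion: v1
kind: Pod
metadata: { name: nfs-dyn-writer }
spec:
  restartPolicy: Never
  containers:
    - name: writer
      image: busybox:1.36
      command: ["sh", "-c", "echo hello > /data/probe.txt"]
      volumeMounts:
        - { name: data, mountPath: /data }
  volumes:
    - name: data
      persistentVolumeClaim: { claimName: nfs-dyn-test }
YAML
}

roundtrip() {
  local pv
  kubectl wait --for=jsonpath='{.status.phase}'=Bound pvc/nfs-dyn-test --timeout=60s
  pv=$(kubectl get pvc nfs-dyn-test -o jsonpath='{.spec.volumeName}')

  # Node side: the write only succeeds if a node really mounted the volume.
  writer_pod_yaml | kubectl apply -f -
  kubectl wait --for=jsonpath='{.status.phase}'=Succeeded pod/nfs-dyn-writer --timeout=120s

  # Server side: the subdirectory is assumed to be named after the PV.
  ssh "$SSH_USER@$NFS_HOST" cat "$EXPORT_DIR/$pv/probe.txt"

  # Cleanup: with reclaimPolicy Delete, the PV and its subdirectory should both go.
  kubectl delete pod nfs-dyn-writer
  kubectl delete pvc nfs-dyn-test
  for _ in $(seq 1 30); do
    kubectl get pv "$pv" >/dev/null 2>&1 || break
    sleep 2
  done
  if ssh "$SSH_USER@$NFS_HOST" test -e "$EXPORT_DIR/$pv"; then
    echo "FAIL: $EXPORT_DIR/$pv still exists" >&2
    return 1
  fi
  echo "OK: $pv was bound, mounted, written, and removed"
}

# Run a function by name, e.g.: bash nfs-roundtrip.sh roundtrip
if (( $# > 0 )); then "$@"; fi
```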
Lessons on lab storage
- Pick storage per workload. Shared app data → NFS. Databases → local disk on their own VM. One size does not fit all.
- NFSv4, single port, subnet-scoped. Smallest firewall footprint, simplest mental model.
- A default StorageClass keeps manifests clean. Apps shouldn't have to know how storage is provided.
- reclaimPolicy is a real decision. Delete is convenient in a lab; Retain is safer when the data matters and you'd rather clean up by hand.
- Single NFS server = single point of failure. Fine for a lab; in production you'd want redundancy or a distributed store. Be honest about that tradeoff rather than pretending NFS is something it isn't.
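Keeping one volume's data doesn't require changing the class: the reclaim policy lives on the PV and can be patched, which is the approach the Kubernetes documentation describes. A sketch; the claim name and namespace are arguments you supply.

```shell
#!/usr/bin/env bash
# Sketch: switch the PV behind one claim to Retain, leaving the class as Delete.
set -euo pipefail

RETAIN_PATCH='{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'

retain_pv_for_claim() {
  local claim=$1 ns=${2:-default} pv
  pv=$(kubectl -n "$ns" get pvc "$claim" -o jsonpath='{.spec.volumeName}')
  kubectl patch pv "$pv" -p "$RETAIN_PATCH"
  # Print the policy now in effect on that PV.
  kubectl get pv "$pv" -o jsonpath='{.spec.persistentVolumeReclaimPolicy}{"\n"}'
}

# Run a function by name, e.g.: bash retain.sh retain_pv_for_claim my-claim my-namespace
if (( $# > 0 )); then "$@"; fi
```

After the claim is deleted, the PV moves to Released and its subdirectory stays on the server until it is removed by hand.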