
Joining the workers (and the firewall that blocked them)

Each worker joins with the command that kubeadm init printed:

kubeadm join 10.100.100.7:6443 --token <REDACTED> \
  --discovery-token-ca-cert-hash sha256:<REDACTED>

The first time I ran this across the three workers, all three failed with:

couldn't validate the identity of the API Server:
  Get "https://10.100.100.7:6443/.../cluster-info": context deadline exceeded

The workers couldn't reach the control plane on port 6443. The cause: the VMs ship with a host firewall (ufw) enabled, and nothing had opened the Kubernetes ports between nodes.
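
A plain TCP probe from a worker is a quick way to confirm this kind of failure before rerunning the join. The following is a minimal sketch, assuming bash (for its /dev/tcp redirection) and the coreutils timeout command are available; can_reach is a hypothetical helper name, not part of kubeadm.

```shell
#!/usr/bin/env bash
# can_reach HOST PORT: succeed if a TCP connection opens within 3 seconds.
# Hypothetical helper; relies on bash's /dev/tcp and coreutils `timeout`.
can_reach() {
  local host="$1" port="$2"
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Usage on a worker (address taken from the join command above):
#   can_reach 10.100.100.7 6443 && echo "API server reachable" || echo "blocked or down"
```

A probe that hangs until the timeout is consistent with a firewall silently dropping packets, whereas an immediate refusal usually means nothing is listening on that port.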

The fix wasn't to poke a dozen individual port holes. On a trusted, private subnet the clean move is to allow everything from the subnet and fix the forwarding policy:

ufw allow from 10.100.100.0/24
# and, crucially, in /etc/default/ufw:
#   DEFAULT_FORWARD_POLICY="ACCEPT"
# then reload so the new forward policy takes effect:
ufw reload

That second part is the one people miss. Kubernetes (and Calico) move pod traffic through the kernel's FORWARD chain, and ufw defaults that chain to DROP. You can have every port open and still have broken pod networking until the forward policy is ACCEPT. There's also a subtlety the simple per-port approach can't solve: Calico's default encapsulation is IP-in-IP, which is its own IP protocol (protocol number 4), not a TCP or UDP port. "Allow from the subnet" covers it; a list of ports wouldn't.
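
The forward-policy change is easy to script so it lands identically on every node. This is a sketch under stated assumptions: set_forward_policy is a hypothetical helper, it assumes the stock /etc/default/ufw layout where the setting sits on its own line, and it takes an optional file argument so the edit can be tried on a copy first.

```shell
#!/usr/bin/env bash
# set_forward_policy [FILE]: set DEFAULT_FORWARD_POLICY="ACCEPT" in a ufw
# defaults file. FILE defaults to /etc/default/ufw. Hypothetical helper name.
set_forward_policy() {
  local file="${1:-/etc/default/ufw}"
  # -i.bak edits in place and keeps the original as FILE.bak
  sed -i.bak 's/^DEFAULT_FORWARD_POLICY=.*/DEFAULT_FORWARD_POLICY="ACCEPT"/' "$file"
  # Succeed only if the file now contains the ACCEPT line
  grep -q '^DEFAULT_FORWARD_POLICY="ACCEPT"$' "$file"
}

# Usage as root on each node running ufw:
#   ufw allow from 10.100.100.0/24
#   set_forward_policy && ufw reload
```

The trailing grep makes the function fail loudly if the setting was missing from the file, rather than reloading ufw with the old policy still in place.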

With that in place, all three workers joined in seconds.

Lesson learned: when a kubeadm join times out talking to :6443, suspect the firewall on the control-plane node before the worker. ufw blocks incoming connections by default but allows outgoing ones, so the worker's outbound request is rarely what gets dropped. And when pods can't talk across nodes even though ports look open, check DEFAULT_FORWARD_POLICY before you blame the CNI.
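
For the second check, the live policy can be read straight from the kernel's rules rather than from the config file. A small sketch, assuming the iptables command is present on the node; forward_policy is a hypothetical helper that only parses the output of iptables -S FORWARD.

```shell
#!/usr/bin/env bash
# forward_policy: read `iptables -S FORWARD` output on stdin and print the
# chain's default policy (for example DROP or ACCEPT). Hypothetical helper.
forward_policy() {
  awk '$1 == "-P" && $2 == "FORWARD" { print $3; exit }'
}

# Usage as root on a node:
#   iptables -S FORWARD | forward_policy
```

If this prints DROP after the config file already says ACCEPT, the likely explanation is that ufw has not been reloaded since the edit.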