Failover & Recovery Testing
Now the payoff: we kill the primary and watch the cluster heal itself with no human intervention.

Before: note the current primary
redis-cli -p 26379 sentinel get-master-addr-by-name mymaster
# 1) "10.100.100.101" (redis-sv01)
# 2) "6379"
Seed a key so we can prove data survives the failover:
redis-cli -h 10.100.100.101 -a ChangeMe_RedisPass --no-auth-warning set ha:canary before-failover
# OK
Trigger an unplanned failover
Simulate a real server death — hard-stop the primary. On a hypervisor that's a force-stop; on a cloud VM, a power-off. (Stopping the service, sudo systemctl stop redis-server, simulates a process crash and works just as well for the test.)
What happens, in order:
- Each Sentinel stops getting replies from
redis-sv01. - After
down-after-milliseconds(5 s) they mark it subjectively down. - Quorum (2 of 3) agree → objectively down.
- The Sentinels elect a leader (majority, 2 of 3) which runs the failover: it picks the most up-to-date replica, promotes it with
REPLICAOF NO ONE, and reconfigures the other replica to follow the new primary.
After: confirm automatic recovery
Within a few seconds, ask Sentinel again:
redis-cli -p 26379 sentinel get-master-addr-by-name mymaster
# 1) "10.100.100.103" (redis-sv03 — promoted)
# 2) "6379"
The canary key is present on the new primary, and it accepts writes:
redis-cli -h 10.100.100.103 -a ChangeMe_RedisPass --no-auth-warning get ha:canary
# -> "before-failover"
redis-cli -h 10.100.100.103 -a ChangeMe_RedisPass --no-auth-warning set ha:after written-to-new-primary
# -> OK
Check the new primary's replication view:
redis-cli -h 10.100.100.103 -a ChangeMe_RedisPass --no-auth-warning info replication
role:master
connected_slaves:1
slave0:ip=10.100.100.102,port=6379,state=online,offset=20838,lag=0
Any client pointed at the Sentinels the whole time followed the switch automatically.
Recovery: the old primary rejoins
Start redis-sv01 again. It boots believing it is a master — but the Sentinels immediately reconfigure it: they rewrite its replicaof to point at the new primary, and it rejoins as a replica:
redis-cli -h 10.100.100.101 -a ChangeMe_RedisPass --no-auth-warning info replication
role:slave
master_host:10.100.100.103
master_link_status:up
Three members again — one primary, two replicas — and the row written during the outage is present everywhere. That is a self-healing Redis cluster: automatic failover and automatic recovery, with the client none the wiser.
Optional: a planned switchover
There is no first-class "switchover" verb like some databases have, but you can trigger a controlled failover for maintenance — it promotes a replica immediately, without waiting for a timeout:
redis-cli -p 26379 sentinel failover mymaster
Use it to move the primary role off a node you need to patch or reboot.
No comments to display
No comments to display