Restoring etcd Consensus
If etcd loses consensus, stop k3s on all master nodes:
systemctl stop k3s.service
To clean up processes that might hang around, restart the nodes with k3s disabled:
systemctl disable --now k3s.service
systemctl reboot
On one node, run:
k3s server --cluster-reset
Follow the directions it prints. Configuration changes may be needed (for example, removing the server key from /etc/rancher/k3s/config.yaml).
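Before restarting, it may help to confirm whether the server key is still set. The following is a minimal sketch, not part of k3s: has_server_key is a hypothetical helper name, and it assumes the key is written at the top level of the YAML file with no leading indentation.

```shell
#!/bin/sh
# has_server_key FILE
# Hypothetical helper (not shipped with k3s): succeeds if FILE contains a
# top-level "server:" key, fails otherwise.
# Assumption: the key starts at column 0, as in a flat k3s config.yaml.
has_server_key() {
    grep -Eq '^server:' "$1"
}

# Example, on the node being reset:
#   has_server_key /etc/rancher/k3s/config.yaml && echo "server key still present"
```

If the helper reports the key on the node where the reset was run, that is the line to review before starting k3s again.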
On the same node, restart and enable k3s:
systemctl enable --now k3s.service
Check the service for errors and resolve any before continuing.
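One way to look for errors is to filter the service log. This is a rough sketch: k3s_errors is a hypothetical helper, and it assumes k3s writes its own messages in the level=... style. Kubernetes component lines in klog format (starting with E and a date) would not be matched, so treat an empty result as a hint rather than proof.

```shell
#!/bin/sh
# k3s_errors
# Hypothetical helper: reads log lines on stdin and prints only those
# marked level=error or level=fatal.
# "|| true" keeps the exit status at 0 when nothing matches.
k3s_errors() {
    grep -E 'level=(error|fatal)' || true
}

# Example, on the node that was just restarted:
#   journalctl -u k3s.service --no-pager --since "10 minutes ago" | k3s_errors
```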
On the remaining master nodes, back up and delete the etcd data directory, adjust the server key in /etc/rancher/k3s/config.yaml if necessary, then enable and start k3s:
rsync -avz /var/lib/rancher/k3s/server/db/ /root/db-backup/
rm -rf /var/lib/rancher/k3s/server/db/
systemctl enable --now k3s.service
Check to make sure the node properly joins the cluster as a master.
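The join check can be scripted against the node list. A minimal sketch follows, assuming the usual column order of kubectl get nodes --no-headers (NAME, STATUS, ROLES, ...) and that master nodes show master or control-plane in the ROLES column. node_is_ready_master is a hypothetical helper name, and node2 in the example is a placeholder.

```shell
#!/bin/sh
# node_is_ready_master NAME
# Hypothetical helper: reads "kubectl get nodes --no-headers" output on
# stdin and succeeds only if node NAME is listed with STATUS exactly
# "Ready" and a ROLES column containing "master" or "control-plane".
node_is_ready_master() {
    awk -v node="$1" '
        $1 == node && $2 == "Ready" && $3 ~ /master|control-plane/ { found = 1 }
        END { exit !found }
    '
}

# Example, from any working master (node2 is a placeholder name):
#   k3s kubectl get nodes --no-headers | node_is_ready_master node2 && echo "node2 joined"
```

A node can take a little while to report Ready after k3s starts, so a failed check right after the restart is worth repeating before investigating further.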
Change your shirt and underwear before going out in public.