# Runbook: etcd / Talos control plane
**Symptoms:** Kubernetes API flaps, `etcd` alarms, multiple control-plane nodes `NotReady`, upgrades stuck.
**Checks**
1. `talosctl health` and `talosctl etcd status` (with `TALOSCONFIG` set; pass `--nodes` to target a specific control-plane node if needed).
2. `kubectl get nodes`: control-plane nodes should be **Ready**; look for disk or memory pressure conditions.
3. Talos version skew: compare `talosctl version` output against the node image in [`talos/talconfig.yaml`](../talconfig.yaml) / Image Factory schematic.
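
The three checks above can be run in sequence with a small wrapper. This is a minimal sketch, not part of the repo: the function name `cp_checks` and the node argument are hypothetical, and it assumes `TALOSCONFIG` and `KUBECONFIG` are already exported and that control-plane nodes carry the standard `node-role.kubernetes.io/control-plane` label.

```shell
# Hypothetical helper: run the runbook checks in order against one control-plane node.
# Assumes TALOSCONFIG and KUBECONFIG are already exported in the environment.
cp_checks() {
  node="$1"  # control-plane node IP or hostname (placeholder, supply your own)
  if [ -z "$node" ]; then
    echo "usage: cp_checks <control-plane-node>" >&2
    return 2
  fi
  rc=0
  echo "== cluster health =="
  talosctl --nodes "$node" health || rc=1
  echo "== etcd status =="
  talosctl --nodes "$node" etcd status || rc=1
  echo "== control-plane nodes =="
  kubectl get nodes -l node-role.kubernetes.io/control-plane -o wide || rc=1
  echo "== Talos versions (client and node) =="
  talosctl --nodes "$node" version || rc=1
  # Non-zero if any check failed; every check still runs so all output is visible.
  return "$rc"
}
```

Usage: `cp_checks <control-plane-node>`. The wrapper deliberately keeps going after a failed check, since during an incident the later output is often what explains the earlier failure.
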
**Common fixes**
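- One bad control plane: confirm etcd still has quorum first (a 3-member cluster tolerates losing one member), then cordon/drain workloads; follow the Talos maintenance docs to replace or remove the node.
- Disk full on etcd volume: free or expand the underlying host disk / system partition (check whether the data sits on the Talos ephemeral partition or a user volume, per the machine config).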
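
"Confirming quorum" comes down to simple arithmetic: etcd needs a majority of members, floor(N/2) + 1. The sketch below is illustrative only; the function names are hypothetical, and the member counts are meant to be read off `talosctl etcd status` by hand rather than parsed automatically.

```shell
# Majority quorum for an etcd cluster of N members: floor(N/2) + 1.
etcd_quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Number of members that can be lost while quorum is still held.
etcd_fault_tolerance() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

# Hypothetical guard: succeeds only if quorum survives taking one
# currently healthy member offline.
#   $1 = total configured members, $2 = members currently healthy
safe_to_remove_one() {
  total="$1"
  healthy="$2"
  [ $(( healthy - 1 )) -ge "$(etcd_quorum "$total")" ]
}
```

For example, `safe_to_remove_one 3 3` succeeds (two healthy members remain, which meets the quorum of two), while `safe_to_remove_one 3 2` fails, because draining another healthy member would leave only one.
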
**References:** [Talos etcd](https://www.talos.dev/latest/advanced/etcd-maintenance/), [`talos/README.md`](../README.md).