Update Headlamp and Vault documentation; enhance RBAC configurations in Argo CD. Modify Headlamp README to clarify sessionTTL handling and ServiceAccount permissions. Add Cilium network policy instructions to Vault README. Update Argo CD values.yaml for default RBAC settings, ensuring local admin retains full access while new users start with read-only permissions. Reflect these changes in CLUSTER-BUILD.md.

Nikholas Pcenicni
2026-03-28 02:02:17 -04:00
parent 906c24b1d5
commit 445a1ac211
15 changed files with 188 additions and 13 deletions


@@ -0,0 +1,16 @@
# Runbook: etcd / Talos control plane

**Symptoms:** Kubernetes API server flapping, `etcd` alarms, more than one control-plane node `NotReady`, or upgrades stuck.

**Checks**

1. Run `talosctl health` and `talosctl etcd status` with `TALOSCONFIG` set; if needed, target a single control-plane node with `-n <node-ip>`.
2. `kubectl get nodes`: every control-plane node should be **Ready**. Check node conditions (`kubectl describe node <name>`) for `DiskPressure` or `MemoryPressure`.
3. Talos version skew: compare `talosctl version` with the node image set in [`talos/talconfig.yaml`](../talconfig.yaml) / the Image Factory schematic.
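The checks above can be bundled into a small read-only triage sketch. It assumes `TALOSCONFIG` is already exported and that you pass one control-plane node IP. The `run` / `etcd_triage` helpers and the `DRY_RUN=1` preview mode are hypothetical conveniences and are not part of Talos. `talosctl etcd alarm list` is included because the symptoms mention etcd alarms.

```shell
#!/usr/bin/env bash
# Read-only triage sketch for the runbook checks.
# Assumptions: TALOSCONFIG is already exported and $1 is one control-plane node IP.
# run, etcd_triage and DRY_RUN are hypothetical helpers, not talosctl features.
set -euo pipefail

# Echo each command; execute it unless DRY_RUN=1.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-0}" != "1" ]; then
    # Keep going on failure so later checks still run; report the exit code on stderr.
    "$@" || echo "! exit $? from: $*" >&2
  fi
}

# Run the checks against a single control-plane node.
etcd_triage() {
  local node="$1"
  run talosctl -n "$node" health
  run talosctl -n "$node" etcd status
  run talosctl -n "$node" etcd alarm list   # symptoms mention etcd alarms
  run kubectl get nodes -o wide             # OS-IMAGE column shows each node's Talos version
  run talosctl -n "$node" version           # client vs node version, for skew
}
```

For example, `DRY_RUN=1 etcd_triage 10.0.0.10` prints the five commands without running them (the IP is illustrative). Drop `DRY_RUN` to execute them. Failures are reported rather than aborting the sequence, so one unreachable check does not hide the others.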
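**Common fixes**

- One unhealthy control plane: confirm the remaining etcd members still hold quorum before cordoning it and draining its workloads; follow the Talos maintenance docs to replace or remove the node.
- etcd volume out of disk: free space on the host disk / system partition (whether etcd data sits on the Talos ephemeral partition or on a user volume depends on the machine config).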
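The quorum condition in the first fix is majority arithmetic: etcd uses Raft, so the cluster needs floor(n / 2) + 1 healthy voting members. The minimal sketch below assumes you already know the member count and how many members are healthy (for example from `talosctl etcd members` and `talosctl etcd status`). The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of the "confirm quorum" arithmetic (hypothetical helper names).
# etcd (Raft) needs a majority of voting members: quorum = floor(n / 2) + 1.
set -euo pipefail

# Majority needed for an etcd cluster with $1 voting members.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Succeeds (exit 0) when $2 healthy members out of $1 still form a majority.
has_quorum() {
  [ "$2" -ge "$(quorum "$1")" ]
}

# How many members can fail before quorum is lost.
fault_tolerance() {
  echo $(( $1 - $(quorum "$1") ))
}
```

With three control planes, quorum is 2 and `fault_tolerance 3` is 1. One bad node is survivable, but when the current state is `has_quorum 3 2`, taking a second node down for maintenance would lose quorum. A fourth member does not help: quorum rises to 3 and tolerance stays at 1.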
**References:** [Talos etcd](https://www.talos.dev/latest/advanced/etcd-maintenance/), [`talos/README.md`](../README.md).