# Runbook: Longhorn

**Symptoms:** PVCs stuck in **Pending**, volumes **Faulted**, workload I/O errors, problems flagged in the Longhorn UI or by alerts.

**Checks**

1. Check Longhorn pods and node state: `kubectl -n longhorn-system get pods` and `kubectl -n longhorn-system get nodes.longhorn.io -o wide`.
2. Confirm the Talos user disk and the extensions Longhorn needs are configured (see [`talos/README.md`](../README.md) section 5 and `talconfig.with-longhorn.yaml`).
3. Run `kubectl get sc` and confirm **longhorn** is the default StorageClass as expected; inspect PVC events with `kubectl describe pvc -n <ns> <name>`.
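
The read-only `kubectl` checks above can be bundled into one helper for quicker triage. This is a minimal sketch, not a script from this repo: the function name `longhorn_triage` is hypothetical, and it assumes Longhorn runs in the `longhorn-system` namespace (it passes `-n longhorn-system` when listing Longhorn nodes, since those resources are namespaced). After sourcing it, `longhorn_triage <ns> <name>` would also describe one specific PVC; with no arguments it only prints the cluster-wide checks.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repo): runs the read-only checks
# from this runbook. Assumes Longhorn lives in the longhorn-system namespace.

longhorn_triage() {
  ns="${1:-}"    # optional: namespace of a stuck PVC
  pvc="${2:-}"   # optional: name of a stuck PVC

  echo "== Longhorn pods =="
  kubectl -n longhorn-system get pods

  echo "== Longhorn nodes =="
  kubectl -n longhorn-system get nodes.longhorn.io -o wide

  echo "== StorageClasses =="
  kubectl get sc

  # Describe the PVC only when both namespace and name were given.
  if [ -n "$ns" ] && [ -n "$pvc" ]; then
    echo "== PVC events: $ns/$pvc =="
    kubectl describe pvc -n "$ns" "$pvc"
  fi
}
```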

**Common fixes**

- Node disk pressure or missing mount: fix the Talos machine config, then reboot the node per the Talos docs.
- Recovery and GPT wipe scripts: [`talos/scripts/longhorn-gpt-recovery.sh`](../scripts/longhorn-gpt-recovery.sh) and the CLUSTER-BUILD notes.

**Security / compliance (Trivy KSV on `longhorn-role`):** Upstream Longhorn RBAC is expected to fail strict built-in checks. We accept that for a storage controller and mitigate with PSA on the namespace, OIDC/ForwardAuth for the UI, network policy where you add it, and tight control over support-bundle use. See [`clusters/noble/bootstrap/longhorn/README.md`](../../clusters/noble/bootstrap/longhorn/README.md).

**References:** [`clusters/noble/bootstrap/longhorn/`](../../clusters/noble/bootstrap/longhorn/), [`clusters/noble/bootstrap/longhorn/README.md`](../../clusters/noble/bootstrap/longhorn/README.md) (RBAC posture), [Longhorn docs](https://longhorn.io/docs/).