Runbook: Longhorn

Symptoms: PVCs stuck in Pending, volumes Faulted, workload I/O errors, warnings in the Longhorn UI or alerts.

Checks

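  1. kubectl -n longhorn-system get pods and kubectl -n longhorn-system get nodes.longhorn.io -o wide (the Longhorn node objects are namespaced, so the -n flag is needed).
  2. Confirm the Talos user disk and system extensions for Longhorn are in place (see talos/README.md section 5 and talconfig.with-longhorn.yaml).
  3. kubectl get sc shows longhorn as the default StorageClass, as expected; for PVC events: kubectl describe pvc -n <ns> <name>.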
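
The kubectl checks above can be bundled into one helper, as a minimal sketch. The function names (run, longhorn_checks), the DRY_RUN switch, and the PVC_NS / PVC_NAME defaults are hypothetical placeholders, not part of this repo; only the kubectl commands come from the checks list.

```shell
# Triage sketch for the Longhorn checks. Placeholders: PVC_NS, PVC_NAME.
LH_NS="longhorn-system"
PVC_NS="${PVC_NS:-default}"        # namespace of the stuck PVC (placeholder)
PVC_NAME="${PVC_NAME:-my-claim}"   # name of the stuck PVC (placeholder)

# Print each command before running it; with DRY_RUN=1 only print it.
run() {
  echo "+ $*"
  [ "${DRY_RUN:-0}" = "1" ] || "$@"
}

longhorn_checks() {
  # Check 1: Longhorn pods and Longhorn node objects (namespaced CRD).
  run kubectl -n "$LH_NS" get pods
  run kubectl -n "$LH_NS" get nodes.longhorn.io -o wide
  # Check 3: StorageClass list (longhorn should be marked default) and PVC events.
  run kubectl get sc
  run kubectl describe pvc -n "$PVC_NS" "$PVC_NAME"
}
```

After sourcing the snippet, DRY_RUN=1 longhorn_checks prints the commands without touching the cluster, and PVC_NS=<ns> PVC_NAME=<name> longhorn_checks runs them against a specific claim. Check 2 (Talos disk and extensions) is not covered here because it depends on the Talos config files referenced above.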

Common fixes

Security / compliance (Trivy KSV on longhorn-role): Upstream Longhorn RBAC is expected to fail Trivy's strict built-in checks. We accept that for a storage controller and mitigate with PSA on the namespace, OIDC/ForwardAuth for the UI, network policies where you add them, and tight control over support-bundle use. See clusters/noble/bootstrap/longhorn/README.md.
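
To confirm the mitigations are actually in place, a read-only inspection along these lines may help. This is a sketch under assumptions: the helper names (run, longhorn_mitigations) and the DRY_RUN switch are hypothetical, and longhorn-role is assumed to be a ClusterRole, as in the upstream chart. PSA settings appear as pod-security.kubernetes.io/* labels on the namespace.

```shell
# Read-only inspection of the Longhorn security posture.
LH_NS="longhorn-system"

# Print each command before running it; with DRY_RUN=1 only print it.
run() {
  echo "+ $*"
  [ "${DRY_RUN:-0}" = "1" ] || "$@"
}

longhorn_mitigations() {
  # PSA: look for pod-security.kubernetes.io/* labels on the namespace.
  run kubectl get namespace "$LH_NS" --show-labels
  # Network policies, if any have been added to the namespace.
  run kubectl -n "$LH_NS" get networkpolicy
  # The RBAC object that Trivy flags (assumed to be a ClusterRole).
  run kubectl get clusterrole longhorn-role -o yaml
}
```

DRY_RUN=1 longhorn_mitigations prints the commands only. None of them modify the cluster.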

References: clusters/noble/bootstrap/longhorn/, clusters/noble/bootstrap/longhorn/README.md (RBAC posture), Longhorn docs.