From 8e42777a1d5b2f92af9d6c95b33ad835d7c9bd09 Mon Sep 17 00:00:00 2001
From: Nikholas Pcenicni <82239765+nikpcenicni@users.noreply.github.com>
Date: Thu, 14 May 2026 17:36:18 -0400
Subject: [PATCH] Update Longhorn runbook documentation for clarity and compliance. Adjusted section references for consistency and added details on security and compliance measures regarding RBAC and namespace management.

---
 clusters/noble/bootstrap/longhorn/README.md | 20 ++++++++++++++++++++
 talos/runbooks/longhorn.md                  |  6 ++++--
 2 files changed, 24 insertions(+), 2 deletions(-)
 create mode 100644 clusters/noble/bootstrap/longhorn/README.md

diff --git a/clusters/noble/bootstrap/longhorn/README.md b/clusters/noble/bootstrap/longhorn/README.md
new file mode 100644
index 0000000..0c1e2c8
--- /dev/null
+++ b/clusters/noble/bootstrap/longhorn/README.md
@@ -0,0 +1,20 @@
+# Longhorn on noble — install notes
+
+Helm values, namespace PSA, and (when Authentik is enabled) ForwardAuth overlays live in this directory. Install flow is covered in [`ansible/roles/noble_longhorn`](../../../../ansible/roles/noble_longhorn/) and [`talos/runbooks/longhorn.md`](../../../../talos/runbooks/longhorn.md).
+
+## RBAC, Trivy KSV, and accepted risk
+
+The upstream Longhorn chart ships a **`longhorn-role` ClusterRole** with broad permissions: wildcard verbs on several API groups, **list/watch on Secrets** (policy tools treat cluster-scoped secret reads as high risk), **create/patch/delete** on mutating/validating **WebhookConfiguration** objects, and **delete/deletecollection** on **Pods**. Trivy’s built-in Kubernetes checks (for example **AVD-KSV-0041**, **0045**, **0048**, **0114**) flag that role. **This is expected** for a storage controller that installs CRDs, runs CSI-style components, and manages workload pods; shrinking that role without upstream support is likely to **break Longhorn**.
+
+The chart also includes a **support-bundle** flow that binds a dedicated service account to **`cluster-admin`**. Treat that as **high privilege**: limit who can create or use support-bundle workloads in **`longhorn-system`**, and disable or avoid the feature if you do not need vendor diagnostics.
+
+### Mitigations we rely on instead of forking RBAC
+
+| Area | What we do |
+| --- | --- |
+| **Pod Security Admission** | **`longhorn-system`** is labeled **privileged** in [`namespace.yaml`](./namespace.yaml) because Longhorn requires hostPath and privileged pods; other namespaces stay on stricter defaults where configured. |
+| **UI access** | Longhorn UI is exposed through **Traefik** with **oauth2-proxy** ForwardAuth to **Authentik** when the Authentik role is applied (see [`values-authentik-forwardauth.yaml`](./values-authentik-forwardauth.yaml) and [`ansible/roles/noble_authentik/README.md`](../../../../ansible/roles/noble_authentik/README.md)). |
+| **Network segmentation** | Cluster CNI is **Cilium**. Add **NetworkPolicy** (or Cilium **CiliumNetworkPolicy**) for **`longhorn-system`** and workloads that talk to the Longhorn API if you need tighter east-west boundaries; this repo does not ship a default deny for Longhorn. |
+| **Support bundles** | Restrict **`longhorn-system`** RBAC (who can create Jobs/Pods, impersonate, or exec) and Longhorn UI/API access so only trusted operators can trigger vendor support tooling. |
+
+**Trivy Operator:** workload scans skip **`longhorn-system`** via **`excludeNamespaces`** in [`clusters/noble/apps/trivy/values.yaml`](../../apps/trivy/values.yaml). **ClusterRole** config audits are cluster-scoped, so findings on **`longhorn-role`** can still appear; treat them as **documented vendor baseline** unless you narrow operator config (for example dropping **ClusterRole** from config-audit kinds), which affects the whole cluster, not only Longhorn.
diff --git a/talos/runbooks/longhorn.md b/talos/runbooks/longhorn.md
index 9c7793e..3a1ce90 100644
--- a/talos/runbooks/longhorn.md
+++ b/talos/runbooks/longhorn.md
@@ -5,7 +5,7 @@
 **Checks**
 
 1. `kubectl -n longhorn-system get pods` and `kubectl get nodes.longhorn.io -o wide`.
-2. Talos user disk + extensions for Longhorn (see [`talos/README.md`](../README.md) §5 and `talconfig.with-longhorn.yaml`).
+2. Talos user disk + extensions for Longhorn (see [`talos/README.md`](../README.md) section 5 and `talconfig.with-longhorn.yaml`).
 3. `kubectl get sc` — **longhorn** default as expected; PVC events: `kubectl describe pvc -n `.
 
 **Common fixes**
@@ -13,4 +13,6 @@
 - Node disk pressure / mount missing: fix Talos machine config, reboot node per Talos docs.
 - Recovery / GPT wipe scripts: [`talos/scripts/longhorn-gpt-recovery.sh`](../scripts/longhorn-gpt-recovery.sh) and CLUSTER-BUILD notes.
 
-**References:** [`clusters/noble/bootstrap/longhorn/`](../../clusters/noble/bootstrap/longhorn/), [Longhorn docs](https://longhorn.io/docs/).
+**Security / compliance (Trivy KSV on `longhorn-role`):** Upstream Longhorn RBAC is expected to fail strict built-in checks; we accept that for a storage controller and mitigate with PSA on the namespace, OIDC/ForwardAuth for the UI, network policy where you add it, and tight control over support-bundle use. See [`clusters/noble/bootstrap/longhorn/README.md`](../../clusters/noble/bootstrap/longhorn/README.md).
+
+**References:** [`clusters/noble/bootstrap/longhorn/`](../../clusters/noble/bootstrap/longhorn/), [`clusters/noble/bootstrap/longhorn/README.md`](../../clusters/noble/bootstrap/longhorn/README.md) (RBAC posture), [Longhorn docs](https://longhorn.io/docs/).
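
Note for reviewers: the new README states that `longhorn-system` is labeled privileged in `namespace.yaml`, a file this diff references but does not show. The Pod Security Admission labeling it describes might look roughly like the sketch below. This is an illustration under assumptions, not the repo's actual file: only the `enforce` level is implied by the README, and the `audit` and `warn` labels are assumed here for symmetry.

```yaml
# Hypothetical sketch of clusters/noble/bootstrap/longhorn/namespace.yaml.
# Not copied from the repository; the real manifest may differ.
apiVersion: v1
kind: Namespace
metadata:
  name: longhorn-system
  labels:
    # Implied by the README: Longhorn needs hostPath volumes and privileged pods.
    pod-security.kubernetes.io/enforce: privileged
    # Assumed: audit and warn matched to enforce so admission does not emit
    # warnings for pods that are allowed anyway.
    pod-security.kubernetes.io/audit: privileged
    pod-security.kubernetes.io/warn: privileged
```

Running `kubectl get ns longhorn-system --show-labels` against the cluster shows which labels are actually applied, which is the quickest way to compare the live namespace with whatever the manifest declares.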