diff --git a/ansible/README.md b/ansible/README.md
index 3fe1a0f..de4078e 100644
--- a/ansible/README.md
+++ b/ansible/README.md
@@ -78,7 +78,7 @@ ansible-playbook playbooks/noble.yml --tags velero -e noble_velero_install=true
 |------|----------|
 | `talos_phase_a` | Talos genconfig, apply-config, bootstrap, kubeconfig |
 | `helm_repos` | `helm repo add` / `update` |
-| `noble_*` | Cilium, metrics-server, Longhorn, MetalLB (20m Helm wait), kube-vip, Traefik, cert-manager, Newt, Argo CD, Kyverno, platform stack, Velero (optional) |
+| `noble_*` | Cilium, CSI Volume Snapshot CRDs + controller, metrics-server, Longhorn, MetalLB (20m Helm wait), kube-vip, Traefik, cert-manager, Newt, Argo CD, Kyverno, platform stack, Velero (optional) |
 | `noble_landing_urls` | Writes **`ansible/output/noble-lab-ui-urls.md`** — URLs, service names, and (optional) Argo/Grafana passwords from Secrets |
 | `noble_post_deploy` | Post-install reminders |
 | `talos_bootstrap` | Genconfig-only (used by older playbook) |
diff --git a/ansible/playbooks/noble.yml b/ansible/playbooks/noble.yml
index 78c00c5..096f2a2 100644
--- a/ansible/playbooks/noble.yml
+++ b/ansible/playbooks/noble.yml
@@ -3,7 +3,7 @@
 # Do not run until `kubectl get --raw /healthz` returns ok (see talos/README.md §3, CLUSTER-BUILD Phase A).
 # Run from repo **ansible/** directory: ansible-playbook playbooks/noble.yml
 #
-# Tags: repos, cilium, metrics, longhorn, metallb, kube_vip, traefik, cert_manager, newt,
+# Tags: repos, cilium, csi_snapshot, metrics, longhorn, metallb, kube_vip, traefik, cert_manager, newt,
 #       argocd, kyverno, kyverno_policies, platform, velero, all (default)
 - name: Noble cluster — platform stack (Ansible-managed)
   hosts: localhost
@@ -202,6 +202,8 @@
       tags: [repos, helm]
     - role: noble_cilium
       tags: [cilium, cni]
+    - role: noble_csi_snapshot_controller
+      tags: [csi_snapshot, snapshot, storage]
     - role: noble_metrics_server
       tags: [metrics, metrics_server]
     - role: noble_longhorn
diff --git a/ansible/roles/noble_csi_snapshot_controller/defaults/main.yml b/ansible/roles/noble_csi_snapshot_controller/defaults/main.yml
new file mode 100644
index 0000000..2b8501c
--- /dev/null
+++ b/ansible/roles/noble_csi_snapshot_controller/defaults/main.yml
@@ -0,0 +1,2 @@
+---
+noble_csi_snapshot_kubectl_timeout: 120s
diff --git a/ansible/roles/noble_csi_snapshot_controller/tasks/main.yml b/ansible/roles/noble_csi_snapshot_controller/tasks/main.yml
new file mode 100644
index 0000000..492efcd
--- /dev/null
+++ b/ansible/roles/noble_csi_snapshot_controller/tasks/main.yml
@@ -0,0 +1,39 @@
+---
+# Volume Snapshot CRDs + snapshot-controller (Velero CSI / Longhorn snapshots).
+
+- name: Apply Volume Snapshot CRDs (snapshot.storage.k8s.io)
+  ansible.builtin.command:
+    argv:
+      - kubectl
+      - apply
+      - "--request-timeout={{ noble_csi_snapshot_kubectl_timeout | default('120s') }}"
+      - -k
+      - "{{ noble_repo_root }}/clusters/noble/bootstrap/csi-snapshot-controller/crd"
+  environment:
+    KUBECONFIG: "{{ noble_kubeconfig }}"
+  changed_when: true
+
+- name: Apply snapshot-controller in kube-system
+  ansible.builtin.command:
+    argv:
+      - kubectl
+      - apply
+      - "--request-timeout={{ noble_csi_snapshot_kubectl_timeout | default('120s') }}"
+      - -k
+      - "{{ noble_repo_root }}/clusters/noble/bootstrap/csi-snapshot-controller/controller"
+  environment:
+    KUBECONFIG: "{{ noble_kubeconfig }}"
+  changed_when: true
+
+- name: Wait for snapshot-controller Deployment
+  ansible.builtin.command:
+    argv:
+      - kubectl
+      - -n
+      - kube-system
+      - rollout
+      - status
+      - deploy/snapshot-controller
+      - --timeout=120s
+  environment:
+    KUBECONFIG: "{{ noble_kubeconfig }}"
+  changed_when: false
diff --git a/ansible/roles/noble_landing_urls/defaults/main.yml b/ansible/roles/noble_landing_urls/defaults/main.yml
index 76c49a9..313798d 100644
--- a/ansible/roles/noble_landing_urls/defaults/main.yml
+++ b/ansible/roles/noble_landing_urls/defaults/main.yml
@@ -44,6 +44,11 @@ noble_lab_ui_entries:
     namespace: vault
     service: vault
     url: https://vault.apps.noble.lab.pcenicni.dev
+  - name: Velero
+    description: Cluster backups — no web UI (velero CLI / kubectl CRDs)
+    namespace: velero
+    service: velero
+    url: ""
   - name: Homepage
     description: App dashboard (links to lab UIs)
     namespace: homepage
diff --git a/ansible/roles/noble_landing_urls/templates/noble-lab-ui-urls.md.j2 b/ansible/roles/noble_landing_urls/templates/noble-lab-ui-urls.md.j2
index 35ebdb4..78cd42c 100644
--- a/ansible/roles/noble_landing_urls/templates/noble-lab-ui-urls.md.j2
+++ b/ansible/roles/noble_landing_urls/templates/noble-lab-ui-urls.md.j2
@@ -11,7 +11,7 @@ This file is **generated** by Ansible (`noble_landing_urls` role). Use it as a t
 | UI | What | Kubernetes service | Namespace | URL |
 |----|------|----------------------|-----------|-----|
 {% for e in noble_lab_ui_entries %}
-| {{ e.name }} | {{ e.description }} | `{{ e.service }}` | `{{ e.namespace }}` | [{{ e.url }}]({{ e.url }}) |
+| {{ e.name }} | {{ e.description }} | `{{ e.service }}` | `{{ e.namespace }}` | {% if e.url | default('') | length > 0 %}[{{ e.url }}]({{ e.url }}){% else %}—{% endif %} |
 {% endfor %}
 
 ## Initial access (logins)
@@ -49,3 +49,4 @@ To generate this file **without** calling kubectl, run Ansible with **`-e noble_
 - **Vault** UI needs **unsealed** Vault; tokens come from your chosen auth method.
 - **Prometheus / Alertmanager** UIs are unauthenticated by default — restrict when hardening (`talos/CLUSTER-BUILD.md` Phase G).
 - **Headlamp** token above expires after the configured duration; re-run Ansible or `kubectl create token` to refresh.
+- **Velero** has **no web UI** — use **`velero`** CLI or **`kubectl -n velero get backup,schedule,backupstoragelocation`**. Metrics: **`velero`** Service in **`velero`** (Prometheus scrape). See `clusters/noble/bootstrap/velero/README.md`.
diff --git a/clusters/noble/apps/homepage/values.yaml b/clusters/noble/apps/homepage/values.yaml
index 58698c9..8014409 100644
--- a/clusters/noble/apps/homepage/values.yaml
+++ b/clusters/noble/apps/homepage/values.yaml
@@ -1,6 +1,7 @@
 # Homepage — [gethomepage/homepage](https://github.com/gethomepage/homepage) via [jameswynn/homepage](https://github.com/jameswynn/helm-charts) Helm chart.
 # Ingress: Traefik + cert-manager (same pattern as `clusters/noble/bootstrap/headlamp/values.yaml`).
 # Service links match **`ansible/roles/noble_landing_urls/defaults/main.yml`** (`noble_lab_ui_entries`).
+# **Velero** has no in-cluster web UI — tile links to upstream docs (no **siteMonitor**).
 #
 # **`siteMonitor`** runs **server-side** in the Homepage pod (see `gethomepage/homepage` `siteMonitor.js`).
 # Public FQDNs like **`*.apps.noble.lab.pcenicni.dev`** often do **not** resolve inside the cluster
@@ -84,6 +85,10 @@ config:
           # Unauthenticated health (HEAD/GET) — not the redirecting UI root
           siteMonitor: http://vault.vault.svc.cluster.local:8200/v1/sys/health?standbyok=true&sealedcode=204&uninitcode=204
           description: Secrets engine UI (after init/unseal)
+      - Velero:
+          icon: mdi-backup-restore
+          href: https://velero.io/docs/
+          description: Cluster backups — no in-cluster web UI; use velero CLI or kubectl (docs)
   widgets:
     - datetime:
         text_size: xl
diff --git a/clusters/noble/bootstrap/csi-snapshot-controller/README.md b/clusters/noble/bootstrap/csi-snapshot-controller/README.md
new file mode 100644
index 0000000..81a1b1c
--- /dev/null
+++ b/clusters/noble/bootstrap/csi-snapshot-controller/README.md
@@ -0,0 +1,16 @@
+# CSI Volume Snapshot (external-snapshotter)
+
+Installs the **Volume Snapshot** CRDs and the **snapshot-controller** so CSI drivers (e.g. **Longhorn**) and **Velero** can use `VolumeSnapshot` / `VolumeSnapshotContent` / `VolumeSnapshotClass`.
+
+- Upstream: [kubernetes-csi/external-snapshotter](https://github.com/kubernetes-csi/external-snapshotter) **v8.5.0**
+- **Not** the per-driver **csi-snapshotter** sidecar — Longhorn ships that with its CSI components.
+
+**Order:** apply **before** relying on volume snapshots (e.g. before or alongside **Longhorn**; **Ansible** runs this after **Cilium**, before **metrics-server** / **Longhorn**).
+
+```bash
+kubectl apply -k clusters/noble/bootstrap/csi-snapshot-controller/crd
+kubectl apply -k clusters/noble/bootstrap/csi-snapshot-controller/controller
+kubectl -n kube-system rollout status deploy/snapshot-controller --timeout=120s
+```
+
+After this, create or label a **VolumeSnapshotClass** for Longhorn (`velero.io/csi-volumesnapshot-class: "true"`) per `clusters/noble/bootstrap/velero/README.md`.
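The apply and rollout checks in the README above confirm the controller is running, but not that snapshots work end to end. A minimal smoke-test manifest, assuming Longhorn is the CSI driver, the `longhorn-velero` class from this repo has been applied, and a Longhorn-backed PVC exists; `test-pvc` and `test-snap` are illustrative names, not part of the repo:

```yaml
# Illustrative smoke test — snapshot an existing Longhorn-backed PVC.
# Assumes kube-system/snapshot-controller is running and the
# longhorn-velero VolumeSnapshotClass exists; test-pvc is a placeholder.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: test-snap
  namespace: default
spec:
  volumeSnapshotClassName: longhorn-velero
  source:
    persistentVolumeClaimName: test-pvc
```

Apply it, wait for `status.readyToUse: true` (`kubectl -n default get volumesnapshot test-snap -w`), then delete the `VolumeSnapshot`; with `deletionPolicy: Delete` the bound `VolumeSnapshotContent` and the backing Longhorn snapshot are cleaned up as well.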
diff --git a/clusters/noble/bootstrap/csi-snapshot-controller/controller/kustomization.yaml b/clusters/noble/bootstrap/csi-snapshot-controller/controller/kustomization.yaml
new file mode 100644
index 0000000..230d1b8
--- /dev/null
+++ b/clusters/noble/bootstrap/csi-snapshot-controller/controller/kustomization.yaml
@@ -0,0 +1,8 @@
+# Snapshot controller — **kube-system** (upstream default).
+# Image tag should match the external-snapshotter release family (see setup-snapshot-controller.yaml in that tag).
+apiVersion: kustomize.config.k8s.io/v1beta1
+kind: Kustomization
+namespace: kube-system
+resources:
+  - https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/v8.5.0/deploy/kubernetes/snapshot-controller/rbac-snapshot-controller.yaml
+  - https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/v8.5.0/deploy/kubernetes/snapshot-controller/setup-snapshot-controller.yaml
diff --git a/clusters/noble/bootstrap/csi-snapshot-controller/crd/kustomization.yaml b/clusters/noble/bootstrap/csi-snapshot-controller/crd/kustomization.yaml
new file mode 100644
index 0000000..1445695
--- /dev/null
+++ b/clusters/noble/bootstrap/csi-snapshot-controller/crd/kustomization.yaml
@@ -0,0 +1,9 @@
+# kubernetes-csi/external-snapshotter — Volume Snapshot GA CRDs only (no VolumeGroupSnapshot).
+# Pin the **ref** when bumping; keep it in sync with the **controller/** kustomization's image tag.
+# https://github.com/kubernetes-csi/external-snapshotter/tree/v8.5.0/client/config/crd
+apiVersion: kustomize.config.k8s.io/v1beta1
+kind: Kustomization
+resources:
+  - https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/v8.5.0/client/config/crd/snapshot.storage.k8s.io_volumesnapshotclasses.yaml
+  - https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/v8.5.0/client/config/crd/snapshot.storage.k8s.io_volumesnapshotcontents.yaml
+  - https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/v8.5.0/client/config/crd/snapshot.storage.k8s.io_volumesnapshots.yaml
diff --git a/clusters/noble/bootstrap/kustomization.yaml b/clusters/noble/bootstrap/kustomization.yaml
index 913b61a..0882590 100644
--- a/clusters/noble/bootstrap/kustomization.yaml
+++ b/clusters/noble/bootstrap/kustomization.yaml
@@ -13,6 +13,7 @@ resources:
   - vault/namespace.yaml
   - kyverno/namespace.yaml
   - velero/namespace.yaml
+  - velero/longhorn-volumesnapshotclass.yaml
   - headlamp/namespace.yaml
   - grafana-loki-datasource/loki-datasource.yaml
   - vault/unseal-cronjob.yaml
diff --git a/clusters/noble/bootstrap/velero/README.md b/clusters/noble/bootstrap/velero/README.md
index d2507ab..c7a82e0 100644
--- a/clusters/noble/bootstrap/velero/README.md
+++ b/clusters/noble/bootstrap/velero/README.md
@@ -4,25 +4,20 @@ Ansible-managed core stack — **not** reconciled by Argo CD (`clusters/noble/ap
 
 ## What you get
 
+- **No web UI** — Velero is operated with the **`velero`** CLI and **`kubectl`** (Backup, Schedule, Restore CRDs). Metrics are exposed for Prometheus; there is no first-party dashboard in this chart.
 - **vmware-tanzu/velero** Helm chart (**12.0.0** → Velero **1.18.0**) in namespace **`velero`**
 - **AWS plugin** init container for **S3-compatible** object storage (`velero/velero-plugin-for-aws:v1.14.0`)
 - **CSI snapshots** via Velero’s built-in CSI support (`EnableCSI`) and **VolumeSnapshotLocation** `velero.io/csi` (no separate CSI plugin image for Velero ≥ 1.14)
 - **Prometheus** scraping: **ServiceMonitor** labeled for **kube-prometheus** (`release: kube-prometheus`)
+- **Schedule** **`velero-daily-noble`**: cron **`0 3 * * *`** (daily at 03:00 in the Velero pod’s timezone, usually **UTC**), **720h** TTL per backup (~30 days). Edit **`values.yaml`** `schedules` to change the time or retention.
 
 ## Prerequisites
 
-1. **Longhorn** (or another CSI driver) with a **VolumeSnapshotClass** for that driver.
-2. For **Velero** to pick a default snapshot class, **one** `VolumeSnapshotClass` per driver should carry:
+1. **Volume Snapshot APIs** installed cluster-wide — **`clusters/noble/bootstrap/csi-snapshot-controller/`** (Ansible **`noble_csi_snapshot_controller`**, after **Cilium**). Without the **`snapshot.storage.k8s.io`** CRDs and **`kube-system/snapshot-controller`**, Velero logs errors like `no matches for kind "VolumeSnapshot"`.
+2. **Longhorn** (or another CSI driver) with a **VolumeSnapshotClass** for that driver.
+3. For **Longhorn**, this repo applies **`velero/longhorn-volumesnapshotclass.yaml`** (`VolumeSnapshotClass` **`longhorn-velero`**, driver **`driver.longhorn.io`**, Velero label). It is included in **`clusters/noble/bootstrap/kustomization.yaml`** (same apply as other bootstrap YAML). For non-Longhorn drivers, add a class with **`velero.io/csi-volumesnapshot-class: "true"`** (see [Velero CSI](https://velero.io/docs/main/csi/)).
-
-   ```yaml
-   metadata:
-     labels:
-       velero.io/csi-volumesnapshot-class: "true"
-   ```
-
-   Example for Longhorn: after install, confirm the driver name (often `driver.longhorn.io`) and either label Longhorn’s `VolumeSnapshotClass` or create one and label it (see [Velero CSI](https://velero.io/docs/main/csi/)).
-
-3. **S3-compatible** endpoint (MinIO, VersityGW, AWS, etc.) and a **bucket**.
+4. **S3-compatible** endpoint (MinIO, VersityGW, AWS, etc.) and a **bucket**.
 
 ## Credentials Secret
 
diff --git a/clusters/noble/bootstrap/velero/longhorn-volumesnapshotclass.yaml b/clusters/noble/bootstrap/velero/longhorn-volumesnapshotclass.yaml
new file mode 100644
index 0000000..1feca26
--- /dev/null
+++ b/clusters/noble/bootstrap/velero/longhorn-volumesnapshotclass.yaml
@@ -0,0 +1,11 @@
+# Default Longhorn VolumeSnapshotClass for Velero CSI — one class per driver may carry
+# **velero.io/csi-volumesnapshot-class: "true"** (see velero/README.md).
+# Apply after **Longhorn** CSI is running (`driver.longhorn.io`).
+apiVersion: snapshot.storage.k8s.io/v1
+kind: VolumeSnapshotClass
+metadata:
+  name: longhorn-velero
+  labels:
+    velero.io/csi-volumesnapshot-class: "true"
+driver: driver.longhorn.io
+deletionPolicy: Delete
diff --git a/clusters/noble/bootstrap/velero/values.yaml b/clusters/noble/bootstrap/velero/values.yaml
index 9bd9249..401c2e5 100644
--- a/clusters/noble/bootstrap/velero/values.yaml
+++ b/clusters/noble/bootstrap/velero/values.yaml
@@ -54,4 +54,12 @@ metrics:
   additionalLabels:
     release: kube-prometheus
 
-schedules: {}
+# Daily full-cluster backup at 03:00 — cron is evaluated in the Velero pod (typically **UTC**; set TZ on the
+# Deployment if you need local wall clock). Re-run `helm upgrade --install` to apply changes.
+schedules:
+  daily-noble:
+    disabled: false
+    schedule: "0 3 * * *"
+    template:
+      ttl: 720h
+      storageLocation: default
diff --git a/talos/CLUSTER-BUILD.md b/talos/CLUSTER-BUILD.md
index 6dc92c3..ff5f5b2 100644
--- a/talos/CLUSTER-BUILD.md
+++ b/talos/CLUSTER-BUILD.md
@@ -8,6 +8,7 @@ Lab stack is **up** on-cluster through **Phase D**–**F** and **Phase G** (Vaul
 - **Talos** v1.12.6 (target) / **Kubernetes** as bundled — four nodes **Ready** unless upgrading; **`talosctl health`**; **`talos/kubeconfig`** is **local only** (gitignored — never commit; regenerate with `talosctl kubeconfig` per `talos/README.md`). **Image Factory (nocloud installer):** `factory.talos.dev/nocloud-installer/249d9135de54962744e917cfe654117000cba369f9152fbab9d055a00aa3664f:v1.12.6`
 - **Cilium** Helm **1.16.6** / app **1.16.6** (`clusters/noble/bootstrap/cilium/`, phase 1 values).
+- **CSI Volume Snapshot** — **external-snapshotter** **v8.5.0** CRDs + **`registry.k8s.io/sig-storage/snapshot-controller`** (`clusters/noble/bootstrap/csi-snapshot-controller/`, Ansible **`noble_csi_snapshot_controller`**).
 - **MetalLB** Helm **0.15.3** / app **v0.15.3**; **IPAddressPool** `noble-l2` + **L2Advertisement** — pool **`192.168.50.210`–`192.168.50.229`**.
 - **kube-vip** DaemonSet **3/3** on control planes; VIP **`192.168.50.230`** on **`ens18`** (`vip_subnet` **`/32`** required — bare **`32`** breaks parsing). **Verified from workstation:** `kubectl config set-cluster noble --server=https://192.168.50.230:6443` then **`kubectl get --raw /healthz`** → **`ok`** (`talos/kubeconfig`; see `talos/README.md`).
 - **metrics-server** Helm **3.13.0** / app **v0.8.0** — `clusters/noble/bootstrap/metrics-server/values.yaml` (`--kubelet-insecure-tls` for Talos); **`kubectl top nodes`** works.
 
@@ -83,6 +84,7 @@ Lab stack is **up** on-cluster through **Phase D**–**F** and **Phase G** (Vaul
 | Longhorn GPT wipe automation | `talos/scripts/longhorn-gpt-recovery.sh` |
 | kube-vip (kustomize) | `clusters/noble/bootstrap/kube-vip/` (`vip_interface` e.g. `ens18`) |
 | Cilium (Helm values) | `clusters/noble/bootstrap/cilium/` — `values.yaml` (phase 1), optional `values-kpr.yaml`, `README.md` |
+| CSI Volume Snapshot (CRDs + controller) | `clusters/noble/bootstrap/csi-snapshot-controller/` — `crd/`, `controller/` kustomize; **`ansible/roles/noble_csi_snapshot_controller`** |
 | MetalLB | `clusters/noble/bootstrap/metallb/` — `namespace.yaml` (PSA **privileged**), `ip-address-pool.yaml`, `kustomization.yaml`, `README.md` |
 | Longhorn | `clusters/noble/bootstrap/longhorn/` — `values.yaml`, `namespace.yaml` (PSA **privileged**), `kustomization.yaml` |
 | metrics-server (Helm values) | `clusters/noble/bootstrap/metrics-server/values.yaml` |
@@ -109,12 +111,13 @@ Lab stack is **up** on-cluster through **Phase D**–**F** and **Phase G** (Vaul
 1. **Talos** installed; **Cilium** (or chosen CNI) **before** most workloads — with `cni: none`, nodes stay **NotReady** / **network-unavailable** taint until CNI is up.
 2. **MetalLB Helm chart** (CRDs + controller) **before** `kubectl apply -k` on the pool manifests.
 3. **`clusters/noble/bootstrap/metallb/namespace.yaml`** before or merged onto `metallb-system` so Pod Security does not block speaker (see `bootstrap/metallb/README.md`).
-4. **Longhorn:** Talos user volume + extensions in `talconfig.with-longhorn.yaml` (when restored); Helm **`defaultDataPath`** in `clusters/noble/bootstrap/longhorn/values.yaml`.
-5. **Loki → Fluent Bit → Grafana datasource:** deploy **Loki** (`loki-gateway` Service) before **Fluent Bit**; apply **`clusters/noble/bootstrap/grafana-loki-datasource/loki-datasource.yaml`** after **Loki** (sidecar picks up the ConfigMap — no kube-prometheus values change for Loki).
-6. **Vault:** **Longhorn** default **StorageClass** before **`clusters/noble/bootstrap/vault/`** Helm (PVC **`data-vault-0`**); **External Secrets** **`ClusterSecretStore`** after Vault is initialized, unsealed, and **Kubernetes auth** is configured.
-7. **Headlamp:** **Traefik** + **cert-manager** (**`letsencrypt-prod`**) before exposing **`headlamp.apps.noble.lab.pcenicni.dev`**; treat as **cluster-admin** UI — protect with network policy / SSO when hardening (**Phase G**).
-8. **Renovate:** **Git remote** + platform access (**hosted app** needs org/repo install; **self-hosted** needs **`RENOVATE_TOKEN`** and chart **`renovate.config`**). If the bot runs **in-cluster**, add the token **after** **Sealed Secrets** / **Vault** (**Phase E**) — no ingress required for the bot itself.
-9. **Velero:** **S3-compatible** endpoint + bucket + **`velero/velero-cloud-credentials`** before **`ansible/playbooks/noble.yml`** with **`noble_velero_install: true`**; for **CSI** volume snapshots, label a **VolumeSnapshotClass** per **`clusters/noble/bootstrap/velero/README.md`** (e.g. Longhorn).
+4. **CSI Volume Snapshot:** **`kubernetes-csi/external-snapshotter`** CRDs + **`snapshot-controller`** (`clusters/noble/bootstrap/csi-snapshot-controller/`) before relying on **Longhorn** / **Velero** volume snapshots.
+5. **Longhorn:** Talos user volume + extensions in `talconfig.with-longhorn.yaml` (when restored); Helm **`defaultDataPath`** in `clusters/noble/bootstrap/longhorn/values.yaml`.
+6. **Loki → Fluent Bit → Grafana datasource:** deploy **Loki** (`loki-gateway` Service) before **Fluent Bit**; apply **`clusters/noble/bootstrap/grafana-loki-datasource/loki-datasource.yaml`** after **Loki** (sidecar picks up the ConfigMap — no kube-prometheus values change for Loki).
+7. **Vault:** **Longhorn** default **StorageClass** before **`clusters/noble/bootstrap/vault/`** Helm (PVC **`data-vault-0`**); **External Secrets** **`ClusterSecretStore`** after Vault is initialized, unsealed, and **Kubernetes auth** is configured.
+8. **Headlamp:** **Traefik** + **cert-manager** (**`letsencrypt-prod`**) before exposing **`headlamp.apps.noble.lab.pcenicni.dev`**; treat as **cluster-admin** UI — protect with network policy / SSO when hardening (**Phase G**).
+9. **Renovate:** **Git remote** + platform access (**hosted app** needs org/repo install; **self-hosted** needs **`RENOVATE_TOKEN`** and chart **`renovate.config`**). If the bot runs **in-cluster**, add the token **after** **Sealed Secrets** / **Vault** (**Phase E**) — no ingress required for the bot itself.
+10. **Velero:** **S3-compatible** endpoint + bucket + **`velero/velero-cloud-credentials`** before **`ansible/playbooks/noble.yml`** with **`noble_velero_install: true`**; for **CSI** volume snapshots, label a **VolumeSnapshotClass** per **`clusters/noble/bootstrap/velero/README.md`** (e.g. Longhorn).
 
 ## Prerequisites (before phases)
 
@@ -140,9 +143,10 @@ Lab stack is **up** on-cluster through **Phase D**–**F** and **Phase G** (Vaul
 
 ## Phase B — Core platform
 
-**Install order:** **Cilium** → **metrics-server** → **Longhorn** (Talos disk + Helm) → **MetalLB** (Helm → pool manifests) → ingress / certs / DNS as planned.
+**Install order:** **Cilium** → **Volume Snapshot CRDs + snapshot-controller** (`clusters/noble/bootstrap/csi-snapshot-controller/`, Ansible **`noble_csi_snapshot_controller`**) → **metrics-server** → **Longhorn** (Talos disk + Helm) → **MetalLB** (Helm → pool manifests) → ingress / certs / DNS as planned.
 
 - [x] **Cilium** (Helm **1.16.6**) — **required** before MetalLB if `cni: none` (`clusters/noble/bootstrap/cilium/`)
+- [x] **CSI Volume Snapshot** — CRDs + **`snapshot-controller`** in **`kube-system`** (`clusters/noble/bootstrap/csi-snapshot-controller/`); Ansible **`noble_csi_snapshot_controller`**; verify `kubectl api-resources | grep VolumeSnapshot`
 - [x] **metrics-server** — Helm **3.13.0**; values in `clusters/noble/bootstrap/metrics-server/values.yaml`; verify `kubectl top nodes`
 - [x] **Longhorn** — Talos: user volume + kubelet mounts + extensions (`talos/README.md` §5); Helm **1.11.1**; `kubectl apply -k clusters/noble/bootstrap/longhorn`; verify **`nodes.longhorn.io`** and test PVC **`Bound`**
 - [x] **MetalLB** — chart installed; **pool + L2** from `clusters/noble/bootstrap/metallb/` applied (`192.168.50.210`–`229`)
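
The snapshot-related items in the Phase B checklist lend themselves to a quick read-only verification pass. A sketch, assuming the names used in this diff (`longhorn-velero` class, `velero-daily-noble` schedule) and a kubeconfig already pointed at the cluster:

```shell
# Verify the Volume Snapshot API surface and the controller (Phase B).
kubectl api-resources --api-group=snapshot.storage.k8s.io
kubectl -n kube-system rollout status deploy/snapshot-controller --timeout=120s

# Confirm the Longhorn VolumeSnapshotClass carries the Velero label
# (class name longhorn-velero per velero/longhorn-volumesnapshotclass.yaml) —
# this should print: true
kubectl get volumesnapshotclass longhorn-velero \
  -o jsonpath='{.metadata.labels.velero\.io/csi-volumesnapshot-class}{"\n"}'

# Once Velero is installed (noble_velero_install=true), check the daily schedule.
velero schedule get
kubectl -n velero get schedules.velero.io
```

All of these commands are read-only, so they are safe to run repeatedly; a missing `VolumeSnapshot` kind in the first command is the symptom the Velero README above warns about.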