Enhance documentation and configuration for Velero integration: clarify in README.md that Velero has no web UI and document the `velero` CLI workflow, add CSI Volume Snapshot support (playbook tag + role), list the Velero service in `noble_landing_urls`, and include the Longhorn VolumeSnapshotClass in kustomization.yaml so backups are set up correctly. Improve overall clarity in related documentation.
@@ -78,7 +78,7 @@ ansible-playbook playbooks/noble.yml --tags velero -e noble_velero_install=true
 |------|----------|
 | `talos_phase_a` | Talos genconfig, apply-config, bootstrap, kubeconfig |
 | `helm_repos` | `helm repo add` / `update` |
-| `noble_*` | Cilium, metrics-server, Longhorn, MetalLB (20m Helm wait), kube-vip, Traefik, cert-manager, Newt, Argo CD, Kyverno, platform stack, Velero (optional) |
+| `noble_*` | Cilium, CSI Volume Snapshot CRDs + controller, metrics-server, Longhorn, MetalLB (20m Helm wait), kube-vip, Traefik, cert-manager, Newt, Argo CD, Kyverno, platform stack, Velero (optional) |
 | `noble_landing_urls` | Writes **`ansible/output/noble-lab-ui-urls.md`** — URLs, service names, and (optional) Argo/Grafana passwords from Secrets |
 | `noble_post_deploy` | Post-install reminders |
 | `talos_bootstrap` | Genconfig-only (used by older playbook) |
@@ -3,7 +3,7 @@
 # Do not run until `kubectl get --raw /healthz` returns ok (see talos/README.md §3, CLUSTER-BUILD Phase A).
 # Run from repo **ansible/** directory: ansible-playbook playbooks/noble.yml
 #
-# Tags: repos, cilium, metrics, longhorn, metallb, kube_vip, traefik, cert_manager, newt,
+# Tags: repos, cilium, csi_snapshot, metrics, longhorn, metallb, kube_vip, traefik, cert_manager, newt,
 #       argocd, kyverno, kyverno_policies, platform, velero, all (default)
 - name: Noble cluster — platform stack (Ansible-managed)
   hosts: localhost
@@ -202,6 +202,8 @@
         tags: [repos, helm]
       - role: noble_cilium
         tags: [cilium, cni]
+      - role: noble_csi_snapshot_controller
+        tags: [csi_snapshot, snapshot, storage]
       - role: noble_metrics_server
         tags: [metrics, metrics_server]
      - role: noble_longhorn
@@ -0,0 +1,2 @@
+---
+noble_csi_snapshot_kubectl_timeout: 120s
ansible/roles/noble_csi_snapshot_controller/tasks/main.yml — new file (39 lines)
@@ -0,0 +1,39 @@
+---
+# Volume Snapshot CRDs + snapshot-controller (Velero CSI / Longhorn snapshots).
+- name: Apply Volume Snapshot CRDs (snapshot.storage.k8s.io)
+  ansible.builtin.command:
+    argv:
+      - kubectl
+      - apply
+      - "--request-timeout={{ noble_csi_snapshot_kubectl_timeout | default('120s') }}"
+      - -k
+      - "{{ noble_repo_root }}/clusters/noble/bootstrap/csi-snapshot-controller/crd"
+  environment:
+    KUBECONFIG: "{{ noble_kubeconfig }}"
+  changed_when: true
+
+- name: Apply snapshot-controller in kube-system
+  ansible.builtin.command:
+    argv:
+      - kubectl
+      - apply
+      - "--request-timeout={{ noble_csi_snapshot_kubectl_timeout | default('120s') }}"
+      - -k
+      - "{{ noble_repo_root }}/clusters/noble/bootstrap/csi-snapshot-controller/controller"
+  environment:
+    KUBECONFIG: "{{ noble_kubeconfig }}"
+  changed_when: true
+
+- name: Wait for snapshot-controller Deployment
+  ansible.builtin.command:
+    argv:
+      - kubectl
+      - -n
+      - kube-system
+      - rollout
+      - status
+      - deploy/snapshot-controller
+      - --timeout=120s
+  environment:
+    KUBECONFIG: "{{ noble_kubeconfig }}"
+  changed_when: false
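The tasks in this role all shell out to `kubectl` with an `argv` list. A plain-Python sketch of how the first task's command line comes together; `kubectl_apply_argv` is a hypothetical helper, and the `or '120s'` fallback only roughly mirrors the Jinja `default('120s')` filter:

```python
import shlex

# Sketch (not repo code): how the role's argv expands for `kubectl apply -k`.
def kubectl_apply_argv(kustomize_dir: str, timeout: str = "") -> list[str]:
    return [
        "kubectl",
        "apply",
        f"--request-timeout={timeout or '120s'}",  # fallback like `default('120s')`
        "-k",
        kustomize_dir,
    ]

argv = kubectl_apply_argv("clusters/noble/bootstrap/csi-snapshot-controller/crd")
print(shlex.join(argv))
```

Running it prints the same command line the task executes against the CRD kustomization.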
@@ -44,6 +44,11 @@ noble_lab_ui_entries:
     namespace: vault
     service: vault
     url: https://vault.apps.noble.lab.pcenicni.dev
+  - name: Velero
+    description: Cluster backups — no web UI (velero CLI / kubectl CRDs)
+    namespace: velero
+    service: velero
+    url: ""
   - name: Homepage
     description: App dashboard (links to lab UIs)
     namespace: homepage
@@ -11,7 +11,7 @@ This file is **generated** by Ansible (`noble_landing_urls` role). Use it as a t
 | UI | What | Kubernetes service | Namespace | URL |
 |----|------|----------------------|-----------|-----|
 {% for e in noble_lab_ui_entries %}
-| {{ e.name }} | {{ e.description }} | `{{ e.service }}` | `{{ e.namespace }}` | [{{ e.url }}]({{ e.url }}) |
+| {{ e.name }} | {{ e.description }} | `{{ e.service }}` | `{{ e.namespace }}` | {% if e.url | default('') | length > 0 %}[{{ e.url }}]({{ e.url }}){% else %}—{% endif %} |
 {% endfor %}
 
 ## Initial access (logins)
@@ -49,3 +49,4 @@ To generate this file **without** calling kubectl, run Ansible with **`-e noble_
 - **Vault** UI needs **unsealed** Vault; tokens come from your chosen auth method.
 - **Prometheus / Alertmanager** UIs are unauthenticated by default — restrict when hardening (`talos/CLUSTER-BUILD.md` Phase G).
 - **Headlamp** token above expires after the configured duration; re-run Ansible or `kubectl create token` to refresh.
+- **Velero** has **no web UI** — use **`velero`** CLI or **`kubectl -n velero get backup,schedule,backupstoragelocation`**. Metrics: **`velero`** Service in **`velero`** (Prometheus scrape). See `clusters/noble/bootstrap/velero/README.md`.
@@ -1,6 +1,7 @@
 # Homepage — [gethomepage/homepage](https://github.com/gethomepage/homepage) via [jameswynn/homepage](https://github.com/jameswynn/helm-charts) Helm chart.
 # Ingress: Traefik + cert-manager (same pattern as `clusters/noble/bootstrap/headlamp/values.yaml`).
 # Service links match **`ansible/roles/noble_landing_urls/defaults/main.yml`** (`noble_lab_ui_entries`).
+# **Velero** has no in-cluster web UI — tile links to upstream docs (no **siteMonitor**).
 #
 # **`siteMonitor`** runs **server-side** in the Homepage pod (see `gethomepage/homepage` `siteMonitor.js`).
 # Public FQDNs like **`*.apps.noble.lab.pcenicni.dev`** often do **not** resolve inside the cluster
@@ -84,6 +85,10 @@ config:
         # Unauthenticated health (HEAD/GET) — not the redirecting UI root
         siteMonitor: http://vault.vault.svc.cluster.local:8200/v1/sys/health?standbyok=true&sealedcode=204&uninitcode=204
         description: Secrets engine UI (after init/unseal)
+    - Velero:
+        icon: mdi-backup-restore
+        href: https://velero.io/docs/
+        description: Cluster backups — no in-cluster web UI; use velero CLI or kubectl (docs)
 widgets:
   - datetime:
       text_size: xl
clusters/noble/bootstrap/csi-snapshot-controller/README.md — new file (16 lines)
@@ -0,0 +1,16 @@
+# CSI Volume Snapshot (external-snapshotter)
+
+Installs the **Volume Snapshot** CRDs and the **snapshot-controller** so CSI drivers (e.g. **Longhorn**) and **Velero** can use `VolumeSnapshot` / `VolumeSnapshotContent` / `VolumeSnapshotClass`.
+
+- Upstream: [kubernetes-csi/external-snapshotter](https://github.com/kubernetes-csi/external-snapshotter) **v8.5.0**
+- **Not** the per-driver **csi-snapshotter** sidecar — Longhorn ships that with its CSI components.
+
+**Order:** apply **before** relying on volume snapshots (e.g. before or early with **Longhorn**; **Ansible** runs this after **Cilium**, before **metrics-server** / **Longhorn**).
+
+```bash
+kubectl apply -k clusters/noble/bootstrap/csi-snapshot-controller/crd
+kubectl apply -k clusters/noble/bootstrap/csi-snapshot-controller/controller
+kubectl -n kube-system rollout status deploy/snapshot-controller --timeout=120s
+```
+
+After this, create or label a **VolumeSnapshotClass** for Longhorn (`velero.io/csi-volumesnapshot-class: "true"`) per `clusters/noble/bootstrap/velero/README.md`.
@@ -0,0 +1,8 @@
+# Snapshot controller — **kube-system** (upstream default).
+# Image tag should match the external-snapshotter release family (see setup-snapshot-controller.yaml in that tag).
+apiVersion: kustomize.config.k8s.io/v1beta1
+kind: Kustomization
+namespace: kube-system
+resources:
+  - https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/v8.5.0/deploy/kubernetes/snapshot-controller/rbac-snapshot-controller.yaml
+  - https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/v8.5.0/deploy/kubernetes/snapshot-controller/setup-snapshot-controller.yaml
@@ -0,0 +1,9 @@
+# kubernetes-csi/external-snapshotter — Volume Snapshot GA CRDs only (no VolumeGroupSnapshot).
+# Pin **ref** when bumping; keep in sync with **controller** image below.
+# https://github.com/kubernetes-csi/external-snapshotter/tree/v8.5.0/client/config/crd
+apiVersion: kustomize.config.k8s.io/v1beta1
+kind: Kustomization
+resources:
+  - https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/v8.5.0/client/config/crd/snapshot.storage.k8s.io_volumesnapshotclasses.yaml
+  - https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/v8.5.0/client/config/crd/snapshot.storage.k8s.io_volumesnapshotcontents.yaml
+  - https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/v8.5.0/client/config/crd/snapshot.storage.k8s.io_volumesnapshots.yaml
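Both kustomizations must pin the same external-snapshotter ref, as the comments require. A minimal sketch of the kind of consistency check that implies (hypothetical script, not part of the repo; the URLs are copied from the two files):

```python
import re

# Refs must match between the CRD and controller kustomizations.
urls = [
    "https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/v8.5.0/client/config/crd/snapshot.storage.k8s.io_volumesnapshots.yaml",
    "https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/v8.5.0/deploy/kubernetes/snapshot-controller/setup-snapshot-controller.yaml",
]
refs = {re.search(r"external-snapshotter/(v[0-9.]+)/", u).group(1) for u in urls}
assert len(refs) == 1, f"mixed external-snapshotter refs: {refs}"
ref = refs.pop()
print(ref)  # v8.5.0
```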
@@ -13,6 +13,7 @@ resources:
   - vault/namespace.yaml
   - kyverno/namespace.yaml
   - velero/namespace.yaml
+  - velero/longhorn-volumesnapshotclass.yaml
   - headlamp/namespace.yaml
   - grafana-loki-datasource/loki-datasource.yaml
   - vault/unseal-cronjob.yaml
@@ -4,25 +4,20 @@ Ansible-managed core stack — **not** reconciled by Argo CD (`clusters/noble/app
 
 ## What you get
 
+- **No web UI** — Velero is operated with the **`velero`** CLI and **`kubectl`** (Backup, Schedule, Restore CRDs). Metrics are exposed for Prometheus; there is no first-party dashboard in this chart.
 - **vmware-tanzu/velero** Helm chart (**12.0.0** → Velero **1.18.0**) in namespace **`velero`**
 - **AWS plugin** init container for **S3-compatible** object storage (`velero/velero-plugin-for-aws:v1.14.0`)
 - **CSI snapshots** via Velero’s built-in CSI support (`EnableCSI`) and **VolumeSnapshotLocation** `velero.io/csi` (no separate CSI plugin image for Velero ≥ 1.14)
 - **Prometheus** scraping: **ServiceMonitor** labeled for **kube-prometheus** (`release: kube-prometheus`)
+- **Schedule** **`velero-daily-noble`**: cron **`0 3 * * *`** (daily at 03:00 in the Velero pod’s timezone, usually **UTC**), **720h** TTL per backup (~30 days). Edit **`values.yaml`** `schedules` to change time or retention.
 
 ## Prerequisites
 
-1. **Longhorn** (or another CSI driver) with a **VolumeSnapshotClass** for that driver.
-2. For **Velero** to pick a default snapshot class, **one** `VolumeSnapshotClass` per driver should carry:
-
-   ```yaml
-   metadata:
-     labels:
-       velero.io/csi-volumesnapshot-class: "true"
-   ```
-
-Example for Longhorn: after install, confirm the driver name (often `driver.longhorn.io`) and either label Longhorn’s `VolumeSnapshotClass` or create one and label it (see [Velero CSI](https://velero.io/docs/main/csi/)).
-
-3. **S3-compatible** endpoint (MinIO, VersityGW, AWS, etc.) and a **bucket**.
+1. **Volume Snapshot APIs** installed cluster-wide — **`clusters/noble/bootstrap/csi-snapshot-controller/`** (Ansible **`noble_csi_snapshot_controller`**, after **Cilium**). Without **`snapshot.storage.k8s.io`** CRDs and **`kube-system/snapshot-controller`**, Velero logs errors like `no matches for kind "VolumeSnapshot"`.
+2. **Longhorn** (or another CSI driver) with a **VolumeSnapshotClass** for that driver.
+3. For **Longhorn**, this repo applies **`velero/longhorn-volumesnapshotclass.yaml`** (`VolumeSnapshotClass` **`longhorn-velero`**, driver **`driver.longhorn.io`**, Velero label). It is included in **`clusters/noble/bootstrap/kustomization.yaml`** (same apply as other bootstrap YAML). For non-Longhorn drivers, add a class with **`velero.io/csi-volumesnapshot-class: "true"`** (see [Velero CSI](https://velero.io/docs/main/csi/)).
+4. **S3-compatible** endpoint (MinIO, VersityGW, AWS, etc.) and a **bucket**.
 
 ## Credentials Secret
@@ -0,0 +1,11 @@
+# Default Longhorn VolumeSnapshotClass for Velero CSI — one class per driver may carry
+# **velero.io/csi-volumesnapshot-class: "true"** (see velero/README.md).
+# Apply after **Longhorn** CSI is running (`driver.longhorn.io`).
+apiVersion: snapshot.storage.k8s.io/v1
+kind: VolumeSnapshotClass
+metadata:
+  name: longhorn-velero
+  labels:
+    velero.io/csi-volumesnapshot-class: "true"
+driver: driver.longhorn.io
+deletionPolicy: Delete
@@ -54,4 +54,12 @@ metrics:
   additionalLabels:
     release: kube-prometheus
 
-schedules: {}
+# Daily full-cluster backup at 03:00 — cron is evaluated in the Velero pod (typically **UTC**; set TZ on the
+# Deployment if you need local wall clock). See `helm upgrade --install` to apply.
+schedules:
+  daily-noble:
+    disabled: false
+    schedule: "0 3 * * *"
+    template:
+      ttl: 720h
+      storageLocation: default
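A quick check of the retention arithmetic in the new schedule: one run per day at 03:00 plus a `720h` TTL means roughly 30 backups on hand at steady state.

```python
from datetime import timedelta

# The chart's ttl: 720h expressed as a timedelta.
ttl = timedelta(hours=720)
print(ttl.days)  # 30

# "0 3 * * *" fires once per day, so retained backups ≈ TTL / 1 day.
backups_retained = ttl // timedelta(days=1)
print(backups_retained)  # 30
```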
@@ -8,6 +8,7 @@ Lab stack is **up** on-cluster through **Phase D**–**F** and **Phase G** (Vaul
 
 - **Talos** v1.12.6 (target) / **Kubernetes** as bundled — four nodes **Ready** unless upgrading; **`talosctl health`**; **`talos/kubeconfig`** is **local only** (gitignored — never commit; regenerate with `talosctl kubeconfig` per `talos/README.md`). **Image Factory (nocloud installer):** `factory.talos.dev/nocloud-installer/249d9135de54962744e917cfe654117000cba369f9152fbab9d055a00aa3664f:v1.12.6`
 - **Cilium** Helm **1.16.6** / app **1.16.6** (`clusters/noble/bootstrap/cilium/`, phase 1 values).
+- **CSI Volume Snapshot** — **external-snapshotter** **v8.5.0** CRDs + **`registry.k8s.io/sig-storage/snapshot-controller`** (`clusters/noble/bootstrap/csi-snapshot-controller/`, Ansible **`noble_csi_snapshot_controller`**).
 - **MetalLB** Helm **0.15.3** / app **v0.15.3**; **IPAddressPool** `noble-l2` + **L2Advertisement** — pool **`192.168.50.210`–`192.168.50.229`**.
 - **kube-vip** DaemonSet **3/3** on control planes; VIP **`192.168.50.230`** on **`ens18`** (`vip_subnet` **`/32`** required — bare **`32`** breaks parsing). **Verified from workstation:** `kubectl config set-cluster noble --server=https://192.168.50.230:6443` then **`kubectl get --raw /healthz`** → **`ok`** (`talos/kubeconfig`; see `talos/README.md`).
 - **metrics-server** Helm **3.13.0** / app **v0.8.0** — `clusters/noble/bootstrap/metrics-server/values.yaml` (`--kubelet-insecure-tls` for Talos); **`kubectl top nodes`** works.
@@ -83,6 +84,7 @@ Lab stack is **up** on-cluster through **Phase D**–**F** and **Phase G** (Vaul
 | Longhorn GPT wipe automation | `talos/scripts/longhorn-gpt-recovery.sh` |
 | kube-vip (kustomize) | `clusters/noble/bootstrap/kube-vip/` (`vip_interface` e.g. `ens18`) |
 | Cilium (Helm values) | `clusters/noble/bootstrap/cilium/` — `values.yaml` (phase 1), optional `values-kpr.yaml`, `README.md` |
+| CSI Volume Snapshot (CRDs + controller) | `clusters/noble/bootstrap/csi-snapshot-controller/` — `crd/`, `controller/` kustomize; **`ansible/roles/noble_csi_snapshot_controller`** |
 | MetalLB | `clusters/noble/bootstrap/metallb/` — `namespace.yaml` (PSA **privileged**), `ip-address-pool.yaml`, `kustomization.yaml`, `README.md` |
 | Longhorn | `clusters/noble/bootstrap/longhorn/` — `values.yaml`, `namespace.yaml` (PSA **privileged**), `kustomization.yaml` |
 | metrics-server (Helm values) | `clusters/noble/bootstrap/metrics-server/values.yaml` |
@@ -109,12 +111,13 @@ Lab stack is **up** on-cluster through **Phase D**–**F** and **Phase G** (Vaul
 1. **Talos** installed; **Cilium** (or chosen CNI) **before** most workloads — with `cni: none`, nodes stay **NotReady** / **network-unavailable** taint until CNI is up.
 2. **MetalLB Helm chart** (CRDs + controller) **before** `kubectl apply -k` on the pool manifests.
 3. **`clusters/noble/bootstrap/metallb/namespace.yaml`** before or merged onto `metallb-system` so Pod Security does not block speaker (see `bootstrap/metallb/README.md`).
-4. **Longhorn:** Talos user volume + extensions in `talconfig.with-longhorn.yaml` (when restored); Helm **`defaultDataPath`** in `clusters/noble/bootstrap/longhorn/values.yaml`.
-5. **Loki → Fluent Bit → Grafana datasource:** deploy **Loki** (`loki-gateway` Service) before **Fluent Bit**; apply **`clusters/noble/bootstrap/grafana-loki-datasource/loki-datasource.yaml`** after **Loki** (sidecar picks up the ConfigMap — no kube-prometheus values change for Loki).
-6. **Vault:** **Longhorn** default **StorageClass** before **`clusters/noble/bootstrap/vault/`** Helm (PVC **`data-vault-0`**); **External Secrets** **`ClusterSecretStore`** after Vault is initialized, unsealed, and **Kubernetes auth** is configured.
-7. **Headlamp:** **Traefik** + **cert-manager** (**`letsencrypt-prod`**) before exposing **`headlamp.apps.noble.lab.pcenicni.dev`**; treat as **cluster-admin** UI — protect with network policy / SSO when hardening (**Phase G**).
-8. **Renovate:** **Git remote** + platform access (**hosted app** needs org/repo install; **self-hosted** needs **`RENOVATE_TOKEN`** and chart **`renovate.config`**). If the bot runs **in-cluster**, add the token **after** **Sealed Secrets** / **Vault** (**Phase E**) — no ingress required for the bot itself.
-9. **Velero:** **S3-compatible** endpoint + bucket + **`velero/velero-cloud-credentials`** before **`ansible/playbooks/noble.yml`** with **`noble_velero_install: true`**; for **CSI** volume snapshots, label a **VolumeSnapshotClass** per **`clusters/noble/bootstrap/velero/README.md`** (e.g. Longhorn).
+4. **CSI Volume snapshots:** **`kubernetes-csi/external-snapshotter`** CRDs + **`snapshot-controller`** (`clusters/noble/bootstrap/csi-snapshot-controller/`) before relying on **Longhorn** / **Velero** volume snapshots.
+5. **Longhorn:** Talos user volume + extensions in `talconfig.with-longhorn.yaml` (when restored); Helm **`defaultDataPath`** in `clusters/noble/bootstrap/longhorn/values.yaml`.
+6. **Loki → Fluent Bit → Grafana datasource:** deploy **Loki** (`loki-gateway` Service) before **Fluent Bit**; apply **`clusters/noble/bootstrap/grafana-loki-datasource/loki-datasource.yaml`** after **Loki** (sidecar picks up the ConfigMap — no kube-prometheus values change for Loki).
+7. **Vault:** **Longhorn** default **StorageClass** before **`clusters/noble/bootstrap/vault/`** Helm (PVC **`data-vault-0`**); **External Secrets** **`ClusterSecretStore`** after Vault is initialized, unsealed, and **Kubernetes auth** is configured.
+8. **Headlamp:** **Traefik** + **cert-manager** (**`letsencrypt-prod`**) before exposing **`headlamp.apps.noble.lab.pcenicni.dev`**; treat as **cluster-admin** UI — protect with network policy / SSO when hardening (**Phase G**).
+9. **Renovate:** **Git remote** + platform access (**hosted app** needs org/repo install; **self-hosted** needs **`RENOVATE_TOKEN`** and chart **`renovate.config`**). If the bot runs **in-cluster**, add the token **after** **Sealed Secrets** / **Vault** (**Phase E**) — no ingress required for the bot itself.
+10. **Velero:** **S3-compatible** endpoint + bucket + **`velero/velero-cloud-credentials`** before **`ansible/playbooks/noble.yml`** with **`noble_velero_install: true`**; for **CSI** volume snapshots, label a **VolumeSnapshotClass** per **`clusters/noble/bootstrap/velero/README.md`** (e.g. Longhorn).
 
 ## Prerequisites (before phases)
@@ -140,9 +143,10 @@ Lab stack is **up** on-cluster through **Phase D**–**F** and **Phase G** (Vaul
 
 ## Phase B — Core platform
 
-**Install order:** **Cilium** → **metrics-server** → **Longhorn** (Talos disk + Helm) → **MetalLB** (Helm → pool manifests) → ingress / certs / DNS as planned.
+**Install order:** **Cilium** → **Volume Snapshot CRDs + snapshot-controller** (`clusters/noble/bootstrap/csi-snapshot-controller/`, Ansible **`noble_csi_snapshot_controller`**) → **metrics-server** → **Longhorn** (Talos disk + Helm) → **MetalLB** (Helm → pool manifests) → ingress / certs / DNS as planned.
 
 - [x] **Cilium** (Helm **1.16.6**) — **required** before MetalLB if `cni: none` (`clusters/noble/bootstrap/cilium/`)
+- [x] **CSI Volume Snapshot** — CRDs + **`snapshot-controller`** in **`kube-system`** (`clusters/noble/bootstrap/csi-snapshot-controller/`); Ansible **`noble_csi_snapshot_controller`**; verify `kubectl api-resources | grep VolumeSnapshot`
 - [x] **metrics-server** — Helm **3.13.0**; values in `clusters/noble/bootstrap/metrics-server/values.yaml`; verify `kubectl top nodes`
 - [x] **Longhorn** — Talos: user volume + kubelet mounts + extensions (`talos/README.md` §5); Helm **1.11.1**; `kubectl apply -k clusters/noble/bootstrap/longhorn`; verify **`nodes.longhorn.io`** and test PVC **`Bound`**
 - [x] **MetalLB** — chart installed; **pool + L2** from `clusters/noble/bootstrap/metallb/` applied (`192.168.50.210`–`229`)