Remove Argo CD application configurations for Fluent Bit, Headlamp, Loki, kube-prometheus, and associated kustomization files from the noble bootstrap directory. This cleanup eliminates unused resources and simplifies the deployment structure.

Remove deprecated Argo CD application configurations for various components including cert-manager, Cilium, CSI snapshot controllers, kube-vip, and others. Update README.md to reflect the current state of leaf applications and clarify optional components. Adjust kustomization files to streamline resource management for bootstrap workloads.
Update Argo CD documentation and kustomization files to include additional applications and namespace resources. Enhance README.md with current leaf applications and clarify optional components. This improves deployment clarity and organization for bootstrap workloads.
2026-04-01 02:14:49 -04:00 · 2026-04-01 02:13:15 -04:00 · 2026-04-01 02:11:19 -04:00 · 2026-04-01 02:05:10 -04:00 · 2026-04-01 01:55:41 -04:00 · 2026-04-01 01:21:32 -04:00
100 changed files with 2302 additions and 833 deletions
--- a/.env.sample
+++ b/.env.sample
@@ -11,3 +11,9 @@ CLOUDFLARE_DNS_API_TOKEN=
 PANGOLIN_ENDPOINT=
 NEWT_ID=
 NEWT_SECRET=
+
+# Velero — when **noble_velero_install=true**, set bucket + S3 API URL and credentials (see clusters/noble/bootstrap/velero/README.md).
+NOBLE_VELERO_S3_BUCKET=
+NOBLE_VELERO_S3_URL=
+NOBLE_VELERO_AWS_ACCESS_KEY_ID=
+NOBLE_VELERO_AWS_SECRET_ACCESS_KEY=
--- a/.sops.yaml
+++ b/.sops.yaml
@@ -0,0 +1,7 @@
+# Mozilla SOPS — encrypt/decrypt Kubernetes Secret manifests under clusters/noble/secrets/
+# Generate a key: age-keygen -o age-key.txt  (age-key.txt is gitignored)
+# Add the printed public key below (for several recipients, list one per line, separated by commas).
+creation_rules:
+  - path_regex: clusters/noble/secrets/.*\.yaml$
+    age: >-
+      age1juym5p3ez3dkt0dxlznydgfgqvaujfnyk9hpdsssf50hsxeh3p4sjpf3gn
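For reference, a sketch of how this rule might look with a second recipient, assuming SOPS's comma-separated `age` recipient list. Both keys below are placeholders, not real recipients.

```yaml
# Sketch only: both recipients are placeholders; substitute real age public keys.
creation_rules:
  - path_regex: clusters/noble/secrets/.*\.yaml$
    # Folded scalar: the lines are joined, so the comma is what separates recipients.
    age: >-
      age1EXAMPLEFIRSTRECIPIENTPUBLICKEYREPLACEME,
      age1EXAMPLESECONDRECIPIENTPUBLICKEYREPLACEME
```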
--- a/README.md
+++ b/README.md
@@ -180,6 +180,12 @@ Shared services used across multiple applications.

 **Configuration:** Requires Pangolin endpoint URL, Newt ID, and Newt secret.

+### versitygw/ (`komodo/s3/versitygw/`)
+
+- **[Versity S3 Gateway](https://github.com/versity/versitygw)** — S3 API on port **10000** by default; optional **WebUI** on **8080** (not the same listener—enable `VERSITYGW_WEBUI_PORT` / `VGW_WEBUI_GATEWAYS` per `.env.sample`). Behind **Pangolin**, expose the API and WebUI separately (or you will see **404** browsing the API URL).
+
+**Configuration:** Set either `ROOT_ACCESS_KEY` / `ROOT_SECRET_KEY` or `ROOT_ACCESS_KEY_ID` / `ROOT_SECRET_ACCESS_KEY`. Optional `VERSITYGW_PORT`. Compose uses `${VAR}` interpolation so credentials work with Komodo’s `docker compose --env-file <run_directory>/.env` (avoid `env_file:` in the service when `run_directory` is not the same folder as `compose.yaml`, or the written `.env` will not be found).
+
 ---

 ## 📊 Monitoring (`komodo/monitor/`)
--- a/ansible/README.md
+++ b/ansible/README.md
@@ -24,6 +24,7 @@ Copy **`.env.sample`** to **`.env`** at the repository root (`.env` is gitignore
 ## Prerequisites

 - `talosctl` (matches node Talos version), `talhelper`, `helm`, `kubectl`.
+- **SOPS secrets:** `sops` and `age` on the control host if you use **`clusters/noble/secrets/`** with **`age-key.txt`** (see **`clusters/noble/secrets/README.md`**).
 - **Phase A:** same LAN/VPN as nodes so **Talos :50000** and **Kubernetes :6443** are reachable (see [`talos/README.md`](../talos/README.md) §3).
 - **noble.yml:** bootstrapped cluster and **`talos/kubeconfig`** (or `KUBECONFIG`).

@@ -34,8 +35,16 @@ Copy **`.env.sample`** to **`.env`** at the repository root (`.env` is gitignore
 | [`playbooks/deploy.yml`](playbooks/deploy.yml) | **Talos Phase A** then **`noble.yml`** (full automation). |
 | [`playbooks/talos_phase_a.yml`](playbooks/talos_phase_a.yml) | `genconfig` → `apply-config` → `bootstrap` → `kubeconfig` only. |
 | [`playbooks/noble.yml`](playbooks/noble.yml) | Helm + `kubectl` platform (after Phase A). |
-| [`playbooks/post_deploy.yml`](playbooks/post_deploy.yml) | Vault / ESO reminders (`noble_apply_vault_cluster_secret_store`). |
+| [`playbooks/post_deploy.yml`](playbooks/post_deploy.yml) | SOPS reminders and optional Argo root Application note. |
 | [`playbooks/talos_bootstrap.yml`](playbooks/talos_bootstrap.yml) | **`talhelper genconfig` only** (legacy shortcut; prefer **`talos_phase_a.yml`**). |
+| [`playbooks/debian_harden.yml`](playbooks/debian_harden.yml) | Baseline hardening for Debian servers (SSH/sysctl/fail2ban/unattended-upgrades). |
+| [`playbooks/debian_maintenance.yml`](playbooks/debian_maintenance.yml) | Debian maintenance run (apt upgrades, autoremove/autoclean, reboot when required). |
+| [`playbooks/debian_rotate_ssh_keys.yml`](playbooks/debian_rotate_ssh_keys.yml) | Rotate managed users' `authorized_keys`. |
+| [`playbooks/debian_ops.yml`](playbooks/debian_ops.yml) | Convenience pipeline: harden then maintenance for Debian servers. |
+| [`playbooks/proxmox_prepare.yml`](playbooks/proxmox_prepare.yml) | Configure Proxmox community repos and disable no-subscription UI warning. |
+| [`playbooks/proxmox_upgrade.yml`](playbooks/proxmox_upgrade.yml) | Proxmox maintenance run (apt dist-upgrade, cleanup, reboot when required). |
+| [`playbooks/proxmox_cluster.yml`](playbooks/proxmox_cluster.yml) | Create a Proxmox cluster on the master and join additional hosts. |
+| [`playbooks/proxmox_ops.yml`](playbooks/proxmox_ops.yml) | Convenience pipeline: prepare, upgrade, then cluster Proxmox hosts. |

 ```bash
 cd ansible
@@ -65,11 +74,13 @@ Override with `-e` when needed, e.g. **`-e noble_talos_skip_bootstrap=true`** if
 ```bash
 ansible-playbook playbooks/noble.yml --tags cilium,metallb
 ansible-playbook playbooks/noble.yml --skip-tags newt
+ansible-playbook playbooks/noble.yml --tags velero -e noble_velero_install=true -e noble_velero_s3_bucket=... -e noble_velero_s3_url=...
 ```

-### Variables — `group_vars/all.yml`
+### Variables — `group_vars/all.yml` and role defaults

- **`noble_newt_install`**, **`noble_cert_manager_require_cloudflare_secret`**, **`noble_apply_vault_cluster_secret_store`**, **`noble_k8s_api_server_override`**, **`noble_k8s_api_server_auto_fallback`**, **`noble_k8s_api_server_fallback`**, **`noble_skip_k8s_health_check`**.
+- **`group_vars/all.yml`:** **`noble_newt_install`**, **`noble_velero_install`**, **`noble_cert_manager_require_cloudflare_secret`**, **`noble_argocd_apply_root_application`**, **`noble_argocd_apply_bootstrap_root_application`**, **`noble_k8s_api_server_override`**, **`noble_k8s_api_server_auto_fallback`**, **`noble_k8s_api_server_fallback`**, **`noble_skip_k8s_health_check`**
+- **`roles/noble_platform/defaults/main.yml`:** **`noble_apply_sops_secrets`**, **`noble_sops_age_key_file`** (SOPS secrets under **`clusters/noble/secrets/`**)

 ## Roles

@@ -77,10 +88,67 @@ ansible-playbook playbooks/noble.yml --skip-tags newt
 |------|----------|
 | `talos_phase_a` | Talos genconfig, apply-config, bootstrap, kubeconfig |
 | `helm_repos` | `helm repo add` / `update` |
-| `noble_*` | Cilium, metrics-server, Longhorn, MetalLB (20m Helm wait), kube-vip, Traefik, cert-manager, Newt, Argo CD, Kyverno, platform stack |
+| `noble_*` | Cilium, CSI Volume Snapshot CRDs + controller, metrics-server, Longhorn, MetalLB (20m Helm wait), kube-vip, Traefik, cert-manager, Newt, Argo CD, Kyverno, platform stack, Velero (optional) |
 | `noble_landing_urls` | Writes **`ansible/output/noble-lab-ui-urls.md`** — URLs, service names, and (optional) Argo/Grafana passwords from Secrets |
 | `noble_post_deploy` | Post-install reminders |
 | `talos_bootstrap` | Genconfig-only (used by older playbook) |
+| `debian_baseline_hardening` | Baseline Debian hardening (SSH policy, sysctl profile, fail2ban, unattended upgrades) |
+| `debian_maintenance` | Routine Debian maintenance tasks (updates, cleanup, reboot-on-required) |
+| `debian_ssh_key_rotation` | Declarative `authorized_keys` rotation for server users |
+| `proxmox_baseline` | Proxmox repo prep (community repos) and no-subscription warning suppression |
+| `proxmox_maintenance` | Proxmox package maintenance (dist-upgrade, cleanup, reboot-on-required) |
+| `proxmox_cluster` | Proxmox cluster bootstrap/join automation using `pvecm` |
+
+## Debian server ops quick start
+
+These playbooks are separate from the Talos/noble flow and target hosts in `debian_servers`.
+
+1. Copy `inventory/debian.example.yml` to `inventory/debian.yml` and update hosts/users.
+2. Update `group_vars/debian_servers.yml` with your allowed SSH users and real public keys.
+3. Run with the Debian inventory:
+
+```bash
+cd ansible
+ansible-playbook -i inventory/debian.yml playbooks/debian_harden.yml
+ansible-playbook -i inventory/debian.yml playbooks/debian_rotate_ssh_keys.yml
+ansible-playbook -i inventory/debian.yml playbooks/debian_maintenance.yml
+```
+
+Or run the combined pipeline (hardening, then maintenance):
+
+```bash
+cd ansible
+ansible-playbook -i inventory/debian.yml playbooks/debian_ops.yml
+```
+
+## Proxmox host + cluster quick start
+
+These playbooks are separate from the Talos/noble flow and target hosts in `proxmox_hosts`.
+
+1. Copy `inventory/proxmox.example.yml` to `inventory/proxmox.yml` and update hosts/users.
+2. Update `group_vars/proxmox_hosts.yml` with your cluster name (`proxmox_cluster_name`), chosen cluster master, and root public key file paths to install.
+3. First run (no SSH keys yet): use `--ask-pass` **or** set `ansible_password` (prefer Ansible Vault). Keep `ansible_ssh_common_args: "-o StrictHostKeyChecking=accept-new"` in inventory for first-contact hosts.
+4. Run prepare first to install your public keys on each host, then continue:
+
+```bash
+cd ansible
+ansible-playbook -i inventory/proxmox.yml playbooks/proxmox_prepare.yml --ask-pass
+ansible-playbook -i inventory/proxmox.yml playbooks/proxmox_upgrade.yml
+ansible-playbook -i inventory/proxmox.yml playbooks/proxmox_cluster.yml
+```
+
+After `proxmox_prepare.yml` finishes, SSH key auth should work for root (keys from `proxmox_root_authorized_key_files`), so `--ask-pass` is usually no longer needed.
+
+If `pvecm add` still prompts for the master root password during join, set `proxmox_cluster_master_root_password` (prefer Vault) to run join non-interactively.
+
+Changing `proxmox_cluster_name` only affects new cluster creation; it does not rename an already-created cluster.
+
+Or run the full Proxmox pipeline:
+
+```bash
+cd ansible
+ansible-playbook -i inventory/proxmox.yml playbooks/proxmox_ops.yml
+```

 ## Migrating from Argo-managed `noble-platform`

--- a/ansible/group_vars/all.yml
+++ b/ansible/group_vars/all.yml
@@ -13,11 +13,16 @@ noble_k8s_api_server_fallback: "https://192.168.50.20:6443"
 # Only if you must skip the kubectl /healthz preflight (not recommended).
 noble_skip_k8s_health_check: false

-# Pangolin / Newt — set true only after creating newt-pangolin-auth Secret (see clusters/noble/bootstrap/newt/README.md)
+# Pangolin / Newt — set true only after newt-pangolin-auth Secret exists (SOPS: clusters/noble/secrets/ or imperative — see clusters/noble/bootstrap/newt/README.md)
 noble_newt_install: false

 # cert-manager needs Secret cloudflare-dns-api-token in cert-manager namespace before ClusterIssuers work
 noble_cert_manager_require_cloudflare_secret: true

-# post_deploy.yml — apply Vault ClusterSecretStore only after Vault is initialized and K8s auth is configured
-noble_apply_vault_cluster_secret_store: false
+# Velero — set **noble_velero_install: true** plus S3 bucket/URL (and credentials — see clusters/noble/bootstrap/velero/README.md)
+noble_velero_install: false
+
+# Argo CD — apply app-of-apps root Application (clusters/noble/bootstrap/argocd/root-application.yaml). Set false to skip.
+noble_argocd_apply_root_application: true
+# Bootstrap kustomize in Argo (**noble-bootstrap-root** → **clusters/noble/bootstrap**). Applied with manual sync; enable automation after **noble.yml** (see **clusters/noble/bootstrap/argocd/README.md** §5).
+noble_argocd_apply_bootstrap_root_application: true
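The Velero toggles can also be kept in an extra-vars file rather than repeated as `-e` flags. A minimal sketch, assuming the `noble_velero_s3_bucket` and `noble_velero_s3_url` variable names used in the `ansible/README.md` example; the file name and both values are hypothetical.

```yaml
# velero-vars.yml (hypothetical file name)
noble_velero_install: true
# Placeholder bucket and endpoint; replace with your own S3-compatible target.
noble_velero_s3_bucket: example-velero-backups
noble_velero_s3_url: "https://s3.example.internal:10000"
```

Such a file would be passed as `ansible-playbook playbooks/noble.yml --tags velero -e @velero-vars.yml`. Credentials are still expected separately (see the `NOBLE_VELERO_AWS_*` entries in `.env.sample`).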
--- a/ansible/group_vars/debian_servers.yml
+++ b/ansible/group_vars/debian_servers.yml
@@ -0,0 +1,12 @@
+---
+# Hardened SSH settings
+debian_baseline_ssh_allow_users:
+  - admin
+
+# Example key rotation entries. Replace with your real users and keys.
+debian_ssh_rotation_users:
+  - name: admin
+    home: /home/admin
+    state: present
+    keys:
+      - "ssh-ed25519 AAAAEXAMPLE_REPLACE_ME admin@workstation"
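A sketch of how a retired user could be expressed alongside an active one, based on the `state` values that the `debian_ssh_key_rotation` role validates (`present` or `absent`). The `olduser` account and the key string are placeholders.

```yaml
debian_ssh_rotation_users:
  # Active user: authorized_keys is rewritten to contain exactly these keys.
  - name: admin
    home: /home/admin
    state: present
    keys:
      - "ssh-ed25519 AAAAEXAMPLE_REPLACE_ME admin@workstation"
  # Retired user (hypothetical): authorized_keys is removed, so no keys are required.
  - name: olduser
    home: /home/olduser
    state: absent
```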
--- a/ansible/group_vars/proxmox_hosts.yml
+++ b/ansible/group_vars/proxmox_hosts.yml
@@ -0,0 +1,37 @@
+---
+# Proxmox repositories
+proxmox_repo_debian_codename: trixie
+proxmox_repo_disable_enterprise: true
+proxmox_repo_disable_ceph_enterprise: true
+proxmox_repo_enable_pve_no_subscription: true
+proxmox_repo_enable_ceph_no_subscription: true
+
+# Suppress "No valid subscription" warning in UI
+proxmox_no_subscription_notice_disable: true
+
+# Public keys to install for root on each Proxmox host.
+proxmox_root_authorized_key_files:
+  - "{{ lookup('env', 'HOME') }}/.ssh/id_ed25519.pub"
+  - "{{ lookup('env', 'HOME') }}/.ssh/ansible.pub"
+
+# Package upgrade/reboot policy
+proxmox_upgrade_apt_cache_valid_time: 3600
+proxmox_upgrade_autoremove: true
+proxmox_upgrade_autoclean: true
+proxmox_upgrade_reboot_if_required: true
+proxmox_upgrade_reboot_timeout: 1800
+
+# Cluster settings
+proxmox_cluster_enabled: true
+proxmox_cluster_name: atomic-hub
+
+# Bootstrap host name from inventory (first host by default if empty)
+proxmox_cluster_master: ""
+
+# Optional explicit IP/FQDN for joining; leave empty to use ansible_host of master
+proxmox_cluster_master_ip: ""
+proxmox_cluster_force: false
+
+# Optional: use only for first cluster joins when inter-node SSH trust is not established.
+# Prefer storing with Ansible Vault if you set this.
+proxmox_cluster_master_root_password: ""
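One way to keep this value out of plaintext is to reference a vaulted variable, in the same style as the `vault_proxmox_root_password` placeholder in the example inventory. A minimal sketch; the variable name `vault_proxmox_cluster_master_root_password` and the companion vault file are assumptions.

```yaml
# group_vars/proxmox_hosts.yml
# Falls back to an empty string when the vaulted variable is not defined.
proxmox_cluster_master_root_password: "{{ vault_proxmox_cluster_master_root_password | default('') }}"

# Hypothetical companion file, encrypted with `ansible-vault encrypt`:
# vault_proxmox_cluster_master_root_password: "REPLACE_ME"
```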
--- a/ansible/inventory/debian.example.yml
+++ b/ansible/inventory/debian.example.yml
@@ -0,0 +1,11 @@
+---
+all:
+  children:
+    debian_servers:
+      hosts:
+        debian-01:
+          ansible_host: 192.168.50.101
+          ansible_user: admin
+        debian-02:
+          ansible_host: 192.168.50.102
+          ansible_user: admin
--- a/ansible/inventory/proxmox.example.yml
+++ b/ansible/inventory/proxmox.example.yml
@@ -0,0 +1,24 @@
+---
+all:
+  children:
+    proxmox_hosts:
+      vars:
+        ansible_ssh_common_args: "-o StrictHostKeyChecking=accept-new"
+      hosts:
+        helium:
+          ansible_host: 192.168.1.100
+          ansible_user: root
+          # First run without SSH keys:
+          # ansible_password: "{{ vault_proxmox_root_password }}"
+        neon:
+          ansible_host: 192.168.1.90
+          ansible_user: root
+          # ansible_password: "{{ vault_proxmox_root_password }}"
+        argon:
+          ansible_host: 192.168.1.80
+          ansible_user: root
+          # ansible_password: "{{ vault_proxmox_root_password }}"
+        krypton:
+          ansible_host: 192.168.1.70
+          ansible_user: root
+          # ansible_password: "{{ vault_proxmox_root_password }}"
--- a/ansible/inventory/proxmox.yml
+++ b/ansible/inventory/proxmox.yml
@@ -0,0 +1,24 @@
+---
+all:
+  children:
+    proxmox_hosts:
+      vars:
+        ansible_ssh_common_args: "-o StrictHostKeyChecking=accept-new"
+      hosts:
+        helium:
+          ansible_host: 192.168.1.100
+          ansible_user: root
+          # First run without SSH keys:
+          # ansible_password: "{{ vault_proxmox_root_password }}"
+        neon:
+          ansible_host: 192.168.1.90
+          ansible_user: root
+          # ansible_password: "{{ vault_proxmox_root_password }}"
+        argon:
+          ansible_host: 192.168.1.80
+          ansible_user: root
+          # ansible_password: "{{ vault_proxmox_root_password }}"
+        krypton:
+          ansible_host: 192.168.1.70
+          ansible_user: root
+          # ansible_password: "{{ vault_proxmox_root_password }}"
--- a/ansible/playbooks/debian_harden.yml
+++ b/ansible/playbooks/debian_harden.yml
@@ -0,0 +1,8 @@
+---
+- name: Debian server baseline hardening
+  hosts: debian_servers
+  become: true
+  gather_facts: true
+  roles:
+    - role: debian_baseline_hardening
+      tags: [hardening, baseline]
--- a/ansible/playbooks/debian_maintenance.yml
+++ b/ansible/playbooks/debian_maintenance.yml
@@ -0,0 +1,8 @@
+---
+- name: Debian maintenance (updates + reboot handling)
+  hosts: debian_servers
+  become: true
+  gather_facts: true
+  roles:
+    - role: debian_maintenance
+      tags: [maintenance, updates]
--- a/ansible/playbooks/debian_ops.yml
+++ b/ansible/playbooks/debian_ops.yml
@@ -0,0 +1,3 @@
+---
+- import_playbook: debian_harden.yml
+- import_playbook: debian_maintenance.yml
--- a/ansible/playbooks/debian_rotate_ssh_keys.yml
+++ b/ansible/playbooks/debian_rotate_ssh_keys.yml
@@ -0,0 +1,8 @@
+---
+- name: Debian SSH key rotation
+  hosts: debian_servers
+  become: true
+  gather_facts: false
+  roles:
+    - role: debian_ssh_key_rotation
+      tags: [ssh, ssh_keys, rotation]
--- a/ansible/playbooks/noble.yml
+++ b/ansible/playbooks/noble.yml
@@ -3,8 +3,8 @@
 # Do not run until `kubectl get --raw /healthz` returns ok (see talos/README.md §3, CLUSTER-BUILD Phase A).
 # Run from repo **ansible/** directory:  ansible-playbook playbooks/noble.yml
 #
-# Tags: repos, cilium, metrics, longhorn, metallb, kube_vip, traefik, cert_manager, newt,
-#       argocd, kyverno, kyverno_policies, platform, all (default)
+# Tags: repos, cilium, csi_snapshot, metrics, longhorn, metallb, kube_vip, traefik, cert_manager, newt,
+#       argocd, kyverno, kyverno_policies, platform, velero, all (default)
 - name: Noble cluster — platform stack (Ansible-managed)
  hosts: localhost
  connection: local
@@ -113,6 +113,7 @@
      tags: [always]

    # talosctl kubeconfig often sets server to the VIP; off-LAN you can reach a control-plane IP but not 192.168.50.230.
+    # kubectl stderr is often "The connection to the server ... was refused" (no substring "connection refused").
    - name: Auto-fallback API server when VIP is unreachable (temp kubeconfig)
      tags: [always]
      when:
@@ -120,8 +121,7 @@
        - noble_k8s_api_server_override | default('') | length == 0
        - not (noble_skip_k8s_health_check | default(false) | bool)
        - (noble_k8s_health_first.rc | default(1)) != 0 or (noble_k8s_health_first.stdout | default('') | trim) != 'ok'
-        - ('network is unreachable' in (noble_k8s_health_first.stderr | default('') | lower)) or
-          ('no route to host' in (noble_k8s_health_first.stderr | default('') | lower))
+        - (((noble_k8s_health_first.stderr | default('')) ~ (noble_k8s_health_first.stdout | default(''))) | lower) is search('network is unreachable|no route to host|connection refused|was refused', multiline=False)
      block:
        - name: Ensure temp dir for kubeconfig auto-fallback
          ansible.builtin.file:
@@ -202,6 +202,8 @@
      tags: [repos, helm]
    - role: noble_cilium
      tags: [cilium, cni]
+    - role: noble_csi_snapshot_controller
+      tags: [csi_snapshot, snapshot, storage]
    - role: noble_metrics_server
      tags: [metrics, metrics_server]
    - role: noble_longhorn
@@ -224,5 +226,7 @@
      tags: [kyverno_policies, policy]
    - role: noble_platform
      tags: [platform, observability, apps]
+    - role: noble_velero
+      tags: [velero, backups]
    - role: noble_landing_urls
      tags: [landing, platform, observability, apps]
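The widened auto-fallback condition can be checked in isolation. The following is a minimal sketch, not part of the playbook: it runs the same `search` pattern against two illustrative error strings (the sample messages and variable names are assumptions, modelled on the kubectl wording quoted in the comment above the task).

```yaml
---
- name: Check the fallback trigger pattern against sample kubectl errors
  hosts: localhost
  connection: local
  gather_facts: false
  vars:
    # Illustrative messages only; real kubectl output may differ in detail.
    sample_errors:
      - "The connection to the server 192.168.50.230:6443 was refused - did you specify the right host or port?"
      - "dial tcp 192.168.50.230:6443: connect: no route to host"
    fallback_pattern: "network is unreachable|no route to host|connection refused|was refused"
  tasks:
    - name: Each sample should match after lowercasing
      ansible.builtin.assert:
        that:
          - (item | lower) is search(fallback_pattern)
      loop: "{{ sample_errors }}"
```

The first sample contains "was refused" but not the substring "connection refused", which is the case the extra alternative in the pattern is meant to cover.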
--- a/ansible/playbooks/post_deploy.yml
+++ b/ansible/playbooks/post_deploy.yml
@@ -1,12 +1,7 @@
 ---
-# Manual follow-ups after **noble.yml**: Vault init/unseal, Kubernetes auth for Vault, ESO ClusterSecretStore.
-# Run: ansible-playbook playbooks/post_deploy.yml
-- name: Noble cluster — post-install reminders
-  hosts: localhost
+# Manual follow-ups after **noble.yml**: SOPS key backup, optional Argo root Application.
+- hosts: localhost
  connection: local
  gather_facts: false
-  vars:
-    noble_repo_root: "{{ playbook_dir | dirname | dirname }}"
-    noble_kubeconfig: "{{ lookup('env', 'KUBECONFIG') | default(noble_repo_root + '/talos/kubeconfig', true) }}"
  roles:
-    - role: noble_post_deploy
+    - noble_post_deploy
--- a/ansible/playbooks/proxmox_cluster.yml
+++ b/ansible/playbooks/proxmox_cluster.yml
@@ -0,0 +1,9 @@
+---
+- name: Proxmox cluster bootstrap/join
+  hosts: proxmox_hosts
+  become: true
+  gather_facts: false
+  serial: 1
+  roles:
+    - role: proxmox_cluster
+      tags: [proxmox, cluster]
--- a/ansible/playbooks/proxmox_ops.yml
+++ b/ansible/playbooks/proxmox_ops.yml
@@ -0,0 +1,4 @@
+---
+- import_playbook: proxmox_prepare.yml
+- import_playbook: proxmox_upgrade.yml
+- import_playbook: proxmox_cluster.yml
--- a/ansible/playbooks/proxmox_prepare.yml
+++ b/ansible/playbooks/proxmox_prepare.yml
@@ -0,0 +1,8 @@
+---
+- name: Proxmox host preparation (community repos + no-subscription notice)
+  hosts: proxmox_hosts
+  become: true
+  gather_facts: true
+  roles:
+    - role: proxmox_baseline
+      tags: [proxmox, prepare, repos, ui]
--- a/ansible/playbooks/proxmox_upgrade.yml
+++ b/ansible/playbooks/proxmox_upgrade.yml
@@ -0,0 +1,9 @@
+---
+- name: Proxmox host maintenance (upgrade to latest)
+  hosts: proxmox_hosts
+  become: true
+  gather_facts: true
+  serial: 1
+  roles:
+    - role: proxmox_maintenance
+      tags: [proxmox, maintenance, updates]
--- a/ansible/roles/debian_baseline_hardening/defaults/main.yml
+++ b/ansible/roles/debian_baseline_hardening/defaults/main.yml
@@ -0,0 +1,39 @@
+---
+# Update apt metadata only when stale (seconds)
+debian_baseline_apt_cache_valid_time: 3600
+
+# Core host hardening packages
+debian_baseline_packages:
+  - unattended-upgrades
+  - apt-listchanges
+  - fail2ban
+  - needrestart
+  - sudo
+  - ca-certificates
+
+# SSH hardening controls
+debian_baseline_ssh_permit_root_login: "no"
+debian_baseline_ssh_password_authentication: "no"
+debian_baseline_ssh_pubkey_authentication: "yes"
+debian_baseline_ssh_x11_forwarding: "no"
+debian_baseline_ssh_max_auth_tries: 3
+debian_baseline_ssh_client_alive_interval: 300
+debian_baseline_ssh_client_alive_count_max: 2
+debian_baseline_ssh_allow_users: []
+
+# unattended-upgrades controls
+debian_baseline_enable_unattended_upgrades: true
+debian_baseline_unattended_auto_upgrade: "1"
+debian_baseline_unattended_update_lists: "1"
+
+# Kernel and network hardening sysctls
+debian_baseline_sysctl_settings:
+  net.ipv4.conf.all.accept_redirects: "0"
+  net.ipv4.conf.default.accept_redirects: "0"
+  net.ipv4.conf.all.send_redirects: "0"
+  net.ipv4.conf.default.send_redirects: "0"
+  net.ipv4.conf.all.log_martians: "1"
+  net.ipv4.conf.default.log_martians: "1"
+  net.ipv4.tcp_syncookies: "1"
+  net.ipv6.conf.all.accept_redirects: "0"
+  net.ipv6.conf.default.accept_redirects: "0"
--- a/ansible/roles/debian_baseline_hardening/handlers/main.yml
+++ b/ansible/roles/debian_baseline_hardening/handlers/main.yml
@@ -0,0 +1,12 @@
+---
+- name: Restart ssh
+  ansible.builtin.service:
+    name: ssh
+    state: restarted
+
+- name: Reload sysctl
+  ansible.builtin.command:
+    argv:
+      - sysctl
+      - --system
+  changed_when: true
--- a/ansible/roles/debian_baseline_hardening/tasks/main.yml
+++ b/ansible/roles/debian_baseline_hardening/tasks/main.yml
@@ -0,0 +1,52 @@
+---
+- name: Refresh apt cache
+  ansible.builtin.apt:
+    update_cache: true
+    cache_valid_time: "{{ debian_baseline_apt_cache_valid_time }}"
+
+- name: Install baseline hardening packages
+  ansible.builtin.apt:
+    name: "{{ debian_baseline_packages }}"
+    state: present
+
+- name: Configure unattended-upgrades auto settings
+  ansible.builtin.copy:
+    dest: /etc/apt/apt.conf.d/20auto-upgrades
+    mode: "0644"
+    content: |
+      APT::Periodic::Update-Package-Lists "{{ debian_baseline_unattended_update_lists }}";
+      APT::Periodic::Unattended-Upgrade "{{ debian_baseline_unattended_auto_upgrade }}";
+  when: debian_baseline_enable_unattended_upgrades | bool
+
+- name: Configure SSH hardening options
+  ansible.builtin.copy:
+    dest: /etc/ssh/sshd_config.d/99-hardening.conf
+    mode: "0644"
+    content: |
+      PermitRootLogin {{ debian_baseline_ssh_permit_root_login }}
+      PasswordAuthentication {{ debian_baseline_ssh_password_authentication }}
+      PubkeyAuthentication {{ debian_baseline_ssh_pubkey_authentication }}
+      X11Forwarding {{ debian_baseline_ssh_x11_forwarding }}
+      MaxAuthTries {{ debian_baseline_ssh_max_auth_tries }}
+      ClientAliveInterval {{ debian_baseline_ssh_client_alive_interval }}
+      ClientAliveCountMax {{ debian_baseline_ssh_client_alive_count_max }}
+      {% if debian_baseline_ssh_allow_users | length > 0 %}
+      AllowUsers {{ debian_baseline_ssh_allow_users | join(' ') }}
+      {% endif %}
+  notify: Restart ssh
+
+- name: Configure baseline sysctls
+  ansible.builtin.copy:
+    dest: /etc/sysctl.d/99-hardening.conf
+    mode: "0644"
+    content: |
+      {% for key, value in debian_baseline_sysctl_settings.items() %}
+      {{ key }} = {{ value }}
+      {% endfor %}
+  notify: Reload sysctl
+
+- name: Ensure fail2ban service is enabled
+  ansible.builtin.service:
+    name: fail2ban
+    enabled: true
+    state: started
--- a/ansible/roles/debian_maintenance/defaults/main.yml
+++ b/ansible/roles/debian_maintenance/defaults/main.yml
@@ -0,0 +1,7 @@
+---
+debian_maintenance_apt_cache_valid_time: 3600
+debian_maintenance_upgrade_type: dist
+debian_maintenance_autoremove: true
+debian_maintenance_autoclean: true
+debian_maintenance_reboot_if_required: true
+debian_maintenance_reboot_timeout: 1800
--- a/ansible/roles/debian_maintenance/tasks/main.yml
+++ b/ansible/roles/debian_maintenance/tasks/main.yml
@@ -0,0 +1,30 @@
+---
+- name: Refresh apt cache
+  ansible.builtin.apt:
+    update_cache: true
+    cache_valid_time: "{{ debian_maintenance_apt_cache_valid_time }}"
+
+- name: Upgrade Debian packages
+  ansible.builtin.apt:
+    upgrade: "{{ debian_maintenance_upgrade_type }}"
+
+- name: Remove orphaned packages
+  ansible.builtin.apt:
+    autoremove: "{{ debian_maintenance_autoremove }}"
+
+- name: Clean apt package cache
+  ansible.builtin.apt:
+    autoclean: "{{ debian_maintenance_autoclean }}"
+
+- name: Check if reboot is required
+  ansible.builtin.stat:
+    path: /var/run/reboot-required
+  register: debian_maintenance_reboot_required_file
+
+- name: Reboot when required by package updates
+  ansible.builtin.reboot:
+    reboot_timeout: "{{ debian_maintenance_reboot_timeout }}"
+    msg: "Reboot initiated by Ansible maintenance playbook"
+  when:
+    - debian_maintenance_reboot_if_required | bool
+    - debian_maintenance_reboot_required_file.stat.exists | default(false)
--- a/ansible/roles/debian_ssh_key_rotation/defaults/main.yml
+++ b/ansible/roles/debian_ssh_key_rotation/defaults/main.yml
@@ -0,0 +1,10 @@
+---
+# List of users to manage keys for.
+# Example:
+# debian_ssh_rotation_users:
+#   - name: deploy
+#     home: /home/deploy
+#     state: present
+#     keys:
+#       - "ssh-ed25519 AAAA... deploy@laptop"
+debian_ssh_rotation_users: []
--- a/ansible/roles/debian_ssh_key_rotation/tasks/main.yml
+++ b/ansible/roles/debian_ssh_key_rotation/tasks/main.yml
@@ -0,0 +1,50 @@
+---
+- name: Validate SSH key rotation inputs
+  ansible.builtin.assert:
+    that:
+      - item.name is defined
+      - item.home is defined
+      - (item.state | default('present')) in ['present', 'absent']
+      - (item.state | default('present')) == 'absent' or (item['keys'] is defined and item['keys'] | length > 0)
+    fail_msg: >-
+      Each entry in debian_ssh_rotation_users must include name, home, and either:
+      state=absent, or keys with at least one SSH public key.
+  loop: "{{ debian_ssh_rotation_users }}"
+  loop_control:
+    label: "{{ item.name | default('unknown') }}"
+
+- name: Ensure ~/.ssh exists for managed users
+  ansible.builtin.file:
+    path: "{{ item.home }}/.ssh"
+    state: directory
+    owner: "{{ item.name }}"
+    group: "{{ item.name }}"
+    mode: "0700"
+  loop: "{{ debian_ssh_rotation_users }}"
+  loop_control:
+    label: "{{ item.name }}"
+  when: (item.state | default('present')) == 'present'
+
+- name: Rotate authorized_keys for managed users
+  ansible.builtin.copy:
+    dest: "{{ item.home }}/.ssh/authorized_keys"
+    owner: "{{ item.name }}"
+    group: "{{ item.name }}"
+    mode: "0600"
+    content: |
+      {% for key in item['keys'] %}
+      {{ key }}
+      {% endfor %}
+  loop: "{{ debian_ssh_rotation_users }}"
+  loop_control:
+    label: "{{ item.name }}"
+  when: (item.state | default('present')) == 'present'
+
+- name: Remove authorized_keys for users marked absent
+  ansible.builtin.file:
+    path: "{{ item.home }}/.ssh/authorized_keys"
+    state: absent
+  loop: "{{ debian_ssh_rotation_users }}"
+  loop_control:
+    label: "{{ item.name }}"
+  when: (item.state | default('present')) == 'absent'
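Entries in `debian_ssh_rotation_users` store their keys under a dict key named `keys`, which shares its name with Python's built-in `dict.keys` method. In Jinja2, dotted access can resolve to that method instead of the stored list, so bracket notation is the safer lookup. A small sketch to check this locally; the play and the `example_user` variable are illustrative.

```yaml
---
- name: Look up a dict key named "keys" with bracket notation
  hosts: localhost
  connection: local
  gather_facts: false
  vars:
    example_user:
      name: admin
      keys:
        - "ssh-ed25519 AAAAEXAMPLE_REPLACE_ME admin@workstation"
  tasks:
    - name: Bracket lookup returns the list stored under "keys"
      ansible.builtin.assert:
        that:
          - example_user['keys'] | length == 1
```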
--- a/ansible/roles/helm_repos/defaults/main.yml
+++ b/ansible/roles/helm_repos/defaults/main.yml
@@ -8,11 +8,9 @@ noble_helm_repos:
  - { name: fossorial, url: "https://charts.fossorial.io" }
  - { name: argo, url: "https://argoproj.github.io/argo-helm" }
  - { name: metrics-server, url: "https://kubernetes-sigs.github.io/metrics-server/" }
-  - { name: sealed-secrets, url: "https://bitnami-labs.github.io/sealed-secrets" }
-  - { name: external-secrets, url: "https://charts.external-secrets.io" }
-  - { name: hashicorp, url: "https://helm.releases.hashicorp.com" }
  - { name: prometheus-community, url: "https://prometheus-community.github.io/helm-charts" }
  - { name: grafana, url: "https://grafana.github.io/helm-charts" }
  - { name: fluent, url: "https://fluent.github.io/helm-charts" }
  - { name: headlamp, url: "https://kubernetes-sigs.github.io/headlamp/" }
  - { name: kyverno, url: "https://kyverno.github.io/kyverno/" }
+  - { name: vmware-tanzu, url: "https://vmware-tanzu.github.io/helm-charts" }
--- a/ansible/roles/noble_argocd/defaults/main.yml
+++ b/ansible/roles/noble_argocd/defaults/main.yml
@@ -0,0 +1,6 @@
+---
+# When true, applies clusters/noble/bootstrap/argocd/root-application.yaml (app-of-apps).
+# Edit spec.source.repoURL in that file if your Git remote differs.
+noble_argocd_apply_root_application: false
+# When true, applies clusters/noble/bootstrap/argocd/bootstrap-root-application.yaml (noble-bootstrap-root; manual sync until README §5).
+noble_argocd_apply_bootstrap_root_application: true
--- a/ansible/roles/noble_argocd/tasks/main.yml
+++ b/ansible/roles/noble_argocd/tasks/main.yml
@@ -15,6 +15,32 @@
      - -f
      - "{{ noble_repo_root }}/clusters/noble/bootstrap/argocd/values.yaml"
      - --wait
+      - --timeout
+      - 15m
  environment:
    KUBECONFIG: "{{ noble_kubeconfig }}"
  changed_when: true
+
+- name: Apply Argo CD root Application (app-of-apps)
+  ansible.builtin.command:
+    argv:
+      - kubectl
+      - apply
+      - -f
+      - "{{ noble_repo_root }}/clusters/noble/bootstrap/argocd/root-application.yaml"
+  environment:
+    KUBECONFIG: "{{ noble_kubeconfig }}"
+  when: noble_argocd_apply_root_application | default(false) | bool
+  changed_when: true
+
+- name: Apply Argo CD bootstrap app-of-apps Application
+  ansible.builtin.command:
+    argv:
+      - kubectl
+      - apply
+      - -f
+      - "{{ noble_repo_root }}/clusters/noble/bootstrap/argocd/bootstrap-root-application.yaml"
+  environment:
+    KUBECONFIG: "{{ noble_kubeconfig }}"
+  when: noble_argocd_apply_bootstrap_root_application | default(false) | bool
+  changed_when: true
--- a/ansible/roles/noble_csi_snapshot_controller/defaults/main.yml
+++ b/ansible/roles/noble_csi_snapshot_controller/defaults/main.yml
@@ -0,0 +1,2 @@
+---
+noble_csi_snapshot_kubectl_timeout: 120s
--- a/ansible/roles/noble_csi_snapshot_controller/tasks/main.yml
+++ b/ansible/roles/noble_csi_snapshot_controller/tasks/main.yml
@@ -0,0 +1,39 @@
+---
+# Volume Snapshot CRDs + snapshot-controller (Velero CSI / Longhorn snapshots).
+- name: Apply Volume Snapshot CRDs (snapshot.storage.k8s.io)
+  ansible.builtin.command:
+    argv:
+      - kubectl
+      - apply
+      - "--request-timeout={{ noble_csi_snapshot_kubectl_timeout | default('120s') }}"
+      - -k
+      - "{{ noble_repo_root }}/clusters/noble/bootstrap/csi-snapshot-controller/crd"
+  environment:
+    KUBECONFIG: "{{ noble_kubeconfig }}"
+  changed_when: true
+
+- name: Apply snapshot-controller in kube-system
+  ansible.builtin.command:
+    argv:
+      - kubectl
+      - apply
+      - "--request-timeout={{ noble_csi_snapshot_kubectl_timeout | default('120s') }}"
+      - -k
+      - "{{ noble_repo_root }}/clusters/noble/bootstrap/csi-snapshot-controller/controller"
+  environment:
+    KUBECONFIG: "{{ noble_kubeconfig }}"
+  changed_when: true
+
+- name: Wait for snapshot-controller Deployment
+  ansible.builtin.command:
+    argv:
+      - kubectl
+      - -n
+      - kube-system
+      - rollout
+      - status
+      - deploy/snapshot-controller
+      - --timeout=120s
+  environment:
+    KUBECONFIG: "{{ noble_kubeconfig }}"
+  changed_when: false
--- a/ansible/roles/noble_landing_urls/defaults/main.yml
+++ b/ansible/roles/noble_landing_urls/defaults/main.yml
@@ -39,8 +39,13 @@ noble_lab_ui_entries:
    namespace: longhorn-system
    service: longhorn-frontend
    url: https://longhorn.apps.noble.lab.pcenicni.dev
-  - name: Vault
-    description: Secrets engine UI (after init/unseal)
-    namespace: vault
-    service: vault
-    url: https://vault.apps.noble.lab.pcenicni.dev
+  - name: Velero
+    description: Cluster backups — no web UI (velero CLI / kubectl CRDs)
+    namespace: velero
+    service: velero
+    url: ""
+  - name: Homepage
+    description: App dashboard (links to lab UIs)
+    namespace: homepage
+    service: homepage
+    url: https://homepage.apps.noble.lab.pcenicni.dev
--- a/ansible/roles/noble_landing_urls/templates/noble-lab-ui-urls.md.j2
+++ b/ansible/roles/noble_landing_urls/templates/noble-lab-ui-urls.md.j2
@@ -11,7 +11,7 @@ This file is **generated** by Ansible (`noble_landing_urls` role). Use it as a t
 | UI | What | Kubernetes service | Namespace | URL |
 |----|------|----------------------|-----------|-----|
 {% for e in noble_lab_ui_entries %}
-| {{ e.name }} | {{ e.description }} | `{{ e.service }}` | `{{ e.namespace }}` | [{{ e.url }}]({{ e.url }}) |
+| {{ e.name }} | {{ e.description }} | `{{ e.service }}` | `{{ e.namespace }}` | {% if e.url | default('') | length > 0 %}[{{ e.url }}]({{ e.url }}){% else %}—{% endif %} |
 {% endfor %}

 ## Initial access (logins)
@@ -24,7 +24,6 @@ This file is **generated** by Ansible (`noble_landing_urls` role). Use it as a t
 | **Prometheus** | — | No auth in default install (lab). |
 | **Alertmanager** | — | No auth in default install (lab). |
 | **Longhorn** | — | No default login unless you enable access control in the UI settings. |
-| **Vault** | Token | Root token is only from **`vault operator init`** (not stored in git). See `clusters/noble/bootstrap/vault/README.md`. |

 ### Commands to retrieve passwords (if not filled above)

@@ -46,6 +45,7 @@ To generate this file **without** calling kubectl, run Ansible with **`-e noble_

 - **Argo CD** `argocd-initial-admin-secret` disappears after you change the admin password.
 - **Grafana** password is random unless you set `grafana.adminPassword` in chart values.
-- **Vault** UI needs **unsealed** Vault; tokens come from your chosen auth method.
 - **Prometheus / Alertmanager** UIs are unauthenticated by default — restrict when hardening (`talos/CLUSTER-BUILD.md` Phase G).
+- **SOPS:** cluster secrets in git under **`clusters/noble/secrets/`** are encrypted; decrypt with **`age-key.txt`** (not in git). See **`clusters/noble/secrets/README.md`**.
 - **Headlamp** token above expires after the configured duration; re-run Ansible or `kubectl create token` to refresh.
+- **Velero** has **no web UI** — use **`velero`** CLI or **`kubectl -n velero get backup,schedule,backupstoragelocation`**. Metrics: **`velero`** Service in **`velero`** (Prometheus scrape). See `clusters/noble/bootstrap/velero/README.md`.
--- a/ansible/roles/noble_platform/defaults/main.yml
+++ b/ansible/roles/noble_platform/defaults/main.yml
@@ -4,5 +4,6 @@ noble_platform_kubectl_request_timeout: 120s
 noble_platform_kustomize_retries: 5
 noble_platform_kustomize_delay: 20

-# Vault: injector (vault-k8s) owns MutatingWebhookConfiguration.caBundle; Helm upgrade can SSA-conflict. Delete webhook so Helm can recreate it.
-noble_vault_delete_injector_webhook_before_helm: true
+# Decrypt **clusters/noble/secrets/*.yaml** with SOPS and kubectl apply (requires **sops**, **age**, and **age-key.txt**).
+noble_apply_sops_secrets: true
+noble_sops_age_key_file: "{{ noble_repo_root }}/age-key.txt"
--- a/ansible/roles/noble_platform/tasks/main.yml
+++ b/ansible/roles/noble_platform/tasks/main.yml
@@ -1,6 +1,6 @@
 ---
 # Mirrors former **noble-platform** Argo Application: Helm releases + plain manifests under clusters/noble/bootstrap.
-- name: Apply clusters/noble/bootstrap kustomize (namespaces, Grafana Loki datasource, Vault extras)
+- name: Apply clusters/noble/bootstrap kustomize (namespaces, Grafana Loki datasource)
  ansible.builtin.command:
    argv:
      - kubectl
@@ -16,77 +16,26 @@
  until: noble_platform_kustomize.rc == 0
  changed_when: true

-- name: Install Sealed Secrets
-  ansible.builtin.command:
-    argv:
-      - helm
-      - upgrade
-      - --install
-      - sealed-secrets
-      - sealed-secrets/sealed-secrets
-      - --namespace
-      - sealed-secrets
-      - --version
-      - "2.18.4"
-      - -f
-      - "{{ noble_repo_root }}/clusters/noble/bootstrap/sealed-secrets/values.yaml"
-      - --wait
-  environment:
-    KUBECONFIG: "{{ noble_kubeconfig }}"
-  changed_when: true
+- name: Stat SOPS age private key (age-key.txt)
+  ansible.builtin.stat:
+    path: "{{ noble_sops_age_key_file }}"
+  register: noble_sops_age_key_stat

-- name: Install External Secrets Operator
-  ansible.builtin.command:
-    argv:
-      - helm
-      - upgrade
-      - --install
-      - external-secrets
-      - external-secrets/external-secrets
-      - --namespace
-      - external-secrets
-      - --version
-      - "2.2.0"
-      - -f
-      - "{{ noble_repo_root }}/clusters/noble/bootstrap/external-secrets/values.yaml"
-      - --wait
+- name: Apply SOPS-encrypted cluster secrets (clusters/noble/secrets/*.yaml)
+  ansible.builtin.shell: |
+    set -euo pipefail
+    shopt -s nullglob
+    for f in "{{ noble_repo_root }}/clusters/noble/secrets"/*.yaml; do
+      sops -d "$f" | kubectl apply -f -
+    done
+  args:
+    executable: /bin/bash
  environment:
    KUBECONFIG: "{{ noble_kubeconfig }}"
-  changed_when: true
-
-# vault-k8s patches webhook CA after install; Helm 3/4 SSA then conflicts on upgrade. Removing the MWC lets Helm re-apply cleanly; injector repopulates caBundle.
-- name: Delete Vault agent injector MutatingWebhookConfiguration before Helm (avoids caBundle field conflict)
-  ansible.builtin.command:
-    argv:
-      - kubectl
-      - delete
-      - mutatingwebhookconfiguration
-      - vault-agent-injector-cfg
-      - --ignore-not-found
-  environment:
-    KUBECONFIG: "{{ noble_kubeconfig }}"
-  register: noble_vault_mwc_delete
-  when: noble_vault_delete_injector_webhook_before_helm | default(true) | bool
-  changed_when: "'deleted' in (noble_vault_mwc_delete.stdout | default(''))"
-
-- name: Install Vault
-  ansible.builtin.command:
-    argv:
-      - helm
-      - upgrade
-      - --install
-      - vault
-      - hashicorp/vault
-      - --namespace
-      - vault
-      - --version
-      - "0.32.0"
-      - -f
-      - "{{ noble_repo_root }}/clusters/noble/bootstrap/vault/values.yaml"
-      - --wait
-  environment:
-    KUBECONFIG: "{{ noble_kubeconfig }}"
-    HELM_SERVER_SIDE_APPLY: "false"
+    SOPS_AGE_KEY_FILE: "{{ noble_sops_age_key_file }}"
+  when:
+    - noble_apply_sops_secrets | default(true) | bool
+    - noble_sops_age_key_stat.stat.exists
  changed_when: true

 - name: Install kube-prometheus-stack
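The SOPS task above relies on bash's `nullglob` so that an empty `clusters/noble/secrets/` directory yields zero loop iterations. As a minimal sketch of the same idea in plain POSIX sh (a swapped-in technique, not what the role does), each iteration can be guarded with an existence test instead. The helper `count_yaml_files` is hypothetical and only counts files where the role would run `sops -d "$f" | kubectl apply -f -`.

```shell
#!/bin/sh
# Sketch only: count_yaml_files is a hypothetical helper standing in for the
# decrypt-and-apply step; it needs neither sops nor kubectl.
count_yaml_files() {
  dir="$1"
  count=0
  for f in "$dir"/*.yaml; do
    # Without nullglob an unmatched pattern stays literal, so skip paths that do not exist.
    [ -e "$f" ] || continue
    count=$((count + 1))
  done
  printf '%s\n' "$count"
}

demo_dir="$(mktemp -d)"
count_yaml_files "$demo_dir"      # empty directory: the loop body is skipped
: > "$demo_dir/example.yaml"
count_yaml_files "$demo_dir"      # one matching file
rm -rf "$demo_dir"
```

The guard keeps the loop safe under `/bin/sh`, whereas `shopt -s nullglob` requires the task to run under bash, as the role does with `executable: /bin/bash`.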
--- a/ansible/roles/noble_post_deploy/tasks/main.yml
+++ b/ansible/roles/noble_post_deploy/tasks/main.yml
@@ -1,27 +1,15 @@
 ---
-- name: Vault — manual steps (not automated)
+- name: SOPS secrets (workstation)
  ansible.builtin.debug:
    msg: |
-      1. kubectl -n vault get pods  (wait for Running)
-      2. kubectl -n vault exec -it vault-0 -- vault operator init  (once; save keys)
-      3. Unseal per clusters/noble/bootstrap/vault/README.md
-      4. ./clusters/noble/bootstrap/vault/configure-kubernetes-auth.sh
-      5. kubectl apply -f clusters/noble/bootstrap/external-secrets/examples/vault-cluster-secret-store.yaml
-
-- name: Optional — apply Vault ClusterSecretStore for External Secrets
-  ansible.builtin.command:
-    argv:
-      - kubectl
-      - apply
-      - -f
-      - "{{ noble_repo_root }}/clusters/noble/bootstrap/external-secrets/examples/vault-cluster-secret-store.yaml"
-  environment:
-    KUBECONFIG: "{{ noble_kubeconfig }}"
-  when: noble_apply_vault_cluster_secret_store | default(false) | bool
-  changed_when: true
+      Encrypted Kubernetes Secrets live under clusters/noble/secrets/ (Mozilla SOPS + age).
+      Private key: age-key.txt at repo root (gitignored). See clusters/noble/secrets/README.md
+      and .sops.yaml. noble.yml decrypt-applies these when age-key.txt exists.

 - name: Argo CD optional root Application (empty app-of-apps)
  ansible.builtin.debug:
    msg: >-
-      Optional: kubectl apply -f clusters/noble/bootstrap/argocd/root-application.yaml
-      after editing repoURL. Core workloads are not synced by Argo — see clusters/noble/apps/README.md
+      App-of-apps: noble.yml applies root-application.yaml when noble_argocd_apply_root_application is true;
+      bootstrap-root-application.yaml when noble_argocd_apply_bootstrap_root_application is true (group_vars/all.yml).
+      noble-bootstrap-root uses manual sync until you enable automation after the playbook —
+      clusters/noble/bootstrap/argocd/README.md §5. See also clusters/noble/apps/README.md.
--- a/ansible/roles/noble_velero/defaults/main.yml
+++ b/ansible/roles/noble_velero/defaults/main.yml
@@ -0,0 +1,13 @@
+---
+# **noble_velero_install** is in **ansible/group_vars/all.yml**. Override S3 fields via extra-vars or group_vars.
+noble_velero_chart_version: "12.0.0"
+
+noble_velero_s3_bucket: ""
+noble_velero_s3_url: ""
+noble_velero_s3_region: "us-east-1"
+noble_velero_s3_force_path_style: "true"
+noble_velero_s3_prefix: ""
+
+# Optional — if unset, Ansible expects Secret **velero/velero-cloud-credentials** (key **cloud**) to exist.
+noble_velero_aws_access_key_id: ""
+noble_velero_aws_secret_access_key: ""
--- a/ansible/roles/noble_velero/tasks/from_env.yml
+++ b/ansible/roles/noble_velero/tasks/from_env.yml
@@ -0,0 +1,68 @@
+---
+# See repository **.env.sample** — copy to **.env** (gitignored).
+- name: Stat repository .env for Velero
+  ansible.builtin.stat:
+    path: "{{ noble_repo_root }}/.env"
+  register: noble_deploy_env_file
+  changed_when: false
+
+- name: Load NOBLE_VELERO_S3_BUCKET from .env when unset
+  ansible.builtin.shell: |
+    set -a
+    . "{{ noble_repo_root }}/.env"
+    set +a
+    echo "${NOBLE_VELERO_S3_BUCKET:-}"
+  register: noble_velero_s3_bucket_from_env
+  when:
+    - noble_deploy_env_file.stat.exists | default(false)
+    - noble_velero_s3_bucket | default('') | length == 0
+  changed_when: false
+
+- name: Apply NOBLE_VELERO_S3_BUCKET from .env
+  ansible.builtin.set_fact:
+    noble_velero_s3_bucket: "{{ noble_velero_s3_bucket_from_env.stdout | trim }}"
+  when:
+    - noble_velero_s3_bucket_from_env is defined
+    - (noble_velero_s3_bucket_from_env.stdout | default('') | trim | length) > 0
+
+- name: Load NOBLE_VELERO_S3_URL from .env when unset
+  ansible.builtin.shell: |
+    set -a
+    . "{{ noble_repo_root }}/.env"
+    set +a
+    echo "${NOBLE_VELERO_S3_URL:-}"
+  register: noble_velero_s3_url_from_env
+  when:
+    - noble_deploy_env_file.stat.exists | default(false)
+    - noble_velero_s3_url | default('') | length == 0
+  changed_when: false
+
+- name: Apply NOBLE_VELERO_S3_URL from .env
+  ansible.builtin.set_fact:
+    noble_velero_s3_url: "{{ noble_velero_s3_url_from_env.stdout | trim }}"
+  when:
+    - noble_velero_s3_url_from_env is defined
+    - (noble_velero_s3_url_from_env.stdout | default('') | trim | length) > 0
+
+- name: Create velero-cloud-credentials from .env when keys present
+  ansible.builtin.shell: |
+    set -euo pipefail
+    set -a
+    . "{{ noble_repo_root }}/.env"
+    set +a
+    if [ -z "${NOBLE_VELERO_AWS_ACCESS_KEY_ID:-}" ] || [ -z "${NOBLE_VELERO_AWS_SECRET_ACCESS_KEY:-}" ]; then
+      echo SKIP
+      exit 0
+    fi
+    CLOUD="$(printf '[default]\naws_access_key_id=%s\naws_secret_access_key=%s\n' \
+      "${NOBLE_VELERO_AWS_ACCESS_KEY_ID}" "${NOBLE_VELERO_AWS_SECRET_ACCESS_KEY}")"
+    kubectl -n velero create secret generic velero-cloud-credentials \
+      --from-literal=cloud="${CLOUD}" \
+      --dry-run=client -o yaml | kubectl apply -f -
+    echo APPLIED
+  args:
+    executable: /bin/bash
+  environment:
+    KUBECONFIG: "{{ noble_kubeconfig }}"
+  when: noble_deploy_env_file.stat.exists | default(false)
+  no_log: true
+  register: noble_velero_secret_from_env
+  changed_when: "'APPLIED' in (noble_velero_secret_from_env.stdout | default(''))"
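The credential task above can be tried without a cluster. The following is a minimal sketch, assuming a dotenv-style file with the two `NOBLE_VELERO_AWS_*` keys: it exports the file's variables with `set -a`, then renders the AWS-style `cloud` profile with the same `printf` format the task uses. The temporary file and the key values are placeholders for illustration, not values from this repository, and `render_cloud_profile` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: the temporary .env file and the key values are made-up placeholders.
set -eu

env_file="$(mktemp)"
cat > "$env_file" <<'EOF'
NOBLE_VELERO_AWS_ACCESS_KEY_ID=EXAMPLEKEY
NOBLE_VELERO_AWS_SECRET_ACCESS_KEY=examplesecret
EOF

# set -a marks every variable assigned while sourcing the file for export.
set -a
. "$env_file"
set +a
rm -f "$env_file"

render_cloud_profile() {
  # Same printf format as the task: an AWS shared-credentials [default] profile.
  printf '[default]\naws_access_key_id=%s\naws_secret_access_key=%s\n' \
    "$NOBLE_VELERO_AWS_ACCESS_KEY_ID" "$NOBLE_VELERO_AWS_SECRET_ACCESS_KEY"
}

render_cloud_profile
```

In the role, this rendered text becomes the `cloud` key of the `velero-cloud-credentials` Secret through `kubectl create secret generic ... --dry-run=client -o yaml | kubectl apply -f -`.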
--- a/ansible/roles/noble_velero/tasks/main.yml
+++ b/ansible/roles/noble_velero/tasks/main.yml
@@ -0,0 +1,85 @@
+---
+# Velero — S3 backup target + built-in CSI snapshots (Longhorn: label VolumeSnapshotClass per README).
+- name: Apply velero namespace
+  ansible.builtin.command:
+    argv:
+      - kubectl
+      - apply
+      - -f
+      - "{{ noble_repo_root }}/clusters/noble/bootstrap/velero/namespace.yaml"
+  environment:
+    KUBECONFIG: "{{ noble_kubeconfig }}"
+  when: noble_velero_install | default(false) | bool
+  changed_when: true
+
+- name: Include Velero settings from repository .env (S3 bucket, URL, credentials)
+  ansible.builtin.include_tasks: from_env.yml
+  when: noble_velero_install | default(false) | bool
+
+- name: Require S3 bucket and endpoint for Velero
+  ansible.builtin.assert:
+    that:
+      - noble_velero_s3_bucket | default('') | length > 0
+      - noble_velero_s3_url | default('') | length > 0
+    fail_msg: >-
+      Set NOBLE_VELERO_S3_BUCKET and NOBLE_VELERO_S3_URL in .env, or noble_velero_s3_bucket / noble_velero_s3_url
+      (e.g. -e ...), or group_vars when noble_velero_install is true.
+  when: noble_velero_install | default(false) | bool
+
+- name: Create velero-cloud-credentials from Ansible vars
+  ansible.builtin.shell: |
+    set -euo pipefail
+    CLOUD="$(printf '[default]\naws_access_key_id=%s\naws_secret_access_key=%s\n' \
+      "${AWS_ACCESS_KEY_ID}" "${AWS_SECRET_ACCESS_KEY}")"
+    kubectl -n velero create secret generic velero-cloud-credentials \
+      --from-literal=cloud="${CLOUD}" \
+      --dry-run=client -o yaml | kubectl apply -f -
+  args:
+    executable: /bin/bash
+  environment:
+    KUBECONFIG: "{{ noble_kubeconfig }}"
+    AWS_ACCESS_KEY_ID: "{{ noble_velero_aws_access_key_id }}"
+    AWS_SECRET_ACCESS_KEY: "{{ noble_velero_aws_secret_access_key }}"
+  when:
+    - noble_velero_install | default(false) | bool
+    - noble_velero_aws_access_key_id | default('') | length > 0
+    - noble_velero_aws_secret_access_key | default('') | length > 0
+  no_log: true
+  changed_when: true
+
+- name: Check velero-cloud-credentials Secret
+  ansible.builtin.command:
+    argv:
+      - kubectl
+      - -n
+      - velero
+      - get
+      - secret
+      - velero-cloud-credentials
+  environment:
+    KUBECONFIG: "{{ noble_kubeconfig }}"
+  register: noble_velero_secret_check
+  failed_when: false
+  changed_when: false
+  when: noble_velero_install | default(false) | bool
+
+- name: Require velero-cloud-credentials before Helm
+  ansible.builtin.assert:
+    that:
+      - noble_velero_secret_check.rc == 0
+    fail_msg: >-
+      Velero needs Secret velero/velero-cloud-credentials (key cloud). Set NOBLE_VELERO_AWS_ACCESS_KEY_ID and
+      NOBLE_VELERO_AWS_SECRET_ACCESS_KEY in .env, or noble_velero_aws_* extra-vars, or create the Secret manually
+      (see clusters/noble/bootstrap/velero/README.md).
+  when: noble_velero_install | default(false) | bool
+
+- name: Optional object prefix argv for Helm
+  ansible.builtin.set_fact:
+    noble_velero_helm_prefix_argv: "{{ ['--set-string', 'configuration.backupStorageLocation[0].prefix=' ~ (noble_velero_s3_prefix | default(''))] if (noble_velero_s3_prefix | default('') | length > 0) else [] }}"
+  when: noble_velero_install | default(false) | bool
+
+- name: Install Velero
+  ansible.builtin.command:
+    argv: "{{ ['helm', 'upgrade', '--install', 'velero', 'vmware-tanzu/velero', '--namespace', 'velero', '--version', noble_velero_chart_version, '-f', noble_repo_root ~ '/clusters/noble/bootstrap/velero/values.yaml', '--set-string', 'configuration.backupStorageLocation[0].bucket=' ~ noble_velero_s3_bucket, '--set-string', 'configuration.backupStorageLocation[0].config.s3Url=' ~ noble_velero_s3_url, '--set-string', 'configuration.backupStorageLocation[0].config.region=' ~ noble_velero_s3_region, '--set-string', 'configuration.backupStorageLocation[0].config.s3ForcePathStyle=' ~ noble_velero_s3_force_path_style] + (noble_velero_helm_prefix_argv | default([])) + ['--wait'] }}"
+  environment:
+    KUBECONFIG: "{{ noble_kubeconfig }}"
+  when: noble_velero_install | default(false) | bool
+  changed_when: true
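The Install Velero task above appends the `prefix` setting only when `noble_velero_s3_prefix` is non-empty. The same conditional argument list can be sketched in POSIX sh with `set --`. The helper `build_velero_args` is hypothetical, covers only the bucket and prefix settings, and prints one argument per line instead of calling `helm`.

```shell
#!/bin/sh
# Sketch only: build_velero_args is a hypothetical helper, not part of the role.
# It prints the argument list one item per line so it can be inspected.
build_velero_args() {
  bucket="$1"
  prefix="$2"
  # Start the list with the always-present bucket setting.
  set -- --set-string "configuration.backupStorageLocation[0].bucket=$bucket"
  if [ -n "$prefix" ]; then
    # Append the prefix pair only when a prefix was supplied.
    set -- "$@" --set-string "configuration.backupStorageLocation[0].prefix=$prefix"
  fi
  set -- "$@" --wait
  printf '%s\n' "$@"
}

build_velero_args backups ""        # no prefix pair in the output
build_velero_args backups noble     # prefix pair inserted before --wait
```

Building the list as separate arguments, as the role does with `argv`, avoids quoting problems that a single concatenated command string would introduce.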
--- a/ansible/roles/proxmox_baseline/defaults/main.yml
+++ b/ansible/roles/proxmox_baseline/defaults/main.yml
@@ -0,0 +1,14 @@
+---
+proxmox_repo_debian_codename: "{{ ansible_facts['distribution_release'] | default('bookworm') }}"
+proxmox_repo_disable_enterprise: true
+proxmox_repo_disable_ceph_enterprise: true
+proxmox_repo_enable_pve_no_subscription: true
+proxmox_repo_enable_ceph_no_subscription: false
+
+proxmox_no_subscription_notice_disable: true
+proxmox_widget_toolkit_file: /usr/share/javascript/proxmox-widget-toolkit/proxmoxlib.js
+
+# Bootstrap root SSH keys from the control machine so subsequent runs can use key auth.
+proxmox_root_authorized_key_files:
+  - "{{ lookup('env', 'HOME') }}/.ssh/id_ed25519.pub"
+  - "{{ lookup('env', 'HOME') }}/.ssh/ansible.pub"
--- a/ansible/roles/proxmox_baseline/handlers/main.yml
+++ b/ansible/roles/proxmox_baseline/handlers/main.yml
@@ -0,0 +1,5 @@
+---
+- name: Restart pveproxy
+  ansible.builtin.service:
+    name: pveproxy
+    state: restarted
--- a/ansible/roles/proxmox_baseline/tasks/main.yml
+++ b/ansible/roles/proxmox_baseline/tasks/main.yml
@@ -0,0 +1,100 @@
+---
+- name: Check configured local public key files
+  ansible.builtin.stat:
+    path: "{{ item }}"
+  register: proxmox_root_pubkey_stats
+  loop: "{{ proxmox_root_authorized_key_files }}"
+  delegate_to: localhost
+  become: false
+
+- name: Fail when a configured local public key file is missing
+  ansible.builtin.fail:
+    msg: "Configured key file does not exist on the control host: {{ item.item }}"
+  when: not item.stat.exists
+  loop: "{{ proxmox_root_pubkey_stats.results }}"
+  delegate_to: localhost
+  become: false
+
+- name: Ensure root authorized_keys contains configured public keys
+  ansible.posix.authorized_key:
+    user: root
+    state: present
+    key: "{{ lookup('ansible.builtin.file', item) }}"
+    manage_dir: true
+  loop: "{{ proxmox_root_authorized_key_files }}"
+
+- name: Remove enterprise repository lines from /etc/apt/sources.list
+  ansible.builtin.lineinfile:
+    path: /etc/apt/sources.list
+    regexp: ".*enterprise\\.proxmox\\.com.*"
+    state: absent
+  when:
+    - proxmox_repo_disable_enterprise | bool or proxmox_repo_disable_ceph_enterprise | bool
+  failed_when: false
+
+- name: Find apt source files that contain Proxmox enterprise repositories
+  ansible.builtin.find:
+    paths: /etc/apt/sources.list.d
+    file_type: file
+    patterns:
+      - "*.list"
+      - "*.sources"
+    contains: "enterprise\\.proxmox\\.com"
+    use_regex: true
+  register: proxmox_enterprise_repo_files
+  when:
+    - proxmox_repo_disable_enterprise | bool or proxmox_repo_disable_ceph_enterprise | bool
+
+- name: Remove enterprise repository lines from apt source files
+  ansible.builtin.lineinfile:
+    path: "{{ item.path }}"
+    regexp: ".*enterprise\\.proxmox\\.com.*"
+    state: absent
+  loop: "{{ proxmox_enterprise_repo_files.files | default([]) }}"
+  when:
+    - proxmox_repo_disable_enterprise | bool or proxmox_repo_disable_ceph_enterprise | bool
+
+- name: Find apt source files that already contain pve-no-subscription
+  ansible.builtin.find:
+    paths: /etc/apt/sources.list.d
+    file_type: file
+    patterns:
+      - "*.list"
+      - "*.sources"
+    contains: "pve-no-subscription"
+    use_regex: false
+  register: proxmox_no_sub_repo_files
+  when: proxmox_repo_enable_pve_no_subscription | bool
+
+- name: Ensure Proxmox no-subscription repository is configured when absent
+  ansible.builtin.copy:
+    dest: /etc/apt/sources.list.d/pve-no-subscription.list
+    content: "deb http://download.proxmox.com/debian/pve {{ proxmox_repo_debian_codename }} pve-no-subscription\n"
+    mode: "0644"
+  when:
+    - proxmox_repo_enable_pve_no_subscription | bool
+    - (proxmox_no_sub_repo_files.matched | default(0) | int) == 0
+
+- name: Remove duplicate pve-no-subscription.list when another source already provides it
+  ansible.builtin.file:
+    path: /etc/apt/sources.list.d/pve-no-subscription.list
+    state: absent
+  when:
+    - proxmox_repo_enable_pve_no_subscription | bool
+    - (proxmox_no_sub_repo_files.files | default([]) | map(attribute='path') | list | select('ne', '/etc/apt/sources.list.d/pve-no-subscription.list') | list | length) > 0
+
+- name: Ensure Ceph no-subscription repository is configured
+  ansible.builtin.copy:
+    dest: /etc/apt/sources.list.d/ceph-no-subscription.list
+    content: "deb http://download.proxmox.com/debian/ceph-{{ proxmox_repo_debian_codename }} {{ proxmox_repo_debian_codename }} no-subscription\n"
+    mode: "0644"
+  when: proxmox_repo_enable_ceph_no_subscription | bool
+
+- name: Disable no-subscription pop-up in Proxmox UI
+  ansible.builtin.replace:
+    path: "{{ proxmox_widget_toolkit_file }}"
+    regexp: "if \\(data\\.status !== 'Active'\\)"
+    replace: "if (false)"
+    backup: true
+  when: proxmox_no_subscription_notice_disable | bool
+  notify: Restart pveproxy
--- a/ansible/roles/proxmox_cluster/defaults/main.yml
+++ b/ansible/roles/proxmox_cluster/defaults/main.yml
@@ -0,0 +1,7 @@
+---
+proxmox_cluster_enabled: true
+proxmox_cluster_name: pve-cluster
+proxmox_cluster_master: ""
+proxmox_cluster_master_ip: ""
+proxmox_cluster_force: false
+proxmox_cluster_master_root_password: ""
--- a/ansible/roles/proxmox_cluster/tasks/main.yml
+++ b/ansible/roles/proxmox_cluster/tasks/main.yml
@@ -0,0 +1,63 @@
+---
+- name: Skip cluster role when disabled
+  ansible.builtin.meta: end_host
+  when: not (proxmox_cluster_enabled | bool)
+
+- name: Check whether corosync cluster config exists
+  ansible.builtin.stat:
+    path: /etc/pve/corosync.conf
+  register: proxmox_cluster_corosync_conf
+
+- name: Set effective Proxmox cluster master
+  ansible.builtin.set_fact:
+    proxmox_cluster_master_effective: "{{ proxmox_cluster_master | default(groups['proxmox_hosts'][0], true) }}"
+
+- name: Set effective Proxmox cluster master IP
+  ansible.builtin.set_fact:
+    proxmox_cluster_master_ip_effective: >-
+      {{
+        proxmox_cluster_master_ip
+        | default(hostvars[proxmox_cluster_master_effective].ansible_host
+        | default(proxmox_cluster_master_effective), true)
+      }}
+
+- name: Create cluster on designated master
+  ansible.builtin.command:
+    cmd: "pvecm create {{ proxmox_cluster_name }}"
+  when:
+    - inventory_hostname == proxmox_cluster_master_effective
+    - not proxmox_cluster_corosync_conf.stat.exists
+
+- name: Ensure python3-pexpect is installed for password-based cluster join
+  ansible.builtin.apt:
+    name: python3-pexpect
+    state: present
+    update_cache: true
+  when:
+    - inventory_hostname != proxmox_cluster_master_effective
+    - not proxmox_cluster_corosync_conf.stat.exists
+    - proxmox_cluster_master_root_password | length > 0
+
+- name: Join node to existing cluster (password provided)
+  ansible.builtin.expect:
+    command: >-
+      pvecm add {{ proxmox_cluster_master_ip_effective }}
+      {% if proxmox_cluster_force | bool %}--force{% endif %}
+    responses:
+      "Please enter superuser \\(root\\) password for '.*':": "{{ proxmox_cluster_master_root_password }}"
+      "password:": "{{ proxmox_cluster_master_root_password }}"
+  no_log: true
+  when:
+    - inventory_hostname != proxmox_cluster_master_effective
+    - not proxmox_cluster_corosync_conf.stat.exists
+    - proxmox_cluster_master_root_password | length > 0
+
+- name: Join node to existing cluster (SSH trust/no prompt)
+  ansible.builtin.command:
+    cmd: >-
+      pvecm add {{ proxmox_cluster_master_ip_effective }}
+      {% if proxmox_cluster_force | bool %}--force{% endif %}
+  when:
+    - inventory_hostname != proxmox_cluster_master_effective
+    - not proxmox_cluster_corosync_conf.stat.exists
+    - proxmox_cluster_master_root_password | length == 0
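The effective master IP in the role above resolves through a fallback chain: an explicit `proxmox_cluster_master_ip`, then the master's `ansible_host`, then the master's inventory name. A minimal sketch of that chain in POSIX sh follows. The helper `resolve_master_ip` and the sample addresses are hypothetical, and the sketch treats empty and unset alike at both levels, whereas the role's inner `default` only applies when `ansible_host` is undefined.

```shell
#!/bin/sh
# Sketch only: resolve_master_ip is a hypothetical helper, not part of the role.
# Arguments: explicit IP, inventory ansible_host, inventory hostname.
resolve_master_ip() {
  explicit_ip="$1"
  ansible_host="$2"
  inventory_name="$3"
  # ${var:-fallback} treats an empty value like an unset one,
  # similar to default(..., true) on the outer lookup.
  printf '%s\n' "${explicit_ip:-${ansible_host:-$inventory_name}}"
}

resolve_master_ip "" "192.0.2.10" "pve1"              # falls back to ansible_host
resolve_master_ip "" "" "pve1"                        # falls back to the inventory name
resolve_master_ip "198.51.100.5" "192.0.2.10" "pve1"  # explicit IP wins
```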
--- a/ansible/roles/proxmox_maintenance/defaults/main.yml
+++ b/ansible/roles/proxmox_maintenance/defaults/main.yml
@@ -0,0 +1,6 @@
+---
+proxmox_upgrade_apt_cache_valid_time: 3600
+proxmox_upgrade_autoremove: true
+proxmox_upgrade_autoclean: true
+proxmox_upgrade_reboot_if_required: true
+proxmox_upgrade_reboot_timeout: 1800
--- a/ansible/roles/proxmox_maintenance/tasks/main.yml
+++ b/ansible/roles/proxmox_maintenance/tasks/main.yml
@@ -0,0 +1,30 @@
+---
+- name: Refresh apt cache
+  ansible.builtin.apt:
+    update_cache: true
+    cache_valid_time: "{{ proxmox_upgrade_apt_cache_valid_time }}"
+
+- name: Upgrade Proxmox host packages
+  ansible.builtin.apt:
+    upgrade: dist
+
+- name: Remove orphaned packages
+  ansible.builtin.apt:
+    autoremove: "{{ proxmox_upgrade_autoremove }}"
+
+- name: Clean apt package cache
+  ansible.builtin.apt:
+    autoclean: "{{ proxmox_upgrade_autoclean }}"
+
+- name: Check if reboot is required
+  ansible.builtin.stat:
+    path: /var/run/reboot-required
+  register: proxmox_reboot_required_file
+
+- name: Reboot when required by package upgrades
+  ansible.builtin.reboot:
+    reboot_timeout: "{{ proxmox_upgrade_reboot_timeout }}"
+    msg: "Reboot initiated by Ansible Proxmox maintenance playbook"
+  when:
+    - proxmox_upgrade_reboot_if_required | bool
+    - proxmox_reboot_required_file.stat.exists | default(false)
--- a/branding/nikflix/logo.png
+++ b/branding/nikflix/logo.png
--- a/clusters/noble/apps/README.md
+++ b/clusters/noble/apps/README.md
@@ -1,7 +1,7 @@
 # Argo CD — optional applications (non-bootstrap)

-**Base cluster configuration** (CNI, MetalLB, ingress, cert-manager, storage, observability stack, policy, Vault, etc.) is installed by **`ansible/playbooks/noble.yml`** from **`clusters/noble/bootstrap/`** — not from here.
+**Base cluster configuration** (CNI, MetalLB, ingress, cert-manager, storage, observability stack, policy, SOPS secrets path, etc.) is installed by **`ansible/playbooks/noble.yml`** from **`clusters/noble/bootstrap/`** — not from here.

-**`noble-root`** (`clusters/noble/bootstrap/argocd/root-application.yaml`) points at **`clusters/noble/apps`**. Add **`Application`** manifests (and optional **`AppProject`** definitions) under this directory only for workloads that are additive and do not subsume the Ansible-managed platform.
+**`noble-root`** (`clusters/noble/bootstrap/argocd/root-application.yaml`) points at **`clusters/noble/apps`**. Add **`Application`** manifests (and optional **`AppProject`** definitions) under this directory only for workloads that are additive and do not subsume the core platform.

-For an app-of-apps pattern, use a second-level **`Application`** that syncs a subdirectory (for example **`optional/`**) containing leaf **`Application`** resources.
+Bootstrap kustomize (namespaces, static YAML, leaf **`Application`**s) lives in **`clusters/noble/bootstrap/`** and is tracked by **`noble-bootstrap-root`** — enable automated sync for that app only after **`noble.yml`** completes (**`clusters/noble/bootstrap/argocd/README.md`** §5). Put Helm **`Application`** migrations under **`clusters/noble/bootstrap/argocd/app-of-apps/`**.
--- a/clusters/noble/apps/homepage/application.yaml
+++ b/clusters/noble/apps/homepage/application.yaml
@@ -0,0 +1,32 @@
+# Argo CD — optional [Homepage](https://gethomepage.dev/) dashboard (Helm from [jameswynn.github.io/helm-charts](https://jameswynn.github.io/helm-charts/)).
+# Values: **`./values.yaml`** (multi-source **`$values`** ref).
+#
+apiVersion: argoproj.io/v1alpha1
+kind: Application
+metadata:
+  name: homepage
+  namespace: argocd
+  finalizers:
+    - resources-finalizer.argocd.argoproj.io/background
+spec:
+  project: default
+  sources:
+    - repoURL: https://jameswynn.github.io/helm-charts
+      chart: homepage
+      targetRevision: 2.1.0
+      helm:
+        releaseName: homepage
+        valueFiles:
+          - $values/clusters/noble/apps/homepage/values.yaml
+    - repoURL: https://gitea.pcenicni.ca/gsdavidp/home-server.git
+      targetRevision: HEAD
+      ref: values
+  destination:
+    server: https://kubernetes.default.svc
+    namespace: homepage
+  syncPolicy:
+    automated:
+      prune: true
+      selfHeal: true
+    syncOptions:
+      - CreateNamespace=true
--- a/clusters/noble/apps/homepage/values.yaml
+++ b/clusters/noble/apps/homepage/values.yaml
@@ -0,0 +1,122 @@
+# Homepage — [gethomepage/homepage](https://github.com/gethomepage/homepage) via [jameswynn/homepage](https://github.com/jameswynn/helm-charts) Helm chart.
+# Ingress: Traefik + cert-manager (same pattern as `clusters/noble/bootstrap/headlamp/values.yaml`).
+# Service links match **`ansible/roles/noble_landing_urls/defaults/main.yml`** (`noble_lab_ui_entries`).
+# **Velero** has no in-cluster web UI — tile links to upstream docs (no **siteMonitor**).
+#
+# **`siteMonitor`** runs **server-side** in the Homepage pod (see `gethomepage/homepage` `siteMonitor.js`).
+# Public FQDNs like **`*.apps.noble.lab.pcenicni.dev`** often do **not** resolve inside the cluster
+# (split-horizon / LAN DNS only) → `ENOTFOUND` / HTTP **500** in the monitor. Use **in-cluster Service**
+# URLs for **`siteMonitor`** only; **`href`** stays the human-facing ingress URL.
+#
+# **Prometheus widget** also resolves from the pod — use the real **Service** name (Helm may truncate to
+# 63 chars — this repo’s generated UI list uses **`kube-prometheus-kube-prome-prometheus`**).
+# Verify: `kubectl -n monitoring get svc | grep -E 'prometheus|alertmanager|grafana'`.
+#
+image:
+  repository: ghcr.io/gethomepage/homepage
+  tag: v1.2.0
+
+enableRbac: true
+
+serviceAccount:
+  create: true
+
+ingress:
+  main:
+    enabled: true
+    ingressClassName: traefik
+    annotations:
+      cert-manager.io/cluster-issuer: letsencrypt-prod
+    hosts:
+      - host: homepage.apps.noble.lab.pcenicni.dev
+        paths:
+          - path: /
+            pathType: Prefix
+    tls:
+      - hosts:
+          - homepage.apps.noble.lab.pcenicni.dev
+        secretName: homepage-apps-noble-tls
+
+env:
+  - name: HOMEPAGE_ALLOWED_HOSTS
+    value: homepage.apps.noble.lab.pcenicni.dev
+
+config:
+  bookmarks: []
+  services:
+    - Noble Lab:
+        - Argo CD:
+            icon: si-argocd
+            href: https://argo.apps.noble.lab.pcenicni.dev
+            siteMonitor: http://argocd-server.argocd.svc.cluster.local:80
+            description: GitOps UI (sync, apps, repos)
+        - Grafana:
+            icon: si-grafana
+            href: https://grafana.apps.noble.lab.pcenicni.dev
+            siteMonitor: http://kube-prometheus-grafana.monitoring.svc.cluster.local:80
+            description: Dashboards, Loki explore (logs)
+        - Prometheus:
+            icon: si-prometheus
+            href: https://prometheus.apps.noble.lab.pcenicni.dev
+            siteMonitor: http://kube-prometheus-kube-prome-prometheus.monitoring.svc.cluster.local:9090
+            description: Prometheus UI (queries, targets) — lab; protect in production
+            widget:
+              type: prometheus
+              url: http://kube-prometheus-kube-prome-prometheus.monitoring.svc.cluster.local:9090
+              fields: ["targets_up", "targets_down", "targets_total"]
+        - Alertmanager:
+            icon: alertmanager.png
+            href: https://alertmanager.apps.noble.lab.pcenicni.dev
+            siteMonitor: http://kube-prometheus-kube-prome-alertmanager.monitoring.svc.cluster.local:9093
+            description: Alertmanager UI (silences, status)
+        - Headlamp:
+            icon: mdi-kubernetes
+            href: https://headlamp.apps.noble.lab.pcenicni.dev
+            siteMonitor: http://headlamp.headlamp.svc.cluster.local:80
+            description: Kubernetes UI (cluster resources)
+        - Longhorn:
+            icon: longhorn.png
+            href: https://longhorn.apps.noble.lab.pcenicni.dev
+            siteMonitor: http://longhorn-frontend.longhorn-system.svc.cluster.local:80
+            description: Storage volumes, nodes, backups
+        - Velero:
+            icon: mdi-backup-restore
+            href: https://velero.io/docs/
+            description: Cluster backups — no in-cluster web UI; use velero CLI or kubectl (docs)
+  widgets:
+    - datetime:
+        text_size: xl
+        format:
+          dateStyle: medium
+          timeStyle: short
+    - kubernetes:
+        cluster:
+          show: true
+          cpu: true
+          memory: true
+          showLabel: true
+          label: Cluster
+        nodes:
+          show: true
+          cpu: true
+          memory: true
+          showLabel: true
+    - search:
+        provider: duckduckgo
+        target: _blank
+  kubernetes:
+    mode: cluster
+  settingsString: |
+    title: Noble Lab
+    description: Homelab services — in-cluster uptime checks, cluster resources, Prometheus targets
+    theme: dark
+    color: slate
+    headerStyle: boxedWidgets
+    statusStyle: dot
+    iconStyle: theme
+    fullWidth: true
+    useEqualHeights: true
+    layout:
+      Noble Lab:
+        style: row
+        columns: 4
--- a/clusters/noble/apps/kustomization.yaml
+++ b/clusters/noble/apps/kustomization.yaml
@@ -3,4 +3,5 @@
 # Helm value files for those apps can live in subdirectories here (for example **./homepage/values.yaml**).
 apiVersion: kustomize.config.k8s.io/v1beta1
 kind: Kustomization
-resources: []
+resources:
+  - homepage/application.yaml
--- a/clusters/noble/bootstrap/argocd/README.md
+++ b/clusters/noble/bootstrap/argocd/README.md
@@ -50,21 +50,56 @@ helm upgrade --install argocd argo/argo-cd -n argocd --create-namespace \

 Use **Settings → Repositories** in the UI, or `argocd repo add` / a `Secret` of type `repository`.

-## 4. App-of-apps (optional GitOps only)
+## 4. App-of-apps (GitOps)

-Bootstrap **platform** workloads (CNI, ingress, cert-manager, Kyverno, observability, Vault, etc.) are installed by
-**`ansible/playbooks/noble.yml`** from **`clusters/noble/bootstrap/`** — not by Argo. **`clusters/noble/apps/kustomization.yaml`** is empty by default.
+**Ansible** (`ansible/playbooks/noble.yml`) performs the **initial** install: Helm releases and **`kubectl apply -k clusters/noble/bootstrap`**. **Argo** then tracks the same git paths for ongoing reconciliation.

-1. Edit **`root-application.yaml`**: set **`repoURL`** and **`targetRevision`** to this repository. The **`resources-finalizer.argocd.argoproj.io/background`** finalizer uses Argo’s path-qualified form so **`kubectl apply`** does not warn about finalizer names.
-2. When you want Argo to manage specific apps, add **`Application`** manifests under **`clusters/noble/apps/`** (see **`clusters/noble/apps/README.md`**).
-3. Apply the root:
+1. Edit **`root-application.yaml`** and **`bootstrap-root-application.yaml`**: set **`repoURL`** and **`targetRevision`**. The **`resources-finalizer.argocd.argoproj.io/background`** finalizer uses Argo’s path-qualified form so **`kubectl apply`** does not warn about finalizer names.
+2. Optional add-on apps: add **`Application`** manifests under **`clusters/noble/apps/`** (see **`clusters/noble/apps/README.md`**).
+3. **Bootstrap kustomize** (namespaces, datasource, leaf **`Application`**s under **`argocd/app-of-apps/`**, etc.): **`noble-bootstrap-root`** syncs **`clusters/noble/bootstrap`**. It is created with **manual** sync only so Argo does not apply changes while **`noble.yml`** is still running.
+
+   **`ansible/playbooks/noble.yml`** (role **`noble_argocd`**) applies both roots when **`noble_argocd_apply_root_application`** / **`noble_argocd_apply_bootstrap_root_application`** are true in **`ansible/group_vars/all.yml`**.

   ```bash
   kubectl apply -f clusters/noble/bootstrap/argocd/root-application.yaml
+   kubectl apply -f clusters/noble/bootstrap/argocd/bootstrap-root-application.yaml
   ```

-If you migrated from GitOps-managed **`noble-platform`** / **`noble-kyverno`**, delete stale **`Application`** objects on
-the cluster (see **`clusters/noble/apps/README.md`**) then re-apply the root.
+If you migrated from older GitOps **`Application`** names, delete stale **`Application`** objects on the cluster (see **`clusters/noble/apps/README.md`**), then re-apply the roots.
+
+## 5. After Ansible: enable automated sync for **noble-bootstrap-root**
+
+Do this only after **`ansible-playbook playbooks/noble.yml`** has finished successfully (including **`noble_platform`** `kubectl apply -k` and any Helm stages you rely on). Until then, leave **manual** sync so Argo does not fight the playbook.
+
+**Steps** (step 1 is optional)
+
+1. Confirm that the cluster matches git for the kustomize output: `kubectl kustomize clusters/noble/bootstrap | kubectl diff -f -`, or inspect resources in the UI.
+2. Register the git repo in Argo if you have not already (**§3**).
+3. **Refresh** the app so Argo compares **`clusters/noble/bootstrap`** to the cluster: Argo UI → **noble-bootstrap-root** → **Refresh**, or:
+
+   ```bash
+   argocd app get noble-bootstrap-root --refresh
+   ```
+
+4. **Enable automated sync** (prune + self-heal), preserving **`CreateNamespace`**, using any one of:
+
+   **kubectl**
+
+   ```bash
+   kubectl patch application noble-bootstrap-root -n argocd --type merge -p '{"spec":{"syncPolicy":{"automated":{"prune":true,"selfHeal":true},"syncOptions":["CreateNamespace=true"]}}}'
+   ```
+
+   **argocd** CLI (logged in)
+
+   ```bash
+   argocd app set noble-bootstrap-root --sync-policy automated --auto-prune --self-heal
+   ```
+
+   **UI:** open **noble-bootstrap-root** → **App Details** → enable **AUTO-SYNC** (and **Prune** / **Self Heal** if shown).
+
+5. Trigger a sync if the app does not go green immediately: **Sync** in the UI, or `argocd app sync noble-bootstrap-root`.
+
+After this, **git** is the source of truth for everything under **`clusters/noble/bootstrap/kustomization.yaml`** (including **`argocd/app-of-apps/`**). Helm-managed platform components remain whatever Ansible last installed until you model them as Argo **`Application`**s under **`app-of-apps/`** and stop installing them from Ansible.

 ## Versions

--- a/clusters/noble/bootstrap/argocd/root-application.yaml
+++ b/clusters/noble/bootstrap/argocd/root-application.yaml
@@ -3,8 +3,10 @@
 # 1. Set spec.source.repoURL (and targetRevision — **HEAD** tracks the remote default branch) to this repo.
 # 2. kubectl apply -f clusters/noble/bootstrap/argocd/root-application.yaml
 #
-# **clusters/noble/apps** holds optional **Application** manifests. Core platform is installed by
-# **ansible/playbooks/noble.yml** from **clusters/noble/bootstrap/**.
+# **clusters/noble/apps** holds optional **Application** manifests. Core platform Helm + kustomize is
+# installed by **ansible/playbooks/noble.yml** from **clusters/noble/bootstrap/**. **bootstrap-root-application.yaml**
+# registers **noble-bootstrap-root** for the same kustomize tree (**manual** sync until you enable
+# automation after the playbook — see **README.md** §5).
 #
 apiVersion: argoproj.io/v1alpha1
 kind: Application
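
Note on the sync policy mentioned in the comment above: the `kubectl patch` from **README.md** §5 corresponds to the declarative fragment below. This is a minimal sketch, assuming it is merged under `spec` of the existing **noble-bootstrap-root** `Application`; the values are copied from the patch JSON in §5, and nothing else about that manifest is implied.

```yaml
# Fragment only: merge under `spec` of the noble-bootstrap-root Application.
# Apply or commit it only after noble.yml has finished, as README.md §5 describes.
syncPolicy:
  automated:
    prune: true      # delete resources that were removed from git
    selfHeal: true   # revert manual drift on the cluster
  syncOptions:
    - CreateNamespace=true
```

Leaving out the `automated` block keeps the application on manual sync, which is the state the playbook expects while it runs.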
--- a/clusters/noble/bootstrap/csi-snapshot-controller/README.md
+++ b/clusters/noble/bootstrap/csi-snapshot-controller/README.md
@@ -0,0 +1,16 @@
+# CSI Volume Snapshot (external-snapshotter)
+
+Installs the **Volume Snapshot** CRDs and the **snapshot-controller** so CSI drivers (e.g. **Longhorn**) and **Velero** can use `VolumeSnapshot` / `VolumeSnapshotContent` / `VolumeSnapshotClass`.
+
+- Upstream: [kubernetes-csi/external-snapshotter](https://github.com/kubernetes-csi/external-snapshotter) **v8.5.0**
+- **Not** the per-driver **csi-snapshotter** sidecar — Longhorn ships that with its CSI components.
+
+**Order:** apply **before** anything relies on volume snapshots, for example before or alongside **Longhorn**. **Ansible** runs this after **Cilium** and before **metrics-server** / **Longhorn**.
+
+```bash
+kubectl apply -k clusters/noble/bootstrap/csi-snapshot-controller/crd
+kubectl apply -k clusters/noble/bootstrap/csi-snapshot-controller/controller
+kubectl -n kube-system rollout status deploy/snapshot-controller --timeout=120s
+```
+
+After this, create or label a **VolumeSnapshotClass** for Longhorn (`velero.io/csi-volumesnapshot-class: "true"`) per `clusters/noble/bootstrap/velero/README.md`.
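
For reference, a Longhorn `VolumeSnapshotClass` carrying the Velero label could look like the fragment below. This is a minimal sketch, not the contents of this repo's `velero/longhorn-volumesnapshotclass.yaml`: the name `longhorn-velero-example` is hypothetical, and `deletionPolicy` and `parameters.type` are assumptions to check against the Longhorn and Velero documentation before use.

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: longhorn-velero-example   # hypothetical name
  labels:
    velero.io/csi-volumesnapshot-class: "true"   # label quoted in the README above
driver: driver.longhorn.io        # Longhorn CSI driver name
deletionPolicy: Retain            # assumption: keep snapshot content when the VolumeSnapshot is deleted
parameters:
  type: snap                      # assumption: in-cluster Longhorn snapshot; "bak" targets the backup store
```

The CRDs from `crd/` must already be applied, otherwise the API server rejects this kind.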
--- a/clusters/noble/bootstrap/csi-snapshot-controller/controller/kustomization.yaml
+++ b/clusters/noble/bootstrap/csi-snapshot-controller/controller/kustomization.yaml
@@ -0,0 +1,8 @@
+# Snapshot controller — **kube-system** (upstream default).
+# Image tag should match the external-snapshotter release family (see setup-snapshot-controller.yaml in that tag).
+apiVersion: kustomize.config.k8s.io/v1beta1
+kind: Kustomization
+namespace: kube-system
+resources:
+  - https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/v8.5.0/deploy/kubernetes/snapshot-controller/rbac-snapshot-controller.yaml
+  - https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/v8.5.0/deploy/kubernetes/snapshot-controller/setup-snapshot-controller.yaml
--- a/clusters/noble/bootstrap/csi-snapshot-controller/crd/kustomization.yaml
+++ b/clusters/noble/bootstrap/csi-snapshot-controller/crd/kustomization.yaml
@@ -0,0 +1,9 @@
+# kubernetes-csi/external-snapshotter — Volume Snapshot GA CRDs only (no VolumeGroupSnapshot).
+# Pin **ref** when bumping; keep in sync with the **controller** manifests (**../controller/kustomization.yaml**).
+# https://github.com/kubernetes-csi/external-snapshotter/tree/v8.5.0/client/config/crd
+apiVersion: kustomize.config.k8s.io/v1beta1
+kind: Kustomization
+resources:
+  - https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/v8.5.0/client/config/crd/snapshot.storage.k8s.io_volumesnapshotclasses.yaml
+  - https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/v8.5.0/client/config/crd/snapshot.storage.k8s.io_volumesnapshotcontents.yaml
+  - https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/v8.5.0/client/config/crd/snapshot.storage.k8s.io_volumesnapshots.yaml
--- a/clusters/noble/bootstrap/external-secrets/README.md
+++ b/clusters/noble/bootstrap/external-secrets/README.md
@@ -1,60 +0,0 @@
-# External Secrets Operator (noble)
-
-Syncs secrets from external systems into Kubernetes **Secret** objects via **ExternalSecret** / **ClusterExternalSecret** CRDs.
-
- **Chart:** `external-secrets/external-secrets` **2.2.0** (app **v2.2.0**)
- **Namespace:** `external-secrets`
- **Helm release name:** `external-secrets` (matches the operator **ServiceAccount** name `external-secrets`)
-
-## Install
-
-```bash
-helm repo add external-secrets https://charts.external-secrets.io
-helm repo update
-kubectl apply -f clusters/noble/bootstrap/external-secrets/namespace.yaml
-helm upgrade --install external-secrets external-secrets/external-secrets -n external-secrets \
-  --version 2.2.0 -f clusters/noble/bootstrap/external-secrets/values.yaml --wait
-```
-
-Verify:
-
-```bash
-kubectl -n external-secrets get deploy,pods
-kubectl get crd | grep external-secrets
-```
-
-## Vault `ClusterSecretStore` (after Vault is deployed)
-
-The checklist expects a **Vault**-backed store. Install Vault first (`talos/CLUSTER-BUILD.md` Phase E — Vault on Longhorn + auto-unseal), then:
-
-1. Enable **KV v2** secrets engine and **Kubernetes** auth in Vault; create a **role** (e.g. `external-secrets`) that maps the cluster’s **`external-secrets` / `external-secrets`** service account to a policy that can read the paths you need.
-2. Copy **`examples/vault-cluster-secret-store.yaml`**, set **`spec.provider.vault.server`** to your Vault URL. This repo’s Vault Helm values use **HTTP** on port **8200** (`global.tlsDisable: true`): **`http://vault.vault.svc.cluster.local:8200`**. Use **`https://`** if you enable TLS on the Vault listener.
-3. If Vault uses a **private TLS CA**, configure **`caProvider`** or **`caBundle`** on the Vault provider — see [HashiCorp Vault provider](https://external-secrets.io/latest/provider/hashicorp-vault/). Do not commit private CA material to public git unless intended.
-4. Apply: **`kubectl apply -f …/vault-cluster-secret-store.yaml`**
-5. Confirm the store is ready: **`kubectl describe clustersecretstore vault`**
-
-Example **ExternalSecret** (after the store is healthy):
-
-```yaml
-apiVersion: external-secrets.io/v1
-kind: ExternalSecret
-metadata:
-  name: demo
-  namespace: default
-spec:
-  refreshInterval: 1h
-  secretStoreRef:
-    name: vault
-    kind: ClusterSecretStore
-  target:
-    name: demo-synced
-  data:
-    - secretKey: password
-      remoteRef:
-        key: secret/data/myapp
-        property: password
-```
-
-## Upgrades
-
-Pin the chart version in `values.yaml` header comments; run the same **`helm upgrade --install`** with the new **`--version`** after reviewing [release notes](https://github.com/external-secrets/external-secrets/releases).
--- a/clusters/noble/bootstrap/external-secrets/examples/vault-cluster-secret-store.yaml
+++ b/clusters/noble/bootstrap/external-secrets/examples/vault-cluster-secret-store.yaml
@@ -1,31 +0,0 @@
-# ClusterSecretStore for HashiCorp Vault (KV v2) using Kubernetes auth.
-#
-# Do not apply until Vault is running, reachable from the cluster, and configured with:
-# - Kubernetes auth at mountPath (default: kubernetes)
-# - A role (below: external-secrets) bound to this service account:
-#     name: external-secrets
-#     namespace: external-secrets
-# - A policy allowing read on the KV path used below (e.g. secret/data/* for path "secret")
-#
-# Adjust server, mountPath, role, and path to match your Vault deployment. If Vault uses TLS
-# with a private CA, set provider.vault.caProvider or caBundle (see README).
-#
-# kubectl apply -f clusters/noble/bootstrap/external-secrets/examples/vault-cluster-secret-store.yaml
---
-apiVersion: external-secrets.io/v1
-kind: ClusterSecretStore
-metadata:
-  name: vault
-spec:
-  provider:
-    vault:
-      server: "http://vault.vault.svc.cluster.local:8200"
-      path: secret
-      version: v2
-      auth:
-        kubernetes:
-          mountPath: kubernetes
-          role: external-secrets
-          serviceAccountRef:
-            name: external-secrets
-            namespace: external-secrets
--- a/clusters/noble/bootstrap/external-secrets/namespace.yaml
+++ b/clusters/noble/bootstrap/external-secrets/namespace.yaml
@@ -1,5 +0,0 @@
-# External Secrets Operator — apply before Helm.
-apiVersion: v1
-kind: Namespace
-metadata:
-  name: external-secrets
--- a/clusters/noble/bootstrap/external-secrets/values.yaml
+++ b/clusters/noble/bootstrap/external-secrets/values.yaml
@@ -1,10 +0,0 @@
-# External Secrets Operator — noble
-#
-# helm repo add external-secrets https://charts.external-secrets.io
-# helm repo update
-# kubectl apply -f clusters/noble/bootstrap/external-secrets/namespace.yaml
-# helm upgrade --install external-secrets external-secrets/external-secrets -n external-secrets \
-#   --version 2.2.0 -f clusters/noble/bootstrap/external-secrets/values.yaml --wait
-#
-# CRDs are installed by the chart (installCRDs: true). Vault ClusterSecretStore: see README + examples/.
-commonLabels: {}
--- a/clusters/noble/bootstrap/kustomization.yaml
+++ b/clusters/noble/bootstrap/kustomization.yaml
@@ -1,6 +1,8 @@
 # Ansible bootstrap: plain Kustomize (namespaces + extra YAML). Helm installs are driven by
 # **ansible/playbooks/noble.yml** (role **noble_platform**) — avoids **kustomize --enable-helm** in-repo.
-# Optional GitOps workloads live under **../apps/** (Argo **noble-root**).
+# Optional GitOps: **../apps/** (Argo **noble-root**); leaf **Application**s under **argocd/app-of-apps/**.
+# **noble-bootstrap-root** (Argo) uses this same path — enable automated sync only after **noble.yml**
+# completes (see **argocd/README.md** §5).
 apiVersion: kustomize.config.k8s.io/v1beta1
 kind: Kustomization

@@ -8,11 +10,10 @@ resources:
  - kube-prometheus-stack/namespace.yaml
  - loki/namespace.yaml
  - fluent-bit/namespace.yaml
-  - sealed-secrets/namespace.yaml
-  - external-secrets/namespace.yaml
-  - vault/namespace.yaml
+  - newt/namespace.yaml
  - kyverno/namespace.yaml
+  - velero/namespace.yaml
+  - velero/longhorn-volumesnapshotclass.yaml
  - headlamp/namespace.yaml
  - grafana-loki-datasource/loki-datasource.yaml
-  - vault/unseal-cronjob.yaml
-  - vault/cilium-network-policy.yaml
+  - argocd/app-of-apps
--- a/clusters/noble/bootstrap/kyverno/policies-values.yaml
+++ b/clusters/noble/bootstrap/kyverno/policies-values.yaml
@@ -35,7 +35,6 @@ x-kyverno-exclude-infra: &kyverno_exclude_infra
          - kube-node-lease
          - argocd
          - cert-manager
-          - external-secrets
          - headlamp
          - kyverno
          - logging
@@ -44,9 +43,7 @@ x-kyverno-exclude-infra: &kyverno_exclude_infra
          - metallb-system
          - monitoring
          - newt
-          - sealed-secrets
          - traefik
-          - vault

 policyExclude:
  disallow-capabilities: *kyverno_exclude_infra
--- a/clusters/noble/bootstrap/newt/README.md
+++ b/clusters/noble/bootstrap/newt/README.md
@@ -2,26 +2,24 @@

 This is the **primary** automation path for **public** hostnames to workloads in this cluster (it **replaces** in-cluster ExternalDNS). [Newt](https://github.com/fosrl/newt) is the on-prem agent that connects your cluster to a **Pangolin** site (WireGuard tunnel). The [Fossorial Helm chart](https://github.com/fosrl/helm-charts) deploys one or more instances.

-**Secrets:** Never commit endpoint, Newt ID, or Newt secret. If credentials were pasted into chat or CI logs, **rotate them** in Pangolin and recreate the Kubernetes Secret.
+**Secrets:** Never commit endpoint, Newt ID, or Newt secret in **plain** YAML. If credentials were pasted into chat or CI logs, **rotate them** in Pangolin and recreate the Kubernetes Secret.

 ## 1. Create the Secret

 Keys must match `values.yaml` (`PANGOLIN_ENDPOINT`, `NEWT_ID`, `NEWT_SECRET`).

-### Option A — Sealed Secret (safe for GitOps)
+### Option A — SOPS (safe for GitOps)

-With the [Sealed Secrets](https://github.com/bitnami-labs/sealed-secrets) controller installed (`clusters/noble/bootstrap/sealed-secrets/`), generate a `SealedSecret` from your workstation (rotate credentials in Pangolin first if they were exposed):
+Encrypt a normal **`Secret`** with [SOPS](https://github.com/getsops/sops) (formerly Mozilla SOPS) and **age** (see **`clusters/noble/secrets/README.md`** and **`.sops.yaml`**). The repo includes an encrypted example at **`clusters/noble/secrets/newt-pangolin-auth.secret.yaml`**. Edit it with `sops` after setting **`SOPS_AGE_KEY_FILE`** to the path of your **`age-key.txt`**, or create a new file and encrypt it.

 ```bash
-chmod +x clusters/noble/bootstrap/sealed-secrets/examples/kubeseal-newt-pangolin-auth.sh
-export PANGOLIN_ENDPOINT='https://pangolin.pcenicni.dev'
-export NEWT_ID='YOUR_NEWT_ID'
-export NEWT_SECRET='YOUR_NEWT_SECRET'
-./clusters/noble/bootstrap/sealed-secrets/examples/kubeseal-newt-pangolin-auth.sh > newt-pangolin-auth.sealedsecret.yaml
-kubectl apply -f newt-pangolin-auth.sealedsecret.yaml
+export SOPS_AGE_KEY_FILE=/absolute/path/to/home-server/age-key.txt
+sops clusters/noble/secrets/newt-pangolin-auth.secret.yaml
+# Saving in the editor re-encrypts the file; then apply the decrypted manifest:
+sops -d clusters/noble/secrets/newt-pangolin-auth.secret.yaml | kubectl apply -f -
 ```

-Commit only the `.sealedsecret.yaml` file, not plain `Secret` YAML.
+**Ansible** (`noble.yml`) applies all **`clusters/noble/secrets/*.yaml`** automatically when **`age-key.txt`** exists at the repo root.

 ### Option B — Imperative Secret (not in git)

--- a/clusters/noble/bootstrap/sealed-secrets/README.md
+++ b/clusters/noble/bootstrap/sealed-secrets/README.md
@@ -1,50 +0,0 @@
-# Sealed Secrets (noble)
-
-Encrypts `Secret` manifests so they can live in git; the controller decrypts **SealedSecret** resources into **Secret**s in-cluster.
-
- **Chart:** `sealed-secrets/sealed-secrets` **2.18.4** (app **0.36.1**)
- **Namespace:** `sealed-secrets`
-
-## Install
-
-```bash
-helm repo add sealed-secrets https://bitnami-labs.github.io/sealed-secrets
-helm repo update
-kubectl apply -f clusters/noble/bootstrap/sealed-secrets/namespace.yaml
-helm upgrade --install sealed-secrets sealed-secrets/sealed-secrets -n sealed-secrets \
-  --version 2.18.4 -f clusters/noble/bootstrap/sealed-secrets/values.yaml --wait
-```
-
-## Workstation: `kubeseal`
-
-Install a **kubeseal** build compatible with the controller (match **app** minor, e.g. **0.36.x** for **0.36.1**). Examples:
-
- **Homebrew:** `brew install kubeseal` (check `kubeseal --version` against the chart’s `image.tag` in `helm show values`).
- **GitHub releases:** [bitnami-labs/sealed-secrets](https://github.com/bitnami-labs/sealed-secrets/releases)
-
-Fetch the cluster’s public seal cert (once per kube context):
-
-```bash
-kubeseal --fetch-cert > /tmp/noble-sealed-secrets.pem
-```
-
-Create a sealed secret from a normal secret manifest:
-
-```bash
-kubectl create secret generic example --from-literal=foo=bar --dry-run=client -o yaml \
-  | kubeseal --cert /tmp/noble-sealed-secrets.pem -o yaml > example-sealedsecret.yaml
-```
-
-Commit `example-sealedsecret.yaml`; apply it with `kubectl apply -f`. The controller creates the **Secret** in the same namespace as the **SealedSecret**.
-
-**Noble example:** `examples/kubeseal-newt-pangolin-auth.sh` (Newt / Pangolin tunnel credentials).
-
-## Backup the sealing key
-
-If the controller’s private key is lost, existing sealed files cannot be decrypted on a new cluster. Back up the key secret after install:
-
-```bash
-kubectl get secret -n sealed-secrets -l sealedsecrets.bitnami.com/sealed-secrets-key=active -o yaml > sealed-secrets-key-backup.yaml
-```
-
-Store `sealed-secrets-key-backup.yaml` in a safe offline location (not in public git).
--- a/clusters/noble/bootstrap/sealed-secrets/examples/kubeseal-newt-pangolin-auth.sh
+++ b/clusters/noble/bootstrap/sealed-secrets/examples/kubeseal-newt-pangolin-auth.sh
@@ -1,19 +0,0 @@
-#!/usr/bin/env bash
-# Emit a SealedSecret for newt-pangolin-auth (namespace newt).
-# Prerequisites: sealed-secrets controller running; kubeseal client (same minor as controller).
-# Rotate Pangolin/Newt credentials in the UI first if they were exposed, then set env vars and run:
-#
-#   export PANGOLIN_ENDPOINT='https://pangolin.example.com'
-#   export NEWT_ID='...'
-#   export NEWT_SECRET='...'
-#   ./kubeseal-newt-pangolin-auth.sh > newt-pangolin-auth.sealedsecret.yaml
-#   kubectl apply -f newt-pangolin-auth.sealedsecret.yaml
-#
-set -euo pipefail
-kubectl apply -f "$(dirname "$0")/../../newt/namespace.yaml" >/dev/null 2>&1 || true
-kubectl -n newt create secret generic newt-pangolin-auth \
-  --dry-run=client \
-  --from-literal=PANGOLIN_ENDPOINT="${PANGOLIN_ENDPOINT:?}" \
-  --from-literal=NEWT_ID="${NEWT_ID:?}" \
-  --from-literal=NEWT_SECRET="${NEWT_SECRET:?}" \
-  -o yaml | kubeseal -o yaml
--- a/clusters/noble/bootstrap/sealed-secrets/namespace.yaml
+++ b/clusters/noble/bootstrap/sealed-secrets/namespace.yaml
@@ -1,5 +0,0 @@
-# Sealed Secrets controller — apply before Helm.
-apiVersion: v1
-kind: Namespace
-metadata:
-  name: sealed-secrets
--- a/clusters/noble/bootstrap/sealed-secrets/values.yaml
+++ b/clusters/noble/bootstrap/sealed-secrets/values.yaml
@@ -1,18 +0,0 @@
-# Sealed Secrets — noble (Git-encrypted Secret workflow)
-#
-# helm repo add sealed-secrets https://bitnami-labs.github.io/sealed-secrets
-# helm repo update
-# kubectl apply -f clusters/noble/bootstrap/sealed-secrets/namespace.yaml
-# helm upgrade --install sealed-secrets sealed-secrets/sealed-secrets -n sealed-secrets \
-#   --version 2.18.4 -f clusters/noble/bootstrap/sealed-secrets/values.yaml --wait
-#
-# Client: install kubeseal (same minor as controller — see README).
-# Defaults are sufficient for the lab; override here if you need key renewal, resources, etc.
-#
-# GitOps pattern: create Secrets only via SealedSecret (or External Secrets + Vault).
-# Example (Newt): clusters/noble/bootstrap/sealed-secrets/examples/kubeseal-newt-pangolin-auth.sh
-# Backup the controller's sealing key: kubectl -n sealed-secrets get secret sealed-secrets-key -o yaml
-#
-# Talos cluster secrets (bootstrap token, cluster secret, certs) belong in talhelper talsecret /
-# SOPS — not Sealed Secrets. See talos/README.md.
-commonLabels: {}
--- a/clusters/noble/bootstrap/vault/README.md
+++ b/clusters/noble/bootstrap/vault/README.md
@@ -1,162 +0,0 @@
-# HashiCorp Vault (noble)
-
-Standalone Vault with **file** storage on a **Longhorn** PVC (`server.dataStorage`). The listener uses **HTTP** (`global.tlsDisable: true`) for in-cluster use; add TLS at the listener when exposing outside the cluster.
-
- **Chart:** `hashicorp/vault` **0.32.0** (Vault **1.21.2**)
- **Namespace:** `vault`
-
-## Install
-
-```bash
-helm repo add hashicorp https://helm.releases.hashicorp.com
-helm repo update
-kubectl apply -f clusters/noble/bootstrap/vault/namespace.yaml
-helm upgrade --install vault hashicorp/vault -n vault \
-  --version 0.32.0 -f clusters/noble/bootstrap/vault/values.yaml --wait --timeout 15m
-```
-
-Verify:
-
-```bash
-kubectl -n vault get pods,pvc,svc
-kubectl -n vault exec -i sts/vault -- vault status
-```
-
-## Cilium network policy (Phase G)
-
-After **Cilium** is up, optionally restrict HTTP access to the Vault server pods (**TCP 8200**) to **`external-secrets`** and same-namespace clients:
-
-```bash
-kubectl apply -f clusters/noble/bootstrap/vault/cilium-network-policy.yaml
-```
-
-If you add workloads in other namespaces that call Vault, extend **`ingress`** in that manifest.
-
-## Initialize and unseal (first time)
-
-From a workstation with `kubectl` (or `kubectl exec` into any pod with `vault` CLI):
-
-```bash
-kubectl -n vault exec -i sts/vault -- vault operator init -key-shares=1 -key-threshold=1
-```
-
-**Lab-only:** `-key-shares=1 -key-threshold=1` keeps a single unseal key. For stronger Shamir splits, use more shares and store them safely.
-
-Save the **Unseal Key** and **Root Token** offline. Then unseal once:
-
-```bash
-kubectl -n vault exec -i sts/vault -- vault operator unseal
-# paste unseal key
-```
-
-Or create the Secret used by the optional CronJob and apply it:
-
-```bash
-kubectl -n vault create secret generic vault-unseal-key --from-literal=key='YOUR_UNSEAL_KEY'
-kubectl apply -f clusters/noble/bootstrap/vault/unseal-cronjob.yaml
-```
-
-The CronJob runs every minute and unseals if Vault is sealed and the Secret is present.
-
-## Auto-unseal note
-
-Vault **OSS** auto-unseal uses cloud KMS (AWS, GCP, Azure, OCI), **Transit** (another Vault), etc. There is no first-class “Kubernetes Secret” seal. This repo uses an optional **CronJob** as a **lab** substitute. Production clusters should use a supported seal backend.
-
-## Kubernetes auth (External Secrets / ClusterSecretStore)
-
-**One-shot:** from the repo root, `export KUBECONFIG=talos/kubeconfig` and `export VAULT_TOKEN=…`, then run **`./clusters/noble/bootstrap/vault/configure-kubernetes-auth.sh`** (idempotent). Then **`kubectl apply -f clusters/noble/bootstrap/external-secrets/examples/vault-cluster-secret-store.yaml`** on its own line (shell comments **`# …`** on the same line are parsed as extra `kubectl` args and break `apply`). **`kubectl get clustersecretstore vault`** should show **READY=True** after a few seconds.
-
-Run these **from your workstation** (needs `kubectl`; no local `vault` binary required). Use a **short-lived admin token** or the root token **only in your shell** — do not paste tokens into logs or chat.
-
-**1. Enable the auth method** (skip if already done):
-
-```bash
-kubectl -n vault exec -it sts/vault -- sh -c '
-  export VAULT_ADDR=http://127.0.0.1:8200
-  export VAULT_TOKEN="YOUR_ROOT_OR_ADMIN_TOKEN"
-  vault auth enable kubernetes
-'
-```
-
-**2. Configure `auth/kubernetes`** — the API **issuer** must match the `iss` claim on service account JWTs. With **kube-vip** / a custom API URL, discover it from the cluster (do not assume `kubernetes.default`):
-
-```bash
-ISSUER=$(kubectl get --raw /.well-known/openid-configuration | jq -r .issuer)
-REVIEWER=$(kubectl -n vault create token vault --duration=8760h)
-CA_B64=$(kubectl config view --raw --minify -o jsonpath='{.clusters[0].cluster.certificate-authority-data}')
-```
-
-Then apply config **inside** the Vault pod (environment variables are passed in with `env` so quoting stays correct):
-
-```bash
-export VAULT_TOKEN="YOUR_ROOT_OR_ADMIN_TOKEN"
-export ISSUER REVIEWER CA_B64
-kubectl -n vault exec -i sts/vault -- env \
-  VAULT_ADDR=http://127.0.0.1:8200 \
-  VAULT_TOKEN="$VAULT_TOKEN" \
-  CA_B64="$CA_B64" \
-  REVIEWER="$REVIEWER" \
-  ISSUER="$ISSUER" \
-  sh -ec '
-  echo "$CA_B64" | base64 -d > /tmp/k8s-ca.crt
-  vault write auth/kubernetes/config \
-    kubernetes_host="https://kubernetes.default.svc:443" \
-    kubernetes_ca_cert=@/tmp/k8s-ca.crt \
-    token_reviewer_jwt="$REVIEWER" \
-    issuer="$ISSUER"
-'
-```
-
-**3. KV v2** at path `secret` (skip if already enabled):
-
-```bash
-kubectl -n vault exec -it sts/vault -- sh -c '
-  export VAULT_ADDR=http://127.0.0.1:8200
-  export VAULT_TOKEN="YOUR_ROOT_OR_ADMIN_TOKEN"
-  vault secrets enable -path=secret kv-v2
-'
-```
-
-**4. Policy + role** for the External Secrets operator SA (`external-secrets` / `external-secrets`):
-
-```bash
-kubectl -n vault exec -it sts/vault -- sh -c '
-  export VAULT_ADDR=http://127.0.0.1:8200
-  export VAULT_TOKEN="YOUR_ROOT_OR_ADMIN_TOKEN"
-  vault policy write external-secrets - <<EOF
-path "secret/data/*" {
-  capabilities = ["read", "list"]
-}
-path "secret/metadata/*" {
-  capabilities = ["read", "list"]
-}
-EOF
-  vault write auth/kubernetes/role/external-secrets \
-    bound_service_account_names=external-secrets \
-    bound_service_account_namespaces=external-secrets \
-    policies=external-secrets \
-    ttl=24h
-'
-```
-
-**5. Apply** **`clusters/noble/bootstrap/external-secrets/examples/vault-cluster-secret-store.yaml`** if you have not already, then verify:
-
-```bash
-kubectl describe clustersecretstore vault
-```
-
-See also [Kubernetes auth](https://developer.hashicorp.com/vault/docs/auth/kubernetes#configuration).
-
-## TLS and External Secrets
-
-`values.yaml` disables TLS on the Vault listener. The **`ClusterSecretStore`** example uses **`http://vault.vault.svc.cluster.local:8200`**. If you enable TLS on the listener, switch the URL to **`https://`** and configure **`caBundle`** or **`caProvider`** on the store.
-
-## UI
-
-Port-forward:
-
-```bash
-kubectl -n vault port-forward svc/vault-ui 8200:8200
-```
-
-Open `http://127.0.0.1:8200` and log in with the root token (rotate for production workflows).
--- a/clusters/noble/bootstrap/vault/cilium-network-policy.yaml
+++ b/clusters/noble/bootstrap/vault/cilium-network-policy.yaml
@@ -1,40 +0,0 @@
-# CiliumNetworkPolicy — restrict who may reach Vault HTTP listener (8200).
-# Apply after Cilium is healthy: kubectl apply -f clusters/noble/bootstrap/vault/cilium-network-policy.yaml
-#
-# Ingress-only policy: egress from Vault is unchanged (Kubernetes auth needs API + DNS).
-# Extend ingress rules if other namespaces must call Vault (e.g. app workloads).
-#
-# Ref: https://docs.cilium.io/en/stable/security/policy/language/
---
-apiVersion: cilium.io/v2
-kind: CiliumNetworkPolicy
-metadata:
-  name: vault-http-ingress
-  namespace: vault
-spec:
-  endpointSelector:
-    matchLabels:
-      app.kubernetes.io/name: vault
-      component: server
-  ingress:
-    - fromEndpoints:
-        - matchLabels:
-            "k8s:io.kubernetes.pod.namespace": external-secrets
-      toPorts:
-        - ports:
-            - port: "8200"
-              protocol: TCP
-    - fromEndpoints:
-        - matchLabels:
-            "k8s:io.kubernetes.pod.namespace": traefik
-      toPorts:
-        - ports:
-            - port: "8200"
-              protocol: TCP
-    - fromEndpoints:
-        - matchLabels:
-            "k8s:io.kubernetes.pod.namespace": vault
-      toPorts:
-        - ports:
-            - port: "8200"
-              protocol: TCP
--- a/clusters/noble/bootstrap/vault/configure-kubernetes-auth.sh
+++ b/clusters/noble/bootstrap/vault/configure-kubernetes-auth.sh
@@ -1,77 +0,0 @@
-#!/usr/bin/env bash
-# Configure Vault Kubernetes auth + KV v2 + policy/role for External Secrets Operator.
-# Requires: kubectl (cluster access), jq optional (openid issuer); Vault reachable via sts/vault.
-#
-# Usage (from repo root):
-#   export KUBECONFIG=talos/kubeconfig   # or your path
-#   export VAULT_TOKEN='…'               # root or admin token — never commit
-#   ./clusters/noble/bootstrap/vault/configure-kubernetes-auth.sh
-#
-# Then: kubectl apply -f clusters/noble/bootstrap/external-secrets/examples/vault-cluster-secret-store.yaml
-# Verify: kubectl describe clustersecretstore vault
-
-set -euo pipefail
-
-: "${VAULT_TOKEN:?Set VAULT_TOKEN to your Vault root or admin token}"
-
-ISSUER=$(kubectl get --raw /.well-known/openid-configuration | jq -r .issuer)
-REVIEWER=$(kubectl -n vault create token vault --duration=8760h)
-CA_B64=$(kubectl config view --raw --minify -o jsonpath='{.clusters[0].cluster.certificate-authority-data}')
-
-kubectl -n vault exec -i sts/vault -- env \
-  VAULT_ADDR=http://127.0.0.1:8200 \
-  VAULT_TOKEN="$VAULT_TOKEN" \
-  sh -ec '
-    set -e
-    vault auth list >/tmp/vauth.txt
-    grep -q "^kubernetes/" /tmp/vauth.txt || vault auth enable kubernetes
-  '
-
-kubectl -n vault exec -i sts/vault -- env \
-  VAULT_ADDR=http://127.0.0.1:8200 \
-  VAULT_TOKEN="$VAULT_TOKEN" \
-  CA_B64="$CA_B64" \
-  REVIEWER="$REVIEWER" \
-  ISSUER="$ISSUER" \
-  sh -ec '
-    echo "$CA_B64" | base64 -d > /tmp/k8s-ca.crt
-    vault write auth/kubernetes/config \
-      kubernetes_host="https://kubernetes.default.svc:443" \
-      kubernetes_ca_cert=@/tmp/k8s-ca.crt \
-      token_reviewer_jwt="$REVIEWER" \
-      issuer="$ISSUER"
-  '
-
-kubectl -n vault exec -i sts/vault -- env \
-  VAULT_ADDR=http://127.0.0.1:8200 \
-  VAULT_TOKEN="$VAULT_TOKEN" \
-  sh -ec '
-    set -e
-    vault secrets list >/tmp/vsec.txt
-    grep -q "^secret/" /tmp/vsec.txt || vault secrets enable -path=secret kv-v2
-  '
-
-kubectl -n vault exec -i sts/vault -- env \
-  VAULT_ADDR=http://127.0.0.1:8200 \
-  VAULT_TOKEN="$VAULT_TOKEN" \
-  sh -ec '
-    vault policy write external-secrets - <<EOF
-path "secret/data/*" {
-  capabilities = ["read", "list"]
-}
-path "secret/metadata/*" {
-  capabilities = ["read", "list"]
-}
-EOF
-    vault write auth/kubernetes/role/external-secrets \
-      bound_service_account_names=external-secrets \
-      bound_service_account_namespaces=external-secrets \
-      policies=external-secrets \
-      ttl=24h
-  '
-
-echo "Done. Issuer used: $ISSUER"
-echo ""
-echo "Next (each command on its own line — do not paste # comments after kubectl):"
-echo "  kubectl apply -f clusters/noble/bootstrap/external-secrets/examples/vault-cluster-secret-store.yaml"
-echo "  kubectl get clustersecretstore vault"
--- a/clusters/noble/bootstrap/vault/namespace.yaml
+++ b/clusters/noble/bootstrap/vault/namespace.yaml
@@ -1,5 +0,0 @@
-# HashiCorp Vault — apply before Helm.
-apiVersion: v1
-kind: Namespace
-metadata:
-  name: vault
--- a/clusters/noble/bootstrap/vault/unseal-cronjob.yaml
+++ b/clusters/noble/bootstrap/vault/unseal-cronjob.yaml
@@ -1,63 +0,0 @@
-# Optional lab auto-unseal: applies after Vault is initialized and Secret `vault-unseal-key` exists.
-#
-# 1) vault operator init -key-shares=1 -key-threshold=1  (lab only — single key)
-# 2) kubectl -n vault create secret generic vault-unseal-key --from-literal=key='YOUR_UNSEAL_KEY'
-# 3) kubectl apply -f clusters/noble/bootstrap/vault/unseal-cronjob.yaml
-#
-# OSS Vault has no Kubernetes/KMS seal; this CronJob runs vault operator unseal when the server is sealed.
-# Protect the Secret with RBAC; prefer cloud KMS auto-unseal for real environments.
---
-apiVersion: batch/v1
-kind: CronJob
-metadata:
-  name: vault-auto-unseal
-  namespace: vault
-spec:
-  concurrencyPolicy: Forbid
-  successfulJobsHistoryLimit: 1
-  failedJobsHistoryLimit: 3
-  schedule: "*/1 * * * *"
-  jobTemplate:
-    spec:
-      template:
-        spec:
-          restartPolicy: OnFailure
-          securityContext:
-            runAsNonRoot: true
-            runAsUser: 100
-            runAsGroup: 1000
-            seccompProfile:
-              type: RuntimeDefault
-          containers:
-            - name: unseal
-              image: hashicorp/vault:1.21.2
-              imagePullPolicy: IfNotPresent
-              securityContext:
-                allowPrivilegeEscalation: false
-                capabilities:
-                  drop:
-                    - ALL
-              env:
-                - name: VAULT_ADDR
-                  value: http://vault.vault.svc:8200
-              command:
-                - /bin/sh
-                - -ec
-                - |
-                  test -f /secrets/key || exit 0
-                  status="$(vault status -format=json 2>/dev/null || true)"
-                  echo "$status" | grep -q '"initialized":true' || exit 0
-                  echo "$status" | grep -q '"sealed":false' && exit 0
-                  vault operator unseal "$(cat /secrets/key)"
-              volumeMounts:
-                - name: unseal
-                  mountPath: /secrets
-                  readOnly: true
-          volumes:
-            - name: unseal
-              secret:
-                secretName: vault-unseal-key
-                optional: true
-                items:
-                  - key: key
-                    path: key
--- a/clusters/noble/bootstrap/vault/values.yaml
+++ b/clusters/noble/bootstrap/vault/values.yaml
@@ -1,62 +0,0 @@
-# HashiCorp Vault — noble (standalone, file storage on Longhorn; TLS disabled on listener for in-cluster HTTP).
-#
-# helm repo add hashicorp https://helm.releases.hashicorp.com
-# helm repo update
-# kubectl apply -f clusters/noble/bootstrap/vault/namespace.yaml
-# helm upgrade --install vault hashicorp/vault -n vault \
-#   --version 0.32.0 -f clusters/noble/bootstrap/vault/values.yaml --wait --timeout 15m
-#
-# Post-install: initialize, store unseal key in Secret, apply optional unseal CronJob — see README.md
-#
-global:
-  tlsDisable: true
-
-injector:
-  enabled: true
-
-server:
-  enabled: true
-  dataStorage:
-    enabled: true
-    size: 10Gi
-    storageClass: longhorn
-    accessMode: ReadWriteOnce
-  ha:
-    enabled: false
-  standalone:
-    enabled: true
-    config: |
-      ui = true
-
-      listener "tcp" {
-        tls_disable = 1
-        address = "[::]:8200"
-        cluster_address = "[::]:8201"
-      }
-
-      storage "file" {
-        path = "/vault/data"
-      }
-
-  # Allow pod Ready before init/unseal so Helm --wait succeeds (see Vault /v1/sys/health docs).
-  readinessProbe:
-    enabled: true
-    path: "/v1/sys/health?uninitcode=204&sealedcode=204&standbyok=true"
-    port: 8200
-
-  # LAN: TLS terminates at Traefik + cert-manager; listener stays HTTP (global.tlsDisable).
-  ingress:
-    enabled: true
-    ingressClassName: traefik
-    annotations:
-      cert-manager.io/cluster-issuer: letsencrypt-prod
-    hosts:
-      - host: vault.apps.noble.lab.pcenicni.dev
-        paths: []
-    tls:
-      - secretName: vault-apps-noble-tls
-        hosts:
-          - vault.apps.noble.lab.pcenicni.dev
-
-ui:
-  enabled: true
--- a/clusters/noble/bootstrap/velero/README.md
+++ b/clusters/noble/bootstrap/velero/README.md
@@ -0,0 +1,118 @@
+# Velero (cluster backups)
+
+Ansible-managed core stack — **not** reconciled by Argo CD (`clusters/noble/apps` is optional GitOps only).
+
+## What you get
+
+- **No web UI** — Velero is operated with the **`velero`** CLI and **`kubectl`** (Backup, Schedule, Restore CRDs). Metrics are exposed for Prometheus; there is no first-party dashboard in this chart.
+- **vmware-tanzu/velero** Helm chart (**12.0.0** → Velero **1.18.0**) in namespace **`velero`**
+- **AWS plugin** init container for **S3-compatible** object storage (`velero/velero-plugin-for-aws:v1.14.0`)
+- **CSI snapshots** via Velero’s built-in CSI support (`EnableCSI`) and **VolumeSnapshotLocation** `velero.io/csi` (no separate CSI plugin image for Velero ≥ 1.14)
+- **Prometheus** scraping: **ServiceMonitor** labeled for **kube-prometheus** (`release: kube-prometheus`)
+- **Schedule** **`velero-daily-noble`**: cron **`0 3 * * *`** (daily at 03:00 in the Velero pod’s timezone, usually **UTC**), **720h** TTL per backup (~30 days). Edit **`values.yaml`** `schedules` to change time or retention.
+
+## Prerequisites
+
+1. **Volume Snapshot APIs** installed cluster-wide — **`clusters/noble/bootstrap/csi-snapshot-controller/`** (Ansible **`noble_csi_snapshot_controller`**, after **Cilium**). Without **`snapshot.storage.k8s.io`** CRDs and **`kube-system/snapshot-controller`**, Velero logs errors like `no matches for kind "VolumeSnapshot"`.
+2. **Longhorn** (or another CSI driver) with a **VolumeSnapshotClass** for that driver.
+3. For **Longhorn**, this repo applies **`velero/longhorn-volumesnapshotclass.yaml`** (`VolumeSnapshotClass` **`longhorn-velero`**, driver **`driver.longhorn.io`**, Velero label). It is included in **`clusters/noble/bootstrap/kustomization.yaml`** (same apply as other bootstrap YAML). For non-Longhorn drivers, add a class with **`velero.io/csi-volumesnapshot-class: "true"`** (see [Velero CSI](https://velero.io/docs/main/csi/)).
+
+4. **S3-compatible** endpoint (MinIO, VersityGW, AWS, etc.) and a **bucket**.
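+
+For prerequisite 3 with a non-Longhorn driver, the class could look like the sketch below. The class name **`example-velero`** and driver **`csi.example.com`** are hypothetical placeholders; substitute the name your CSI driver registers:
+
+```yaml
+# Hypothetical VolumeSnapshotClass for a non-Longhorn CSI driver (names are illustrative only).
+apiVersion: snapshot.storage.k8s.io/v1
+kind: VolumeSnapshotClass
+metadata:
+  name: example-velero
+  labels:
+    velero.io/csi-volumesnapshot-class: "true"   # lets Velero select this class for the driver
+driver: csi.example.com
+deletionPolicy: Delete
+```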
+
+## Credentials Secret
+
+Velero expects **`velero/velero-cloud-credentials`**, key **`cloud`**, in **INI** form for the AWS plugin:
+
+```ini
+[default]
+aws_access_key_id=<key>
+aws_secret_access_key=<secret>
+```
+
+Create manually:
+
+```bash
+kubectl -n velero create secret generic velero-cloud-credentials \
+  --from-literal=cloud="$(printf '[default]\naws_access_key_id=%s\naws_secret_access_key=%s\n' "$KEY" "$SECRET")"
+```
+
+Or let **Ansible** create it from **`.env`** (`NOBLE_VELERO_AWS_ACCESS_KEY_ID`, `NOBLE_VELERO_AWS_SECRET_ACCESS_KEY`) or from extra-vars **`noble_velero_aws_access_key_id`** / **`noble_velero_aws_secret_access_key`**.
+
+## Apply (Ansible)
+
+1. Copy **`.env.sample`** → **`.env`** at the **repository root** and set at least:
+   - **`NOBLE_VELERO_S3_BUCKET`** — object bucket name
+   - **`NOBLE_VELERO_S3_URL`** — S3 API base URL (e.g. `https://minio.lan:9000` or your VersityGW/MinIO endpoint)
+   - **`NOBLE_VELERO_AWS_ACCESS_KEY_ID`** / **`NOBLE_VELERO_AWS_SECRET_ACCESS_KEY`** — credentials the AWS plugin uses (S3-compatible access key style)
+
+2. Enable the role: set **`noble_velero_install: true`** in **`ansible/group_vars/all.yml`**, **or** pass **`-e noble_velero_install=true`** on the command line.
+
+3. Run from **`ansible/`** (adjust **`KUBECONFIG`** to your cluster admin kubeconfig):
+
+```bash
+cd ansible
+export KUBECONFIG=/absolute/path/to/home-server/talos/kubeconfig
+
+# Velero only (after helm repos; skips other roles unless their tags match — use full playbook if unsure)
+ansible-playbook playbooks/noble.yml --tags repos,velero -e noble_velero_install=true
+```
+
+If **`NOBLE_VELERO_S3_BUCKET`** / **`NOBLE_VELERO_S3_URL`** are not in **`.env`**, pass them explicitly:
+
+```bash
+ansible-playbook playbooks/noble.yml --tags repos,velero -e noble_velero_install=true \
+  -e noble_velero_s3_bucket=my-bucket \
+  -e noble_velero_s3_url=https://s3.example.com:9000
+```
+
+Full platform run (includes Velero when **`noble_velero_install`** is true in **`group_vars`**):
+
+```bash
+ansible-playbook playbooks/noble.yml
+```
+
+## Install (Ansible) — details
+
+1. Set **`noble_velero_install: true`** in **`ansible/group_vars/all.yml`** (or pass **`-e noble_velero_install=true`**).
+2. Set **`noble_velero_s3_bucket`** and **`noble_velero_s3_url`** via **`.env`** (**`NOBLE_VELERO_S3_*`**), **`group_vars`**, or **`-e`**; extra-vars override **`.env`**. Optional: **`noble_velero_s3_region`**, **`noble_velero_s3_prefix`**, **`noble_velero_s3_force_path_style`** (defaults match `values.yaml`).
+3. Run **`ansible/playbooks/noble.yml`** (Velero runs after **`noble_platform`**).
+
+Example without **`.env`** (all on the CLI):
+
+```bash
+cd ansible
+ansible-playbook playbooks/noble.yml --tags velero \
+  -e noble_velero_install=true \
+  -e noble_velero_s3_bucket=noble-velero \
+  -e noble_velero_s3_url=https://minio.lan:9000 \
+  -e noble_velero_aws_access_key_id="$KEY" \
+  -e noble_velero_aws_secret_access_key="$SECRET"
+```
+
+The **`clusters/noble/bootstrap/kustomization.yaml`** applies **`velero/namespace.yaml`** with the rest of the bootstrap namespaces (so **`velero`** exists before Helm).
+
+## Install (Helm only)
+
+From repo root:
+
+```bash
+kubectl apply -f clusters/noble/bootstrap/velero/namespace.yaml
+# Create velero-cloud-credentials (see above), then:
+helm repo add vmware-tanzu https://vmware-tanzu.github.io/helm-charts && helm repo update
+helm upgrade --install velero vmware-tanzu/velero -n velero --version 12.0.0 \
+  -f clusters/noble/bootstrap/velero/values.yaml \
+  --set-string configuration.backupStorageLocation[0].bucket=YOUR_BUCKET \
+  --set-string configuration.backupStorageLocation[0].config.s3Url=https://YOUR-S3-ENDPOINT \
+  --wait
+```
+
+Edit **`values.yaml`** defaults (bucket placeholder, `s3Url`) or override with **`--set-string`** as above.
+
+## Quick checks
+
+```bash
+kubectl -n velero get pods,backupstoragelocation,volumesnapshotlocation
+velero backup create test --wait
+```
+
+(`velero` CLI: install from [Velero releases](https://github.com/vmware-tanzu/velero/releases).)
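+
+The same smoke test can also be written declaratively. A minimal sketch of a **Backup** resource, assuming a hypothetical name **`test-manifest`** and a hypothetical target namespace **`default`** (neither is defined in this repo):
+
+```yaml
+# Hypothetical one-off backup; the name and included namespace are illustrative only.
+apiVersion: velero.io/v1
+kind: Backup
+metadata:
+  name: test-manifest
+  namespace: velero
+spec:
+  includedNamespaces:
+    - default
+  storageLocation: default   # BackupStorageLocation name from values.yaml
+  snapshotVolumes: true      # take volume snapshots (CSI here, given EnableCSI in values.yaml)
+  ttl: 24h0m0s               # short retention for a test backup
+```
+
+Apply it with **`kubectl apply -f`**, then check progress with `kubectl -n velero get backup test-manifest -o jsonpath='{.status.phase}'`; a successful run ends in **`Completed`**.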
--- a/clusters/noble/bootstrap/velero/longhorn-volumesnapshotclass.yaml
+++ b/clusters/noble/bootstrap/velero/longhorn-volumesnapshotclass.yaml
@@ -0,0 +1,11 @@
+# Default Longhorn VolumeSnapshotClass for Velero CSI — one class per driver may carry
+# **velero.io/csi-volumesnapshot-class: "true"** (see velero/README.md).
+# Apply after **Longhorn** CSI is running (`driver.longhorn.io`).
+apiVersion: snapshot.storage.k8s.io/v1
+kind: VolumeSnapshotClass
+metadata:
+  name: longhorn-velero
+  labels:
+    velero.io/csi-volumesnapshot-class: "true"
+driver: driver.longhorn.io
+deletionPolicy: Delete
--- a/clusters/noble/bootstrap/velero/namespace.yaml
+++ b/clusters/noble/bootstrap/velero/namespace.yaml
@@ -0,0 +1,5 @@
+# Velero — apply before Helm (Ansible **noble_velero**).
+apiVersion: v1
+kind: Namespace
+metadata:
+  name: velero
--- a/clusters/noble/bootstrap/velero/values.yaml
+++ b/clusters/noble/bootstrap/velero/values.yaml
@@ -0,0 +1,65 @@
+# Velero Helm values — vmware-tanzu/velero chart (see CLUSTER-BUILD.md Phase F).
+# Install: **ansible/playbooks/noble.yml** role **noble_velero** (override S3 settings via **noble_velero_*** vars).
+# Requires Secret **velero/velero-cloud-credentials** key **cloud** (INI for AWS plugin — see README).
+#
+# Chart: vmware-tanzu/velero — pin version on install (e.g. 12.0.0 / Velero 1.18.0).
+#   helm repo add vmware-tanzu https://vmware-tanzu.github.io/helm-charts && helm repo update
+#   kubectl apply -f clusters/noble/bootstrap/velero/namespace.yaml
+#   helm upgrade --install velero vmware-tanzu/velero -n velero --version 12.0.0 -f clusters/noble/bootstrap/velero/values.yaml
+
+initContainers:
+  - name: velero-plugin-for-aws
+    image: velero/velero-plugin-for-aws:v1.14.0
+    imagePullPolicy: IfNotPresent
+    volumeMounts:
+      - mountPath: /target
+        name: plugins
+
+configuration:
+  features: EnableCSI
+  defaultBackupStorageLocation: default
+  defaultVolumeSnapshotLocations: velero.io/csi:default
+
+  backupStorageLocation:
+    - name: default
+      provider: aws
+      bucket: noble-velero
+      default: true
+      accessMode: ReadWrite
+      credential:
+        name: velero-cloud-credentials
+        key: cloud
+      config:
+        region: us-east-1
+        s3ForcePathStyle: "true"
+        s3Url: https://s3.CHANGE-ME.invalid
+
+  volumeSnapshotLocation:
+    - name: default
+      provider: velero.io/csi
+      config: {}
+
+credentials:
+  useSecret: true
+  existingSecret: velero-cloud-credentials
+
+snapshotsEnabled: true
+deployNodeAgent: false
+
+metrics:
+  enabled: true
+  serviceMonitor:
+    enabled: true
+    autodetect: true
+    additionalLabels:
+      release: kube-prometheus
+
+# Daily full-cluster backup at 03:00. The cron expression is evaluated in the Velero pod (typically **UTC**; set TZ on the
+# Deployment if you need local wall clock). Re-run `helm upgrade --install` to apply changes.
+schedules:
+  daily-noble:
+    disabled: false
+    schedule: "0 3 * * *"
+    template:
+      ttl: 720h
+      storageLocation: default
--- a/clusters/noble/secrets/README.md
+++ b/clusters/noble/secrets/README.md
@@ -0,0 +1,38 @@
+# SOPS-encrypted cluster secrets (noble)
+
+Secrets that belong in git are stored here as **Mozilla SOPS** files encrypted with [age](https://github.com/FiloSottile/age). The matching **private** key lives in **`age-key.txt`** at the repository root (gitignored — create with `age-keygen -o age-key.txt` and add the public key to **`.sops.yaml`** if you rotate keys).
+
+**Migrating from an older cluster** that ran **Vault**, **Sealed Secrets**, or **External Secrets Operator:** uninstall those Helm releases (`helm uninstall vault -n vault`, etc.), delete their namespaces if empty, and export any secrets you still need into plain **`Secret`** YAML here, then encrypt with **`sops`** before committing.
+
+## Prerequisites
+
+- [sops](https://github.com/getsops/sops) and **age** on the machine that encrypts or applies secrets.
+
+## Edit or create a Secret
+
+```bash
+export SOPS_AGE_KEY_FILE=/absolute/path/to/home-server/age-key.txt
+
+# Create a new file: sops opens a template in $EDITOR and encrypts on save:
+sops clusters/noble/secrets/example.secret.yaml
+
+# Or edit an existing encrypted file (opens decrypted in $EDITOR):
+sops clusters/noble/secrets/newt-pangolin-auth.secret.yaml
+```
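+
+The plaintext you write in the editor for a new file might look like the sketch below. The name **`example`**, namespace **`default`**, and key **`API_TOKEN`** are hypothetical placeholders, not resources in this repo:
+
+```yaml
+# Hypothetical Secret; sops encrypts every value on save using the recipient in .sops.yaml.
+apiVersion: v1
+kind: Secret
+metadata:
+  name: example
+  namespace: default
+type: Opaque
+stringData:
+  API_TOKEN: replace-me
+```
+
+The file path must match the **`path_regex`** in **`.sops.yaml`** (`clusters/noble/secrets/.*\.yaml$`) for the age recipient to be picked up.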
+
+## Apply to the cluster
+
+```bash
+export KUBECONFIG=/absolute/path/to/home-server/talos/kubeconfig
+export SOPS_AGE_KEY_FILE=/absolute/path/to/home-server/age-key.txt
+
+sops -d clusters/noble/secrets/newt-pangolin-auth.secret.yaml | kubectl apply -f -
+```
+
+**Ansible** (`noble.yml`) runs the same decrypt-and-apply step for every `*.yaml` in this directory when **`age-key.txt`** exists and **`noble_apply_sops_secrets`** is true (see `ansible/group_vars/all.yml`).
+
+## Files
+
+| File | Purpose |
+|------|---------|
+| `newt-pangolin-auth.secret.yaml` | Pangolin tunnel credentials for [Newt](../bootstrap/newt/README.md) (`PANGOLIN_ENDPOINT`, `NEWT_ID`, `NEWT_SECRET`). Replace placeholders and re-encrypt before relying on them. |
--- a/clusters/noble/secrets/newt-pangolin-auth.secret.yaml
+++ b/clusters/noble/secrets/newt-pangolin-auth.secret.yaml
@@ -0,0 +1,30 @@
+apiVersion: ENC[AES256_GCM,data:FaA=,iv:EsqIdZmNS4hfzwCZ0gL7Q5Czaz8Bii3jWFu60lKmgVo=,tag:tfr4yUuTiH4s+ufYW/dpCA==,type:str]
+kind: ENC[AES256_GCM,data:ozpTcG9F,iv:Q1EZ896Plhyz2qM4JJRnBf940kbVLSwyIIPUcDGBZFA=,tag:1bWEgI+I4Ni5J70MlohYdA==,type:str]
+metadata:
+    name: ENC[AES256_GCM,data:moXbGuT6ZOGhgVUBNcpHeLZQ,iv:1WDtxT/Et/6lxx1Mj93CQME8o0lhzxnBMkdSqP/n3R0=,tag:v+iqfE8tzCx8ZOMUW7OyEA==,type:str]
+    namespace: ENC[AES256_GCM,data:33/AMg==,iv:M0GvB/70nHh4MVR1saZy1pGY8IFFzkzGdJl4szHJbCI=,tag:0+1LX/EnkAP0FZ6ARKZNAA==,type:str]
+type: ENC[AES256_GCM,data:3io5utU1,iv:QqMDNL/R8SR7TC9mwDdDd3V6VOo+csgeiZCr2AdOZjw=,tag:/KSMy+vNz7Qj/I463eG0LQ==,type:str]
+stringData:
+    PANGOLIN_ENDPOINT: ENC[AES256_GCM,data:a/2QTnGYnNXGxNm8QSVTKC6I+r88J1m1CdMmTA==,iv:L2LvLD7IRX8wdAzALAWQ2ojB2OjWDIcVKrdi/lSvZFY=,tag:ALffRF9bncxA8CExSaRmHA==,type:str]
+    NEWT_ID: ENC[AES256_GCM,data:Xfe8QvBdX62CciYXYwMfJAzIE/0=,iv:tA+FJ93tsjJ29L3bSxNAEooiKPMc+5pa00EpQ2cJkho=,tag:auiR/zQjnsmyllXbSJf3KA==,type:str]
+    NEWT_SECRET: ENC[AES256_GCM,data:XY8XZOkZ+GpnjljbvtaH2oGJpDoZ47fN,iv:+J5sb7saqbVwHEyemx3CUSsdKArubRdPCLGbT09sFLM=,tag:zUowv8I1CaWZH+KLYOwKYw==,type:str]
+sops:
+    kms: []
+    gcp_kms: []
+    azure_kv: []
+    hc_vault: []
+    age:
+        - recipient: age1juym5p3ez3dkt0dxlznydgfgqvaujfnyk9hpdsssf50hsxeh3p4sjpf3gn
+          enc: |
+            -----BEGIN AGE ENCRYPTED FILE-----
+            YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSB0RWppdWxZUEYzc2I2TURi
+            dm1pUzVaNDA4YldsWkFJODl1MWZ6MXFxWnhjCnVtU1VEQnJqbTI5M0hWM2FCaVlS
+            aXprTm42bTlldUVHMmxpUUJiWEVhcXcKLS0tIGNLVnNtNDdMQ0VVeDV1N29nOW9F
+            clhLa2tPdWtRMWYzc2YrR0hSQXczTlUK6hYj4HxQvu6Kqn/Ki+cYv9x5nvolyGqQ
+            N4g9z+t6orT6MYseWPf0uyovC/5iOOC6z/2exVe7/0rYo7ZOFm6dYQ==
+            -----END AGE ENCRYPTED FILE-----
+    lastmodified: "2026-03-29T23:37:33Z"
+    mac: ENC[AES256_GCM,data:uKtdqJhwE4HLCenHH+RG8O2yfVIcGbiXznL9ouAXhDLnQh/ksgeczr2fyyn9hs/JhCozAqRrF8vnYZsIdfG1DQfHjXn6Ro6gzYC0YR+gvFU8Mz9uPdVX3HYjUrzKJ5GhhBami0USZtLdGKOGgFDYmFoDsD/PmMXLUol8qJdW8Uk=,iv:rIfQI17+3vNBB1n//D7Wnl/SLWFjV0pgZDteumlS2f8=,tag:xibCfJdZQS+aB75drmY1VA==,type:str]
+    pgp: []
+    unencrypted_suffix: _unencrypted
+    version: 3.9.3
--- a/clusters/noble/wip/eclipse-che/README.md
+++ b/clusters/noble/wip/eclipse-che/README.md
@@ -0,0 +1,29 @@
+# Eclipse Che (optional — Argo CD)
+
+Three **Application** resources (sync waves **0 → 1 → 2**):
+
+| Wave | Application | Purpose |
+|------|-------------|---------|
+| 0 | `eclipse-che-devworkspace` | [DevWorkspace operator](https://github.com/devfile/devworkspace-operator) **v0.33.0** (`devworkspace/kustomization.yaml` → remote `combined.yaml`) |
+| 1 | `eclipse-che-operator` | [Eclipse Che Helm chart](https://artifacthub.io/packages/helm/eclipse-che/eclipse-che) **7.116.0** (operator in **`eclipse-che`**) |
+| 2 | `eclipse-che-cluster` | **`CheCluster`** (`checluster.yaml`) — Traefik + **cert-manager** TLS |
+
+**Prerequisites (cluster):** **cert-manager** + **Traefik** (noble bootstrap). **DNS:** `che.apps.noble.lab.pcenicni.dev` → Traefik LB (edit **`checluster.yaml`** if your domain differs).
+
+**First sync:** Wave ordering applies to **Application** CRs under **noble-root**; if the operator starts before DevWorkspace is ready, **Refresh**/**Sync** the child apps once. See [Eclipse Che on Kubernetes](https://eclipse.dev/che/docs/stable/administration-guide/installing-che-on-kubernetes/).
+
+**URL:** `kubectl get checluster eclipse-che -n eclipse-che -o jsonpath='{.status.cheURL}{"\n"}'` after **Phase** is **`Active`**.
+
+## Troubleshooting — “no available server” (or similar)
+
+**1. Eclipse Che / dashboard**
+
+- **DevWorkspace routing:** On Kubernetes you **must** set **`routing.clusterHostSuffix`** in **`DevWorkspaceOperatorConfig`** `devworkspace-operator-config` (`devworkspace/dwoc.yaml`). If it was missing, sync **`eclipse-che-devworkspace`** again, then **`eclipse-che-operator`** / **`eclipse-che-cluster`**.
+- **Status:** `kubectl get checluster eclipse-che -n eclipse-che -o jsonpath='{.status.chePhase}{"\n"}'` → expect **`Active`**.
+- **Pods:** `kubectl get pods -n eclipse-che` — wait for **Running** (Keycloak / gateway / server can take many minutes).
+- **Ingress + DNS:** `kubectl get ingress -n eclipse-che` — host **`che.apps.noble.lab.pcenicni.dev`** must resolve to your Traefik LB (same as Grafana/Homepage).
+- **TLS:** `kubectl describe certificate -n eclipse-che` (if present) — Let’s Encrypt must succeed before the browser trusts the URL.
+
+**2. Argo CD UI / repo**
+
+If the message appears in **Argo CD** (not Che), check in-cluster components: `kubectl get pods -n argocd`, `kubectl logs -n argocd deploy/argocd-repo-server --tail=80`, and that **Applications** use `destination.server: https://kubernetes.default.svc` (in-cluster), not a missing external API.
--- a/clusters/noble/wip/eclipse-che/application-checluster.yaml
+++ b/clusters/noble/wip/eclipse-che/application-checluster.yaml
@@ -0,0 +1,27 @@
+# CheCluster CR — sync wave 2 (operator must be Ready to reconcile).
+apiVersion: argoproj.io/v1alpha1
+kind: Application
+metadata:
+  name: eclipse-che-cluster
+  namespace: argocd
+  annotations:
+    argocd.argoproj.io/sync-wave: "2"
+  finalizers:
+    - resources-finalizer.argocd.argoproj.io/background
+spec:
+  project: default
+  source:
+    repoURL: https://gitea.pcenicni.ca/gsdavidp/home-server.git
+    targetRevision: HEAD
+    path: clusters/noble/apps/eclipse-che
+    directory:
+      include: checluster.yaml
+  destination:
+    server: https://kubernetes.default.svc
+    namespace: eclipse-che
+  syncPolicy:
+    automated:
+      prune: true
+      selfHeal: true
+    syncOptions:
+      - ServerSideApply=true
--- a/clusters/noble/wip/eclipse-che/application-devworkspace.yaml
+++ b/clusters/noble/wip/eclipse-che/application-devworkspace.yaml
@@ -0,0 +1,26 @@
+# DevWorkspace operator — must sync before Eclipse Che Helm (sync wave 0).
+apiVersion: argoproj.io/v1alpha1
+kind: Application
+metadata:
+  name: eclipse-che-devworkspace
+  namespace: argocd
+  annotations:
+    argocd.argoproj.io/sync-wave: "0"
+  finalizers:
+    - resources-finalizer.argocd.argoproj.io/background
+spec:
+  project: default
+  source:
+    repoURL: https://gitea.pcenicni.ca/gsdavidp/home-server.git
+    targetRevision: HEAD
+    path: clusters/noble/apps/eclipse-che/devworkspace
+  destination:
+    server: https://kubernetes.default.svc
+    namespace: devworkspace-controller
+  syncPolicy:
+    automated:
+      prune: true
+      selfHeal: true
+    syncOptions:
+      - CreateNamespace=true
+      - ServerSideApply=true
--- a/clusters/noble/wip/eclipse-che/application-operator.yaml
+++ b/clusters/noble/wip/eclipse-che/application-operator.yaml
@@ -0,0 +1,28 @@
+# Eclipse Che operator (Helm) — sync wave 1 (after DevWorkspace CRDs/controller).
+apiVersion: argoproj.io/v1alpha1
+kind: Application
+metadata:
+  name: eclipse-che-operator
+  namespace: argocd
+  annotations:
+    argocd.argoproj.io/sync-wave: "1"
+  finalizers:
+    - resources-finalizer.argocd.argoproj.io/background
+spec:
+  project: default
+  source:
+    repoURL: https://eclipse-che.github.io/che-operator/charts
+    chart: eclipse-che
+    targetRevision: 7.116.0
+    helm:
+      releaseName: eclipse-che
+  destination:
+    server: https://kubernetes.default.svc
+    namespace: eclipse-che
+  syncPolicy:
+    automated:
+      prune: true
+      selfHeal: true
+    syncOptions:
+      - CreateNamespace=true
+      - ServerSideApply=true
--- a/clusters/noble/wip/eclipse-che/checluster.yaml
+++ b/clusters/noble/wip/eclipse-che/checluster.yaml
@@ -0,0 +1,24 @@
+# Eclipse Che instance — applied after **che-operator** is running (sync wave 2).
+# Edit **hostname** / **domain** if your ingress DNS differs from the noble lab pattern.
+#
+# **devEnvironments.networking.externalTLSConfig** — required with cert-manager for **workspace** subdomains.
+# Without it, Che creates secure workspace Ingresses with TLS hosts but **no secretName**, so cert-manager
+# never issues certs and the dashboard often shows **no available server** when opening a workspace.
+apiVersion: org.eclipse.che/v2
+kind: CheCluster
+metadata:
+  name: eclipse-che
+  namespace: eclipse-che
+spec:
+  devEnvironments:
+    networking:
+      externalTLSConfig:
+        enabled: true
+        annotations:
+          cert-manager.io/cluster-issuer: letsencrypt-prod
+  networking:
+    domain: apps.noble.lab.pcenicni.dev
+    hostname: che.apps.noble.lab.pcenicni.dev
+    ingressClassName: traefik
+    annotations:
+      cert-manager.io/cluster-issuer: letsencrypt-prod
--- a/clusters/noble/wip/eclipse-che/devworkspace/dwoc.yaml
+++ b/clusters/noble/wip/eclipse-che/devworkspace/dwoc.yaml
@@ -0,0 +1,14 @@
+# Required on **Kubernetes** (OpenShift discovers this automatically). See DevWorkspaceOperatorConfig CRD:
+# **routing.clusterHostSuffix** — hostname suffix for DevWorkspace routes. Without this, Che / workspaces
+# often fail with errors like **no available server** or broken routing.
+# Must be named **devworkspace-operator-config** in **devworkspace-controller**.
+# v1alpha1 uses a root-level **config** key (not spec.config); see combined.yaml CRD for devworkspaceoperatorconfigs.
+# Edit if your ingress base domain differs from the noble lab pattern.
+apiVersion: controller.devfile.io/v1alpha1
+kind: DevWorkspaceOperatorConfig
+metadata:
+  name: devworkspace-operator-config
+  namespace: devworkspace-controller
+config:
+  routing:
+    clusterHostSuffix: apps.noble.lab.pcenicni.dev
--- a/clusters/noble/wip/eclipse-che/devworkspace/kustomization.yaml
+++ b/clusters/noble/wip/eclipse-che/devworkspace/kustomization.yaml
@@ -0,0 +1,7 @@
+# DevWorkspace operator — prerequisite for Eclipse Che (pinned tag).
+# https://github.com/devfile/devworkspace-operator
+apiVersion: kustomize.config.k8s.io/v1beta1
+kind: Kustomization
+resources:
+  - https://raw.githubusercontent.com/devfile/devworkspace-operator/v0.33.0/deploy/deployment/kubernetes/combined.yaml
+  - dwoc.yaml
--- a/docs/Racks.md
+++ b/docs/Racks.md
@@ -0,0 +1,169 @@
+# Physical racks — Noble lab (10")
+
+This page is a **logical rack layout** for the **noble** Talos lab: **three 10" (half-width) racks**, how **rack units (U)** are used, and **Ethernet** paths on **`192.168.50.0/24`**. Node names and IPs match [`talos/CLUSTER-BUILD.md`](../talos/CLUSTER-BUILD.md) and [`docs/architecture.md`](architecture.md).
+
+## Legend
+
+| Symbol | Meaning |
+|--------|---------|
+| `█` / filled cell | Equipment occupying that **1U** |
+| `░` | Reserved / future use |
+| `·` | Empty |
+| `━━` | Copper to LAN switch |
+
+**Rack unit numbering:** **U increases upward** (U1 = bottom of rack, like ANSI/EIA). **Slot** in the diagrams is **top → bottom** reading order for a quick visual scan.
+
+### Three racks at a glance
+
+Read **top → bottom** (first row = top of rack).
+
+| Primary (10") | Storage B (10") | Rack C (10") |
+|-----------------|-----------------|--------------|
+| Fiber ONT | Mac Mini | *empty* |
+| UniFi Fiber Gateway | NAS | *empty* |
+| Patch panel | JBOD | *empty* |
+| 2.5 GbE ×8 PoE switch | *empty* | *empty* |
+| Raspberry Pi cluster | *empty* | *empty* |
+| **helium** (Talos) | *empty* | *empty* |
+| **neon** (Talos) | *empty* | *empty* |
+| **argon** (Talos) | *empty* | *empty* |
+| **krypton** (Talos) | *empty* | *empty* |
+
+**Connectivity:** Primary rack gear shares **one L2** (`192.168.50.0/24`). Storage B and Rack C link the same way when cabled (e.g. **Ethernet** to the PoE switch, **VPN** or flat LAN per your design).
+
+---
+
+## Rack A — LAN aggregation (10" × 12U)
+
+Dedicated to **Layer-2 access** and cable home runs. All cluster nodes plug into this switch (or into a downstream switch that uplinks here).
+
+```
+  TOP OF RACK
+  ┌────────────────────────────────────────┐
+  │ Slot 1  ········· empty ·············· │  12U
+  │ Slot 2  ········· empty ·············· │  11U
+  │ Slot 3  ········· empty ·············· │  10U
+  │ Slot 4  ········· empty ·············· │   9U
+  │ Slot 5  ········· empty ·············· │   8U
+  │ Slot 6  ········· empty ·············· │   7U
+  │ Slot 7  ░░░░░░░ optional PDU ░░░░░░░░ │   6U
+  │ Slot 8  █████ 1U cable manager ██████ │   5U
+  │ Slot 9  █████ 1U patch panel █████████ │   4U
+  │ Slot10  ███ 8-port managed switch ████ │   3U  ← LAN L2 spine
+  │ Slot11  ········· empty ·············· │   2U
+  │ Slot12  ········· empty ·············· │   1U
+  └────────────────────────────────────────┘
+  BOTTOM
+```
+
+**Network role:** Every node NIC → **switch access port** → same **VLAN / flat LAN** as documented; **kube-vip** VIP **`192.168.50.230`**, **MetalLB** **`192.168.50.210`–`229`**, **Traefik** **`192.168.50.211`** are **logical** on node IPs (no extra hardware).
+
+---
+
+## Rack B — Control planes (10" × 12U)
+
+Three **Talos control-plane** nodes (**scheduling allowed** on CPs per `talconfig.yaml`).
+
+```
+  TOP OF RACK
+  ┌────────────────────────────────────────┐
+  │ Slot 1  ········· empty ·············· │  12U
+  │ Slot 2  ········· empty ·············· │  11U
+  │ Slot 3  ········· empty ·············· │  10U
+  │ Slot 4  ········· empty ·············· │   9U
+  │ Slot 5  ········· empty ·············· │   8U
+  │ Slot 6  ········· empty ·············· │   7U
+  │ Slot 7  ········· empty ·············· │   6U
+  │ Slot 8  █ neon  control-plane .20 ████ │   5U
+  │ Slot 9  █ argon control-plane .30 ███ │   4U
+  │ Slot10  █ krypton control-plane .40 ██ │   3U  (kube-vip VIP .230)
+  │ Slot11  ········· empty ·············· │   2U
+  │ Slot12  ········· empty ·············· │   1U
+  └────────────────────────────────────────┘
+  BOTTOM
+```
+
+---
+
+## Rack C — Worker (10" × 12U)
+
+Single **worker** node; **Longhorn** data disk is **local** to each node (see `talconfig.yaml`); no separate NAS in this diagram.
+
+```
+  TOP OF RACK
+  ┌────────────────────────────────────────┐
+  │ Slot 1  ········· empty ·············· │  12U
+  │ Slot 2  ········· empty ·············· │  11U
+  │ Slot 3  ········· empty ·············· │  10U
+  │ Slot 4  ········· empty ·············· │   9U
+  │ Slot 5  ········· empty ·············· │   8U
+  │ Slot 6  ········· empty ·············· │   7U
+  │ Slot 7  ░░░░░░░ spare / future ░░░░░░░░ │   6U
+  │ Slot 8  ········· empty ·············· │   5U
+  │ Slot 9  ········· empty ·············· │   4U
+  │ Slot10  ███ helium  worker  .10 █████ │   3U
+  │ Slot11  ········· empty ·············· │   2U
+  │ Slot12  ········· empty ·············· │   1U
+  └────────────────────────────────────────┘
+  BOTTOM
+```
+
+---
+
+## Space summary
+
+| System | Rack | Approx. U | IP | Role |
+|--------|------|-----------|-----|------|
+| LAN switch | A | 1U | — | All nodes on `192.168.50.0/24` |
+| Patch / cable mgmt | A | 2× 1U | — | Physical plant |
+| **neon** | B | 1U | `192.168.50.20` | control-plane + schedulable |
+| **argon** | B | 1U | `192.168.50.30` | control-plane + schedulable |
+| **krypton** | B | 1U | `192.168.50.40` | control-plane + schedulable |
+| **helium** | C | 1U | `192.168.50.10` | worker |
+
+Adjust **empty vs. future** rows if your chassis are **2U** or on **shelves** — scale the `█` blocks accordingly.
+
+---
+
+## Network connections
+
+All cluster nodes are on **one flat LAN**. **kube-vip** floats **`192.168.50.230:6443`** across the three control-plane hosts on **`ens18`** (see cluster bootstrap docs).
+
+```mermaid
+flowchart TB
+  subgraph RACK_A["Rack A — 10\""]
+    SW["Managed switch<br/>192.168.50.0/24 L2"]
+    PP["Patch / cable mgmt"]
+    SW --- PP
+  end
+  subgraph RACK_B["Rack B — 10\""]
+    N["neon :20"]
+    A["argon :30"]
+    K["krypton :40"]
+  end
+  subgraph RACK_C["Rack C — 10\""]
+    H["helium :10"]
+  end
+  subgraph LOGICAL["Logical (any node holding VIP)"]
+    VIP["API VIP 192.168.50.230<br/>kube-vip → apiserver :6443"]
+  end
+  WAN["Internet / other LANs"] -.->|"router (out of scope)"| SW
+  SW <-->|"Ethernet"| N
+  SW <-->|"Ethernet"| A
+  SW <-->|"Ethernet"| K
+  SW <-->|"Ethernet"| H
+  N --- VIP
+  A --- VIP
+  K --- VIP
+  WK["Workstation / CI<br/>kubectl, browser"] -->|"HTTPS :6443"| VIP
+  WK -->|"L2 (MetalLB .210–.211, any node)"| SW
+```
+
+**Ingress path (same LAN):** clients → **`192.168.50.211`** (Traefik) or **`192.168.50.210`** (Argo CD) via **MetalLB** — still **through the same switch** to whichever node advertises the service.
+
+---
+
+## Related docs
+
+- Cluster topology and services: [`architecture.md`](architecture.md)
+- Build state and versions: [`../talos/CLUSTER-BUILD.md`](../talos/CLUSTER-BUILD.md)
--- a/docs/architecture.md
+++ b/docs/architecture.md
@@ -8,8 +8,8 @@ This document describes the **noble** Talos lab cluster: node topology, networki
 |---------------|---------|
 | **Subgraph “Cluster”** | Kubernetes cluster boundary (`noble`) |
 | **External / DNS / cloud** | Services outside the data plane (internet, registrar, Pangolin) |
-| **Data store** | Durable data (etcd, Longhorn, Loki, Vault storage) |
-| **Secrets / policy** | Secret material, Vault, admission policy |
+| **Data store** | Durable data (etcd, Longhorn, Loki) |
+| **Secrets / policy** | Secret material (SOPS in git), admission policy |
 | **LB / VIP** | Load balancer, MetalLB assignment, or API VIP |

 ---
@@ -74,7 +74,7 @@ flowchart TB

 ## Platform stack (bootstrap → workloads)

-Order: **Talos** → **Cilium** (cluster uses `cni: none` until CNI is installed) → **metrics-server**, **Longhorn**, **MetalLB** + pool manifests, **kube-vip** → **Traefik**, **cert-manager** → **Argo CD** (Helm only; optional empty app-of-apps). **Automated install:** `ansible/playbooks/noble.yml` (see `ansible/README.md`). Platform namespaces include `cert-manager`, `traefik`, `metallb-system`, `longhorn-system`, `monitoring`, `loki`, `logging`, `argocd`, `vault`, `external-secrets`, `sealed-secrets`, `kyverno`, `newt`, and others as deployed.
+Order: **Talos** → **Cilium** (cluster uses `cni: none` until CNI is installed) → **metrics-server**, **Longhorn**, **MetalLB** + pool manifests, **kube-vip** → **Traefik**, **cert-manager** → **Argo CD** (Helm only; optional empty app-of-apps). **Automated install:** `ansible/playbooks/noble.yml` (see `ansible/README.md`). Platform namespaces include `cert-manager`, `traefik`, `metallb-system`, `longhorn-system`, `monitoring`, `loki`, `logging`, `argocd`, `kyverno`, `newt`, and others as deployed.

 ```mermaid
 flowchart TB
@@ -98,7 +98,7 @@ flowchart TB
    Argo["Argo CD<br/>(optional app-of-apps; platform via Ansible)"]
  end
  subgraph L5["Platform namespaces (examples)"]
-    NS["cert-manager, traefik, metallb-system,<br/>longhorn-system, monitoring, loki, logging,<br/>argocd, vault, external-secrets, sealed-secrets,<br/>kyverno, newt, …"]
+    NS["cert-manager, traefik, metallb-system,<br/>longhorn-system, monitoring, loki, logging,<br/>argocd, kyverno, newt, …"]
  end
  Talos --> Cilium --> MS
  Cilium --> LH
@@ -149,22 +149,20 @@ flowchart LR

 ## Secrets and policy

-**Sealed Secrets** decrypts `SealedSecret` objects in-cluster. **External Secrets Operator** syncs from **Vault** using **`ClusterSecretStore`** (see [`examples/vault-cluster-secret-store.yaml`](../clusters/noble/bootstrap/external-secrets/examples/vault-cluster-secret-store.yaml)). Trust is **cluster → Vault** (ESO calls Vault; Vault does not initiate cluster trust). **Kyverno** with **kyverno-policies** enforces **PSS baseline** in **Audit**.
+**Mozilla SOPS** with **age** encrypts plain Kubernetes **`Secret`** manifests under [`clusters/noble/secrets/`](../clusters/noble/secrets/); operators decrypt at apply time (`ansible/playbooks/noble.yml` or `sops -d … | kubectl apply`). The private key is **`age-key.txt`** at the repo root (gitignored). **Kyverno** with **kyverno-policies** enforces **PSS baseline** in **Audit**.

 ```mermaid
 flowchart LR
  subgraph Git["Git repo"]
-    SSman["SealedSecret manifests<br/>(optional)"]
+    SM["SOPS-encrypted Secret YAML<br/>clusters/noble/secrets/"]
+  end
+  subgraph ops["Apply path"]
+    SOPS["sops -d + kubectl apply<br/>(or Ansible noble.yml)"]
  end
  subgraph cluster["Cluster"]
-    SSC["Sealed Secrets controller<br/>sealed-secrets"]
-    ESO["External Secrets Operator<br/>external-secrets"]
-    V["Vault<br/>vault namespace<br/>HTTP listener"]
    K["Kyverno + kyverno-policies<br/>PSS baseline Audit"]
  end
-  SSman -->|"encrypted"| SSC -->|"decrypt to Secret"| workloads["Workload Secrets"]
-  ESO -->|"ClusterSecretStore →"| V
-  ESO -->|"sync ExternalSecret"| workloads
+  SM --> SOPS -->|"plain Secret"| workloads["Workload Secrets"]
  K -.->|"admission / audit<br/>(PSS baseline)"| workloads
 ```

@@ -172,7 +170,7 @@ flowchart LR

 ## Data and storage

-**StorageClass:** **`longhorn`** (default). Talos mounts **user volume** data at **`/var/mnt/longhorn`** (bind paths for Longhorn). Stateful consumers include **Vault**, **kube-prometheus-stack** PVCs, and **Loki**.
+**StorageClass:** **`longhorn`** (default). Talos mounts **user volume** data at **`/var/mnt/longhorn`** (bind paths for Longhorn). Stateful consumers include **kube-prometheus-stack** PVCs and **Loki**.

 ```mermaid
 flowchart TB
@@ -183,12 +181,10 @@ flowchart TB
    SC["StorageClass: longhorn (default)"]
  end
  subgraph consumers["Stateful / durable consumers"]
-    V["Vault PVC data-vault-0"]
    PGL["kube-prometheus-stack PVCs"]
    L["Loki PVC"]
  end
  UD --> SC
-  SC --> V
  SC --> PGL
  SC --> L
 ```
@@ -210,7 +206,7 @@ See [`talos/CLUSTER-BUILD.md`](../talos/CLUSTER-BUILD.md) for the authoritative
 | Argo CD | 9.4.17 / app v3.3.6 |
 | kube-prometheus-stack | 82.15.1 |
 | Loki / Fluent Bit | 6.55.0 / 0.56.0 |
-| Sealed Secrets / ESO / Vault | 2.18.4 / 2.2.0 / 0.32.0 |
+| SOPS (client tooling) | see `clusters/noble/secrets/README.md` |
 | Kyverno | 3.7.1 / policies 3.7.1 |
 | Newt | 1.2.0 / app 1.10.1 |

@@ -218,7 +214,7 @@ See [`talos/CLUSTER-BUILD.md`](../talos/CLUSTER-BUILD.md) for the authoritative

 ## Narrative

-The **noble** environment is a **Talos** lab cluster on **`192.168.50.0/24`** with **three control plane nodes and one worker**, schedulable workloads on control planes enabled, and the Kubernetes API exposed through **kube-vip** at **`192.168.50.230`**. **Cilium** provides the CNI after Talos bootstrap with **`cni: none`**; **MetalLB** advertises **`192.168.50.210`–`192.168.50.229`**, pinning **Argo CD** to **`192.168.50.210`** and **Traefik** to **`192.168.50.211`** for **`*.apps.noble.lab.pcenicni.dev`**. **cert-manager** issues certificates for Traefik Ingresses; **GitOps** is **Ansible-driven Helm** for the platform (**`clusters/noble/bootstrap/`**) plus optional **Argo CD** app-of-apps (**`clusters/noble/apps/`**, **`clusters/noble/bootstrap/argocd/`**). **Observability** uses **kube-prometheus-stack** in **`monitoring`**, **Loki** and **Fluent Bit** with Grafana wired via a **ConfigMap** datasource, with **Longhorn** PVCs for Prometheus, Grafana, Alertmanager, Loki, and **Vault**. **Secrets** combine **Sealed Secrets** for git-encrypted material, **Vault** with **External Secrets** for dynamic sync, and **Kyverno** enforces **Pod Security Standards baseline** in **Audit**. **Public** access uses **Newt** to **Pangolin** with **CNAME** and Integration API steps as documented—not generic in-cluster public DNS.
+The **noble** environment is a **Talos** lab cluster on **`192.168.50.0/24`** with **three control plane nodes and one worker**, schedulable workloads on control planes enabled, and the Kubernetes API exposed through **kube-vip** at **`192.168.50.230`**. **Cilium** provides the CNI after Talos bootstrap with **`cni: none`**; **MetalLB** advertises **`192.168.50.210`–`192.168.50.229`**, pinning **Argo CD** to **`192.168.50.210`** and **Traefik** to **`192.168.50.211`** for **`*.apps.noble.lab.pcenicni.dev`**. **cert-manager** issues certificates for Traefik Ingresses; **GitOps** is **Ansible** for the **initial** platform install (**`clusters/noble/bootstrap/`**), then **Argo CD** for the kustomize tree (**`noble-bootstrap-root`** → **`clusters/noble/bootstrap`**) and optional apps (**`noble-root`** → **`clusters/noble/apps/`**) once automated sync is enabled after **`noble.yml`** (see **`clusters/noble/bootstrap/argocd/README.md`** §5). **Observability** uses **kube-prometheus-stack** in **`monitoring`**, **Loki** and **Fluent Bit** with Grafana wired via a **ConfigMap** datasource, with **Longhorn** PVCs for Prometheus, Grafana, Alertmanager, and Loki. **Secrets** in git use **SOPS** + **age** under **`clusters/noble/secrets/`**; **Kyverno** enforces **Pod Security Standards baseline** in **Audit**. **Public** access uses **Newt** to **Pangolin** with **CNAME** and Integration API steps as documented—not generic in-cluster public DNS.

 ---

@@ -233,7 +229,7 @@ The **noble** environment is a **Talos** lab cluster on **`192.168.50.0/24`** wi
 **Open questions**

 - **Split horizon:** Confirm whether only LAN DNS resolves `*.apps.noble.lab.pcenicni.dev` to **`192.168.50.211`** or whether public resolvers also point at that address.
-- **Velero / S3:** **TBD** until an S3-compatible backend is configured.
+- **Velero / S3:** optional **Ansible** install (**`noble_velero_install`**) from **`clusters/noble/bootstrap/velero/`** once an S3-compatible backend and credentials exist (see **`talos/CLUSTER-BUILD.md`** Phase F).
 - **Argo CD:** Confirm **`repoURL`** in `root-application.yaml` and what is actually applied on-cluster.

 ---
--- a/docs/homelab-network.md
+++ b/docs/homelab-network.md
@@ -0,0 +1,100 @@
+# Homelab network inventory
+
+Single place for **VLANs**, **static addressing**, and **hosts** beside the **noble** Talos cluster. **Proxmox** is the **hypervisor** for the VMs below; **all of those VMs are intended to run on `192.168.1.0/24`** (same broadcast domain as Pi-hole and typical home clients). **Noble** (Talos) stays on **`192.168.50.0/24`** per [`architecture.md`](architecture.md) and [`talos/CLUSTER-BUILD.md`](../talos/CLUSTER-BUILD.md) until you change that design.
+
+## VLANs (logical)
+
+| Network | Role |
+|---------|------|
+| **`192.168.1.0/24`** | **Homelab / Proxmox LAN** — **Proxmox host(s)**, **all Proxmox VMs**, **Pi-hole**, **Mac mini**, and other servers that share this VLAN. |
+| **`192.168.50.0/24`** | **Noble Talos** cluster — physical nodes, **kube-vip**, **MetalLB**, Traefik; **not** the Proxmox VM subnet. |
+| **`192.168.60.0/24`** | **DMZ / WAN-facing** — **NPM**, **WebDAV**, **other services** that need WAN access. |
+| **`192.168.40.0/24`** | **Home Assistant** and IoT devices — isolated; record subnet and HA IP in DHCP/router. |
+
+**Routing / DNS:** Clients and VMs on **`192.168.1.0/24`** reach **noble** services on **`192.168.50.0/24`** via **L3** (router/firewall). **NFS** from OMV (`192.168.1.105`) to **noble** pods uses the **OMV data IP** as the NFS server address from the cluster’s perspective.
+
+Firewall rules between VLANs are **out of scope** here; document them where you keep runbooks.
+
+---
+
+## `192.168.50.0/24` — reservations (noble only)
+
+Do not assign **unrelated** static services on **this** VLAN without checking overlap with MetalLB and kube-vip.
+
+| Use | Addresses |
+|-----|-----------|
+| Talos nodes | `.10`–`.40` (see [`talos/talconfig.yaml`](../talos/talconfig.yaml)) |
+| MetalLB L2 pool | `.210`–`.229` |
+| Traefik (ingress) | `.211` (typical) |
+| Argo CD | `.210` (typical) |
+| Kubernetes API (kube-vip) | **`.230`** — **must not** be a VM |
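+
+The reservations above map onto MetalLB roughly as follows. This is a minimal sketch, assuming MetalLB's `metallb.io/v1beta1` CRDs; the resource name `noble-l2` is hypothetical, and the pool manifests actually applied to noble live with the bootstrap manifests and may differ.
+
+```yaml
+# Illustrative only: resource names are hypothetical, not copied from the repo.
+apiVersion: metallb.io/v1beta1
+kind: IPAddressPool
+metadata:
+  name: noble-l2
+  namespace: metallb-system
+spec:
+  addresses:
+    - 192.168.50.210-192.168.50.229   # MetalLB L2 pool from the table above
+---
+apiVersion: metallb.io/v1beta1
+kind: L2Advertisement
+metadata:
+  name: noble-l2
+  namespace: metallb-system
+spec:
+  ipAddressPools:
+    - noble-l2                        # announce only this pool over L2 (ARP)
+```
+
+The kube-vip address `.230` sits outside this range, so MetalLB never assigns it to a Service.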
+
+---
+
+## Proxmox VMs (`192.168.1.0/24`)
+
+All run on **Proxmox**; addresses below use **`192.168.1.0/24`** (same host octet as your earlier `.50.x` / `.60.x` plan, moved into the homelab VLAN). Adjust if your router uses a different numbering scheme.
+
+Most are **Docker hosts** with multiple apps; treat the **IP** as the **host**, not individual containers.
+
+| VM ID | Name | IP | Notes |
+|-------|------|-----|--------|
+| 666 | nginxproxymanager | `192.168.1.20` | NPM (edge / WAN-facing role — firewall as you design). |
+| 777 | nginxproxymanager-Lan | `192.168.1.60` | NPM on **internal** homelab LAN. |
+| 100 | Openmediavault | `192.168.1.105` | **NFS** exports for *arr / media paths. |
+| 110 | Monitor | `192.168.1.110` | Uptime Kuma, Peekaping, Tracearr → cluster candidates. |
+| 120 | arr | `192.168.1.120` | *arr stack; media via **NFS** from OMV — see [migration](#arr-stack-nfs-and-kubernetes). |
+| 130 | Automate | `192.168.1.130` | Low use — **candidate to remove** or consolidate. |
+| 140 | general-purpose | `192.168.1.140` | IT tools, Mealie, Open WebUI, SparkyFitness, … |
+| 150 | Media-server | `192.168.1.150` | Jellyfin (test, **NFS** media), ebook server. |
+| 160 | s3 | `192.168.1.170` | Object storage; **merge** into **central S3** on noble per [`shared-data-services.md`](shared-data-services.md) when ready. |
+| 190 | Auth | `192.168.1.190` | **Authentik** → **noble (K8s)** for HA. |
+| 300 | gitea | `192.168.1.203` | On **`.1`**, no overlap with noble **MetalLB `.210`–`.229`** on **`.50`**. |
+| 310 | gitea-nsfw | `192.168.1.204` | |
+| 500 | AMP | `192.168.1.47` | |
+
+### Workload detail (what runs where)
+
+**Auth (190)** — **Authentik** is the main service; moving it to **Kubernetes (noble)** gives you **HA**, rolling upgrades, and backups via your cluster patterns (PVCs, Velero, etc.). Plan **OIDC redirect URLs** and **outposts** (if used) when the **ingress hostname** and paths to **`.50`** services change.
+
+**Monitor (110)** — **Uptime Kuma**, **Peekaping**, and **Tracearr** are a good fit for the cluster: small state (SQLite or small DBs), **Ingress** via Traefik, and **Longhorn** or a small DB PVC. Migrate **one app at a time** and keep the old VM until DNS and alerts are verified.
+
+**arr (120)** — **Lidarr, Sonarr, Radarr**, and related *arr* apps; libraries and download paths point at **NFS** from **Openmediavault (100)** at **`192.168.1.105`**. The hard part is **keeping paths, permissions (UID/GID), and download client** wiring while pods move.
+
+**Automate (130)** — Tools are **barely used**; **decommission**, merge into **general-purpose (140)**, or replace with a **CronJob** / one-shot on the cluster only if something still needs scheduling.
+
+**general-purpose (140)** — “Daily driver” stack: **IT tools**, **Mealie**, **Open WebUI**, **SparkyFitness**, and similar. **Candidates for gradual moves** to noble; group by **data sensitivity** and **persistence** (Postgres vs SQLite) when you pick order.
+
+**Media-server (150)** — **Jellyfin** (testing) with libraries on **NFS**; **ebook** server. Treat **Jellyfin** like *arr* for storage: same NFS export and **transcoding** needs (CPU on worker nodes or GPU if you add it). Ebook stack depends on what you run (e.g. Kavita, Audiobookshelf) — note **metadata paths** before moving.
+
+### Arr stack, NFS, and Kubernetes
+
+You do **not** have to move NFS into the cluster: **Openmediavault** on **`192.168.1.105`** can stay the **NFS server** while the *arr* apps run as **Deployments** with **ReadWriteMany** volumes. Noble nodes on **`192.168.50.0/24`** mount NFS using **that IP** (ensure **firewall** allows **NFS** from node IPs to OMV).
+
+1. **Keep OMV as the single source of exports** — same **export path** (e.g. `/export/media`) from the cluster’s perspective as from the current VM.
+2. **Mount NFS in Kubernetes** — use an **NFS provisioner or CSI driver** (e.g. **nfs-subdir-external-provisioner**, which is an external provisioner, or **csi-driver-nfs**, which is a CSI driver) so each app gets a **PVC** backed by a **subdirectory** of the export, **or** one shared RWX PVC for a common tree if your layout needs it.
+3. **Match POSIX ownership** — set **supplemental groups** or **fsGroup** / **runAsUser** on the pods so Sonarr/Radarr see the same **UID/GID** as today’s Docker setup; fix **squash** settings on OMV if you use `root_squash`.
+4. **Config and DB** — back up each app’s **config volume** (or SQLite files), redeploy with the same **environment**; point **download clients** and **NFS media roots** to the **same logical paths** inside the container.
+5. **Low-risk path** — run **one** *arr* app on the cluster while the rest stay on **VM 120** until imports and downloads behave; then cut the DNS/NPM upstreams over.
+
+If you prefer **no** NFS from pods, the alternative is **large ReadWriteOnce** disks on Longhorn and **sync** from OMV — usually **more** moving parts than **RWX NFS** for this workload class.
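+
+Step 2 of the list above could look like the following. This is a sketch only, assuming **csi-driver-nfs** is already installed (its provisioner name is `nfs.csi.k8s.io`); the StorageClass name, PVC name, namespace, and size are hypothetical, and `/export/media` is the example path from step 1, not a confirmed export.
+
+```yaml
+# Assumes csi-driver-nfs is installed; all names below are hypothetical.
+apiVersion: storage.k8s.io/v1
+kind: StorageClass
+metadata:
+  name: omv-nfs
+provisioner: nfs.csi.k8s.io
+parameters:
+  server: 192.168.1.105        # Openmediavault (VM 100)
+  share: /export/media         # example export path from step 1
+reclaimPolicy: Retain          # keep data on OMV if the PVC is deleted
+volumeBindingMode: Immediate
+---
+apiVersion: v1
+kind: PersistentVolumeClaim
+metadata:
+  name: sonarr-media
+  namespace: arr
+spec:
+  accessModes:
+    - ReadWriteMany            # RWX so several pods can share the tree
+  storageClassName: omv-nfs
+  resources:
+    requests:
+      storage: 100Gi           # required field; NFS itself does not enforce it
+```
+
+A dynamically provisioned PVC like this gets a fresh subdirectory under the export. To expose the existing library tree unchanged, a statically defined PersistentVolume pointing at the export is the usual alternative.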
+
+---
+
+## Other hosts
+
+| Host | IP | VLAN / network | Notes |
+|------|-----|----------------|--------|
+| **Pi-hole** | `192.168.1.127` | `192.168.1.0/24` | DNS; same VLAN as Proxmox VMs. |
+| **Home Assistant** | *TBD* | **IoT VLAN** | Add reservation when fixed. |
+| **Mac mini** | `192.168.1.155` | `192.168.1.0/24` | Align with **Storage B** in [`Racks.md`](Racks.md) if the same machine. |
+
+---
+
+## Related docs
+
+- **Shared Postgres + S3 (centralized):** [`shared-data-services.md`](shared-data-services.md)
+- **VM → noble migration plan:** [`migration-vm-to-noble.md`](migration-vm-to-noble.md)
+- Noble cluster topology and ingress: [`architecture.md`](architecture.md)
+- Physical racks (Primary / Storage B / Rack C): [`Racks.md`](Racks.md)
+- Cluster checklist: [`../talos/CLUSTER-BUILD.md`](../talos/CLUSTER-BUILD.md)
--- a/docs/migration-vm-to-noble.md
+++ b/docs/migration-vm-to-noble.md
@@ -0,0 +1,121 @@
+# Migration plan: Proxmox VMs → noble (Kubernetes)
+
+This document is the **default playbook** for moving workloads from **Proxmox VMs** on **`192.168.1.0/24`** into the **noble** Talos cluster on **`192.168.50.0/24`**. Source inventory and per-VM notes: [`homelab-network.md`](homelab-network.md). Cluster facts: [`architecture.md`](architecture.md), [`talos/CLUSTER-BUILD.md`](../talos/CLUSTER-BUILD.md).
+
+---
+
+## 1. Scope and principles
+
+| Principle | Detail |
+|-----------|--------|
+| **One service at a time** | Run the new workload on **noble** while the **VM** stays up; cut over **DNS / NPM** only after checks pass. |
+| **Same container image** | Prefer the **same** upstream image and major version as Docker on the VM to reduce surprises. |
+| **Data moves with a plan** | **Backup** VM volumes or export DB dumps **before** the first deploy to the cluster. |
+| **Ingress on noble** | Internal apps use **Traefik** + **`*.apps.noble.lab.pcenicni.dev`** (or your chosen hostnames) and **MetalLB** (e.g. **`192.168.50.211`**) per [`architecture.md`](architecture.md). |
+| **Cross-VLAN** | Clients on **`.1`** reach services on **`.50`** via **routing**; **firewall** must allow **NFS** from **Talos node IPs** to **OMV `192.168.1.105`** when pods mount NFS. |
+
+**Not everything must move.** Keep **Openmediavault** (and optionally **NPM**) on VMs if you prefer; the cluster consumes **NFS** and **HTTP** from them.
+
+---
+
+## 2. Prerequisites (before wave 1)
+
+1. **Cluster healthy** — `kubectl get nodes`; [`talos/CLUSTER-BUILD.md`](../talos/CLUSTER-BUILD.md) checklist through ingress and cert-manager as needed.
+2. **Ingress + TLS** — **Traefik** + **cert-manager** working; you can hit a **test Ingress** on the MetalLB IP.
+3. **GitOps / deploy path** — Decide per app: **Helm** under `clusters/noble/apps/`, **Argo CD**, or **Ansible**-applied manifests (match how you manage the rest of noble).
+4. **Secrets** — Plan **Kubernetes Secrets**; for git-stored material, align with **SOPS** (`clusters/noble/secrets/`, `.sops.yaml`).
+5. **Storage** — **Longhorn** default for **ReadWriteOnce** state; for **NFS** (*arr*, Jellyfin), install a **CSI NFS** driver and test a **small RWX PVC** before migrating data-heavy apps.
+6. **Shared data tier (recommended)** — Deploy **centralized PostgreSQL** and **S3-compatible storage** on noble so apps do not each ship their own DB/object store; see [`shared-data-services.md`](shared-data-services.md).
+7. **Firewall** — Rules: **workstation → `192.168.50.230:6443`**; **nodes → OMV NFS ports**; **clients → `192.168.50.211`** (or split-horizon DNS) as you design.
+8. **DNS** — Split-horizon or Pi-hole records for **`*.apps.noble.lab.pcenicni.dev`** → **Traefik** IP **`192.168.50.211`** for LAN clients.
+
+---
+
+## 3. Standard migration procedure (repeat per app)
+
+Use this checklist for **each** application (or small group, e.g. one Helm release).
+
+| Step | Action |
+|------|--------|
+| **A. Discover** | Document **image:tag**, **ports**, **volumes** (host paths), **env vars**, **depends_on** (DB, Redis, NFS path). Export **docker inspect** / **compose** from the VM. |
+| **B. Backup** | Snapshot **Proxmox VM** or backup **volume** / **SQLite** / **DB dump** to offline storage. |
+| **C. Namespace** | Create a **dedicated namespace** (e.g. `monitoring-tools`, `authentik`) or use your house standard. |
+| **D. Deploy** | Add **Deployment** (or **StatefulSet**), **Service**, **Ingress** (class **traefik**), **PVCs**; wire **secrets** from **Secrets** (not literals in git). |
+| **E. Storage** | **Longhorn** PVC for local state; **NFS CSI** PVC for shared media/config paths that must match the VM (see [`homelab-network.md`](homelab-network.md) *arr* section). Prefer **shared Postgres** / **shared S3** per [`shared-data-services.md`](shared-data-services.md) instead of new embedded databases. Match **UID/GID** with `securityContext`. |
+| **F. Smoke test** | `kubectl port-forward` or temporary **Ingress** hostname; log in, run one critical workflow (login, playback, sync). |
+| **G. DNS cutover** | Point **internal DNS** or **NPM** upstream from the **VM IP** to the **new hostname** (Traefik) or **MetalLB IP** + Host header. |
+| **H. Observe** | 24–72 hours: logs, alerts, **Uptime Kuma** (once migrated), backups. |
+| **I. Decommission** | Stop the **container** on the VM; keep the VM itself running until every service on it has moved. |
+| **J. VM off** | When **no** services remain on that VM, **power off** and archive or delete the VM. |
+
+**Rollback:** Re-enable the VM service, revert **DNS/NPM** to the old IP, delete or scale the cluster deployment to zero.
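+
+As an illustration of steps C to E, the manifests for a small app might look like this. It is a sketch under assumptions: **Uptime Kuma** is used as the example, with the upstream `louislam/uptime-kuma:1` image, port `3001`, and data path `/app/data`; the hostname, issuer name `letsencrypt-prod`, and PVC size are hypothetical and should be replaced with what the VM and the cluster actually use. The namespace `monitoring-tools` is assumed to exist already (step C).
+
+```yaml
+# Sketch only: hostname, issuer, and sizes are hypothetical.
+apiVersion: v1
+kind: PersistentVolumeClaim
+metadata:
+  name: uptime-kuma-data
+  namespace: monitoring-tools
+spec:
+  accessModes: [ReadWriteOnce]
+  storageClassName: longhorn          # default StorageClass on noble
+  resources:
+    requests:
+      storage: 2Gi
+---
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: uptime-kuma
+  namespace: monitoring-tools
+spec:
+  replicas: 1
+  strategy:
+    type: Recreate                    # one writer for the RWO volume / SQLite
+  selector:
+    matchLabels:
+      app: uptime-kuma
+  template:
+    metadata:
+      labels:
+        app: uptime-kuma
+    spec:
+      containers:
+        - name: uptime-kuma
+          image: louislam/uptime-kuma:1   # match the tag running on the VM
+          ports:
+            - containerPort: 3001
+          volumeMounts:
+            - name: data
+              mountPath: /app/data
+      volumes:
+        - name: data
+          persistentVolumeClaim:
+            claimName: uptime-kuma-data
+---
+apiVersion: v1
+kind: Service
+metadata:
+  name: uptime-kuma
+  namespace: monitoring-tools
+spec:
+  selector:
+    app: uptime-kuma
+  ports:
+    - port: 80
+      targetPort: 3001
+---
+apiVersion: networking.k8s.io/v1
+kind: Ingress
+metadata:
+  name: uptime-kuma
+  namespace: monitoring-tools
+  annotations:
+    cert-manager.io/cluster-issuer: letsencrypt-prod   # hypothetical issuer name
+spec:
+  ingressClassName: traefik
+  tls:
+    - hosts: [uptime.apps.noble.lab.pcenicni.dev]
+      secretName: uptime-kuma-tls
+  rules:
+    - host: uptime.apps.noble.lab.pcenicni.dev
+      http:
+        paths:
+          - path: /
+            pathType: Prefix
+            backend:
+              service:
+                name: uptime-kuma
+                port:
+                  number: 80
+```
+
+For rollback, `kubectl scale deployment uptime-kuma -n monitoring-tools --replicas=0` leaves the PVC in place while traffic returns to the VM.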
+
+---
+
+## 4. Recommended migration order (phases)
+
+Order balances **risk**, **dependencies**, and **learning curve**.
+
+| Phase | Target | Rationale |
+|-------|--------|-----------|
+| **0 — Optional** | **Automate (130)** | Low use: **retire** or replace with **CronJobs**; skip if nothing valuable runs. |
+| **0b — Platform** | **Shared Postgres + S3** on noble | Run **before** or alongside early waves so new deploys use **one DSN** and **one object endpoint**; retire **VM 160** when empty. See [`shared-data-services.md`](shared-data-services.md). |
+| **1 — Observability** | **Monitor (110)** — Uptime Kuma, Peekaping, Tracearr | Small state, validates **Ingress**, **PVCs**, and **alert paths** before auth and media. |
+| **2 — Git** | **gitea (300)**, **gitea-nsfw (310)** | Point at **shared Postgres** + **S3** for attachments; move **repos** with **PVC** + backup restore if needed. |
+| **3 — Object / misc** | **s3 (160)**, **AMP (500)** | **Migrate data** into **central** S3 on cluster, then **decommission** duplicate MinIO on VM **160** if applicable. |
+| **4 — Auth** | **Auth (190)** — **Authentik** | Use **shared Postgres**; update **all OIDC clients** (Gitea, apps, NPM) with **new issuer URLs**; schedule a **maintenance window**. |
+| **5 — Daily apps** | **general-purpose (140)** | Move **one app per release** (Mealie, Open WebUI, …); each app gets its **own database** (and bucket if needed) on the **shared** tiers — not a new Postgres pod per app. |
+| **6 — Media / *arr*** | **arr (120)**, **Media-server (150)** | **NFS** from **OMV**, download clients, **transcoding** — migrate **one *arr*** app, then Jellyfin/ebook; see the numbered NFS steps in [`homelab-network.md`](homelab-network.md). |
+| **7 — Edge** | **NPM (666/777)** | Often **last**: either keep on Proxmox or replace with **Traefik** + **IngressRoutes** / **Gateway API**; many people keep a **dedicated** reverse proxy VM until parity is proven. |
+
+**Openmediavault (100)** — Typically **stays** as **NFS** (and maybe backup target) for the cluster; no need to “migrate” the whole NAS into Kubernetes.
+
+---
+
+## 5. Ingress and reverse proxy
+
+| Approach | When to use |
+|----------|-------------|
+| **Traefik Ingress** on noble | Default for **internal** HTTPS apps; **cert-manager** for public names you control. |
+| **NPM (VM)** as front door | Point **proxy host** → **Traefik MetalLB IP** or **service name** if you add internal DNS; reduces double-proxy if you **terminate TLS** in one place only. |
+| **Newt / Pangolin** | Public reachability per [`clusters/noble/bootstrap/newt/README.md`](../clusters/noble/bootstrap/newt/README.md); not automatic ExternalDNS. |
+
+Terminate TLS **once** per hostname: either at the front proxy (NPM) or at Traefik, with the front proxy in **SSL passthrough** mode if traffic must stay encrypted end-to-end.
+
+---
+
+## 6. Authentik-specific (Auth VM → cluster)
+
+1. **Backup** Authentik **PostgreSQL** (or embedded DB) and **media** volume from the VM.
+2. Deploy **Helm** (official chart) with **same** Authentik version if possible.
+3. **Restore** DB into **shared cluster Postgres** (recommended) or chart-managed DB — see [`shared-data-services.md`](shared-data-services.md).
+4. Update **issuer URL** in every **OIDC/OAuth** client (Gitea, Grafana, etc.).
+5. Re-test **outposts** (if any) and **redirect URIs** from both **`.1`** and **`.50`** client perspectives.
+6. **Cut over DNS**; then **decommission** VM **190**.
+
+---
+
+## 7. *arr* and Jellyfin-specific
+
+Follow the **numbered list** under **“Arr stack, NFS, and Kubernetes”** in [`homelab-network.md`](homelab-network.md). In short: **OMV stays**; **CSI NFS** + **RWX**; **match permissions**; migrate **one app** first; verify **download client** can reach the new pod **IP/DNS** from your download host.
+
+---
+
+## 8. Validation checklist (per wave)
+
+- Pods **Ready**, **Ingress** returns **200** / login page.
+- **TLS** valid for chosen hostname.
+- **Persistent data** present (new uploads, DB writes survive pod restart).
+- **Backups** (Velero or app-level) defined for the new location.
+- **Monitoring** / alerts updated (targets, not old VM IP).
+- **Documentation** in [`homelab-network.md`](homelab-network.md) updated (VM retired or marked migrated).
+
+---
+
+## Related docs
+
+- **Shared Postgres + S3:** [`shared-data-services.md`](shared-data-services.md)
+- VM inventory and NFS notes: [`homelab-network.md`](homelab-network.md)
+- Noble topology, MetalLB, Traefik: [`architecture.md`](architecture.md)
+- Bootstrap and versions: [`talos/CLUSTER-BUILD.md`](../talos/CLUSTER-BUILD.md)
+- Apps layout: [`clusters/noble/apps/README.md`](../clusters/noble/apps/README.md)
--- a/docs/shared-data-services.md
+++ b/docs/shared-data-services.md
@@ -0,0 +1,90 @@
+# Centralized PostgreSQL and S3-compatible storage
+
+Goal: **one shared PostgreSQL** and **one S3-compatible object store** on **noble**, instead of every app bundling its own database or MinIO. Apps keep **logical isolation** via **per-app databases** / **users** and **per-app buckets** (or prefixes), not separate clusters.
+
+See also: [`migration-vm-to-noble.md`](migration-vm-to-noble.md), [`homelab-network.md`](homelab-network.md) (VM **160** `s3` today), [`talos/CLUSTER-BUILD.md`](../talos/CLUSTER-BUILD.md) (Velero + S3).
+
+---
+
+## 1. Why centralize
+
+| Benefit | Detail |
+|--------|--------|
+| **Operations** | One backup/restore story, one upgrade cadence, one place to tune **IOPS** and **retention**. |
+| **Security** | **Least privilege**: each app gets its own **DB user** and **S3 credentials** scoped to one database or bucket. |
+| **Resources** | Fewer duplicate **Postgres** or **MinIO** sidecars; better use of **Longhorn** or dedicated PVCs for the shared tiers. |
+
+**Tradeoff:** Shared tiers widen the **blast radius**: an outage or compromise of Postgres/S3 affects every app that uses them. Use **backups**, **PITR** where you care, and **NetworkPolicies** so only expected namespaces talk to Postgres/S3.
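+
+A minimal sketch of such a NetworkPolicy, assuming a hypothetical `platform` namespace, a hypothetical pod label `app: postgres`, and a single client namespace `gitea`. None of these names are defined in this repo; adjust them to the real deployment.
+
+```yaml
+# Sketch only: namespace, labels, and client namespace are assumptions.
+apiVersion: networking.k8s.io/v1
+kind: NetworkPolicy
+metadata:
+  name: postgres-allow-known-clients
+  namespace: platform            # hypothetical namespace for shared tiers
+spec:
+  podSelector:
+    matchLabels:
+      app: postgres              # hypothetical label on the Postgres pods
+  policyTypes:
+    - Ingress
+  ingress:
+    - from:
+        # kubernetes.io/metadata.name is set automatically on every namespace
+        - namespaceSelector:
+            matchLabels:
+              kubernetes.io/metadata.name: gitea
+      ports:
+        - protocol: TCP
+          port: 5432
+```
+
+Once a policy selects the Postgres pods, ingress from namespaces not listed under `from` is dropped, so add one `namespaceSelector` entry per app as it migrates.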
+
+---
+
+## 2. PostgreSQL — recommended pattern
+
+1. **Run Postgres on noble** — Operators such as **CloudNativePG**, **Zalando Postgres operator**, or a well-maintained **Helm** chart with **replicas** + **persistent volumes** (Longhorn).
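+2. **One cluster instance, many databases** — For each app: `CREATE DATABASE appname;` and a **dedicated role** (not superuser) that can `CONNECT` to that database only. Postgres grants `CONNECT` to `PUBLIC` by default, so also run `REVOKE CONNECT ON DATABASE appname FROM PUBLIC;` and grant it back to the app role (or make that role the database owner).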
+3. **Connection from apps** — Use a **Kubernetes Service** (e.g. `postgres-platform.platform.svc.cluster.local:5432`) and pass **credentials** via **Secrets** (ideally **SOPS**-encrypted in git).
+4. **Migrations** — Run app **migration** jobs or init containers against the **same** DSN after DB exists.
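+
+As a rough illustration of step 3, a per-app Secret carrying the DSN might look like the fragment below. The app name `gitea`, the key `DATABASE_URL`, and the password placeholder are hypothetical; the host reuses the example Service name from step 3. Encrypt the file with SOPS before committing it (the repo's `.sops.yaml` matches files under `clusters/noble/secrets/`).
+
+```yaml
+# Sketch only: names, key, and password are placeholders, not repo values.
+apiVersion: v1
+kind: Secret
+metadata:
+  name: gitea-postgres           # hypothetical Secret name
+  namespace: gitea               # hypothetical app namespace
+type: Opaque
+stringData:
+  # One database and one role per app on the shared server
+  DATABASE_URL: postgres://gitea:CHANGE_ME@postgres-platform.platform.svc.cluster.local:5432/gitea
+```
+
+The app then reads the DSN through `envFrom` or `secretKeyRef`, so rotating the password only touches this one file.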
+
+**Migrating off SQLite / embedded Postgres**
+
+- **SQLite → Postgres:** export/import per app (native tools, or **pgloader** where appropriate).
+- **Docker Postgres volume:** `pg_dumpall` or per-DB `pg_dump` → restore into a **new** database on the shared server; **freeze writes** during cutover.
+
+---
+
+## 3. S3-compatible object storage — recommended pattern
+
+1. **Run one S3 API on noble** — **MinIO** (common), **Garage**, or **SeaweedFS** S3 layer — with **PVC(s)** or host path for data; **erasure coding** / replicas if the chart supports it and you want durability across nodes.
+2. **Buckets per concern** — e.g. `gitea-attachments`, `velero`, `loki-archive` — not one global bucket unless you enforce **prefix** IAM policies.
+3. **Credentials** — **IAM-style** users limited to **one bucket** (or prefix); **Secrets** reference **access key** / **secret**; never commit keys in plain text.
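+4. **Endpoint for pods** — In-cluster: `http://minio.platform.svc.cluster.local:9000` (or TLS inside mesh). Against a plain Service name, set SDKs to **path-style** addressing; **virtual-hosted** style needs wildcard DNS for `<bucket>.<endpoint>`, which an in-cluster Service does not provide by default.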
+
+### NFS as backing store for S3 on noble
+
+**Yes.** You can run MinIO (or another S3-compatible server) with its **data directory** on a **ReadWriteMany** volume that is **NFS** — for example the same **Openmediavault** export you already use, mounted via your **NFS CSI** driver (see [`homelab-network.md`](homelab-network.md)).
+
+| Consideration | Detail |
+|---------------|--------|
+| **Works for homelab** | MinIO stores objects as files under a path; **POSIX** on NFS is enough for many setups. |
+| **Performance** | NFS adds **latency** and shared bandwidth; fine for moderate use, less ideal for heavy multi-tenant throughput. |
+| **Availability** | The **NFS server** (OMV) becomes part of the availability story for object data — plan **backups** and **OMV** health like any dependency. |
+| **Locking / semantics** | Prefer **NFSv4.x**; do not expect **local SSD** behavior from **NFS** (e.g. very chatty small writes will be slow). If you see odd behavior, **Longhorn** (block) on a node is the usual next step. |
+| **Layering** | You are stacking **S3 API → file layout → NFS → disk**; that is normal for a lab, just **monitor** space and exports on OMV. |
+
+**Summary:** NFS-backed PVC for MinIO is **valid** on noble; use **Longhorn** (or local disk) when you need **better IOPS** or want object data **inside** the cluster’s storage domain without depending on OMV for that tier.
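+
+A sketch of the NFS-backed claim described above, assuming a hypothetical StorageClass named `nfs-csi` provided by the NFS CSI driver and a hypothetical `platform` namespace. The size is a placeholder, not a recommendation.
+
+```yaml
+# Sketch only: StorageClass name, namespace, and size are assumptions.
+apiVersion: v1
+kind: PersistentVolumeClaim
+metadata:
+  name: minio-data               # hypothetical claim name
+  namespace: platform            # hypothetical namespace for shared tiers
+spec:
+  accessModes:
+    - ReadWriteMany              # RWX, as provided by the NFS export
+  storageClassName: nfs-csi      # hypothetical; use the class your NFS CSI driver defines
+  resources:
+    requests:
+      storage: 200Gi
+```
+
+Switching the object store to Longhorn later would mean a new claim with `storageClassName: longhorn` and a data copy, since a bound claim's StorageClass cannot be changed in place.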
+
+**Migrating off VM 160 (`s3`) or per-app MinIO**
+
+- **MinIO → MinIO:** `mc mirror` between aliases, or **replication** if you configure it.
+- **Same API:** Any tool speaking **S3** can **sync** buckets before you point apps at the new endpoint.
+
+**Velero** — Point the **backup location** at the **central** bucket (see cluster Velero docs); avoid a second ad-hoc object store for backups if one cluster bucket is enough.
+
+---
+
+## 4. Ordering relative to app migrations
+
+| When | What |
+|------|------|
+| **Early** | Stand up **Postgres** + **S3** with **empty** DBs/buckets; test with **one** non-critical app (e.g. a throwaway deployment). |
+| **Before auth / Git** | **Gitea** and **Authentik** benefit from **managed Postgres** early — plan **DSN** and **bucket** for attachments **before** cutover. |
+| **Ongoing** | New apps **must not** ship embedded **Postgres/MinIO** unless the workload truly requires it (e.g. vendor appliance). |
+
+---
+
+## 5. Checklist (platform team)
+
+- [ ] Postgres **Service** DNS name and **TLS** (optional in-cluster) documented.
+- [ ] S3 **endpoint**, **region** string (can be `us-east-1` for MinIO), and **TLS** for Ingress (if clients are outside the cluster) documented.
+- [ ] **Backup:** scheduled **logical dumps** (Postgres) and **bucket replication** or **object versioning** where needed.
+- [ ] **SOPS** / **External Secrets** pattern for **rotation** without editing app manifests by hand.
+- [ ] **homelab-network.md** updated when **VM 160** is retired or repurposed.
+
+---
+
+## Related docs
+
+- VM → cluster migration: [`migration-vm-to-noble.md`](migration-vm-to-noble.md)
+- Inventory (s3 VM): [`homelab-network.md`](homelab-network.md)
+- Longhorn / storage runbook: [`../talos/runbooks/longhorn.md`](../talos/runbooks/longhorn.md)
+- Velero (S3 backup target): [`../clusters/noble/bootstrap/velero/`](../clusters/noble/bootstrap/velero/) (if present)
--- a/komodo/monitor/tracearr/compose.yaml
+++ b/komodo/monitor/tracearr/compose.yaml
@@ -7,16 +7,17 @@

 services:
  tracearr:
-    image: ghcr.io/connorgallopo/tracearr:supervised-nightly
+    image: ghcr.io/connorgallopo/tracearr:supervised
    shm_size: 256mb  # Required for PostgreSQL shared memory
    ports:
      - "${PORT:-3000}:3000"
    environment:
+      - NODE_ENV=production
+      - PORT=3000
+      - HOST=0.0.0.0
      - TZ=${TZ:-UTC}
+      - CORS_ORIGIN=${CORS_ORIGIN:-*}
      - LOG_LEVEL=${LOG_LEVEL:-info}
-      # Optional: Override auto-generated secrets
-      # - JWT_SECRET=${JWT_SECRET}
-      # - COOKIE_SECRET=${COOKIE_SECRET}
    volumes:
      - tracearr_postgres:/data/postgres
      - tracearr_redis:/data/redis
--- a/komodo/s3/versitygw/.env.sample
+++ b/komodo/s3/versitygw/.env.sample
@@ -0,0 +1,37 @@
+# Versity S3 Gateway — root credentials for the flat-file IAM backend.
+# https://github.com/versity/versitygw/wiki/Quickstart
+#
+# Local: copy to `.env` next to compose.yaml (or set `run_directory` to this folder
+# in Komodo) so `docker compose` can interpolate `${ROOT_ACCESS_KEY}` etc.
+#
+# Komodo: Stack Environment is written to `<run_directory>/.env` and passed as
+# `--env-file` — that drives `${VAR}` in compose.yaml. Set **one** pair using exact
+# names (leave the other pair unset / empty):
+#   ROOT_ACCESS_KEY + ROOT_SECRET_KEY
+#   ROOT_ACCESS_KEY_ID + ROOT_SECRET_ACCESS_KEY (Helm-style)
+
+ROOT_ACCESS_KEY=
+ROOT_SECRET_KEY=
+# ROOT_ACCESS_KEY_ID=
+# ROOT_SECRET_ACCESS_KEY=
+
+# Host port mapped to the gateway (container listens on 10000).
+VERSITYGW_PORT=10000
+
+# WebUI (container listens on 8080). In Pangolin, create a *second* HTTP resource for this
+# port — do not point the UI hostname at :10000 (that is S3 API only; `/` is not the SPA).
+VERSITYGW_WEBUI_PORT=8080
+# HTTPS URL of the *S3 API* (Pangolin resource → host :10000). **Not** the WebUI URL.
+# No trailing slash. Wrong value → WebUI calls the wrong host and bucket create can 404.
+# VGW_WEBUI_GATEWAYS=https://s3.example.com
+VGW_WEBUI_GATEWAYS=
+
+# Public origin of the **WebUI** page (Pangolin → :8080), e.g. https://s3-ui.example.com
+# Required when UI and API are on different hosts so the browser can call the API (CORS).
+# VGW_CORS_ALLOW_ORIGIN=https://s3-ui.example.com
+VGW_CORS_ALLOW_ORIGIN=
+
+# NFS: object metadata defaults to xattrs; most NFS mounts need sidecar mode
+# (compose.yaml uses --sidecar /data/sidecar). Create the host path, e.g.
+#   mkdir -p /mnt/nfs/versity/sidecar
+# Or use NFSv4.2 with xattr support and remove --sidecar from compose if you prefer.
--- a/komodo/s3/versitygw/compose.yaml
+++ b/komodo/s3/versitygw/compose.yaml
@@ -0,0 +1,64 @@
+# Versity S3 Gateway — POSIX backend over host bind mounts (NFS paths under /mnt/nfs/versity).
+# https://github.com/versity/versitygw
+#
+# POSIX default metadata uses xattrs; NFS often lacks xattr support unless NFSv4.2
+# + client/server support. `--sidecar` stores metadata in files instead (see
+# `posix` flags / VGW_META_SIDECAR in cmd/versitygw/posix.go).
+services:
+  versitygw:
+    image: versity/versitygw:v1.3.1
+    container_name: versitygw
+    restart: unless-stopped
+    # Credentials: use `${VAR}` so values come from the same env Komodo passes with
+    # `docker compose --env-file <run_directory>/.env` (see Komodo Stack docs).
+    # Do NOT use `env_file: .env` here: that path is resolved next to *this* compose
+    # file, while Komodo writes `.env` under `run_directory` — they often differ
+    # (e.g. run_directory = repo root, compose in komodo/s3/versitygw/).
+    environment:
+      ROOT_ACCESS_KEY: ${ROOT_ACCESS_KEY:-}
+      ROOT_SECRET_KEY: ${ROOT_SECRET_KEY:-}
+      ROOT_ACCESS_KEY_ID: ${ROOT_ACCESS_KEY_ID:-}
+      ROOT_SECRET_ACCESS_KEY: ${ROOT_SECRET_ACCESS_KEY:-}
+      # Matches Helm chart default; enables `/_/health` for probes.
+      VGW_HEALTH: /_/health
+      # WebUI (browser): separate listener; TLS terminates at Pangolin — serve HTTP in-container.
+      VGW_WEBUI_NO_TLS: "true"
+      # Public base URL of the *S3 API* only (Pangolin → :10000). Not the WebUI hostname.
+      # No trailing slash. If this points at the UI URL, bucket ops return 404/wrong host.
+      VGW_WEBUI_GATEWAYS: ${VGW_WEBUI_GATEWAYS}
+      # Browser Origin when WebUI and API use different HTTPS hostnames (see wiki / WebGUI CORS).
+      VGW_CORS_ALLOW_ORIGIN: ${VGW_CORS_ALLOW_ORIGIN}
+    ports:
+      - "${VERSITYGW_PORT:-10000}:10000"
+      - "${VERSITYGW_WEBUI_PORT:-8080}:8080"
+    volumes:
+      - /mnt/nfs/versity/s3:/data/s3
+      - /mnt/nfs/versity/iam:/data/iam
+      - /mnt/nfs/versity/versions:/data/versions
+      - /mnt/nfs/versity/sidecar:/data/sidecar
+    command:
+      - "--port"
+      - ":10000"
+      # Optional WebUI — without this, only the S3 API is served (browsers often see 404 on `/`).
+      - "--webui"
+      - ":8080"
+      - "--iam-dir"
+      - "/data/iam"
+      - "posix"
+      - "--sidecar"
+      - "/data/sidecar"
+      - "--versioning-dir"
+      - "/data/versions"
+      - "/data/s3"
+    healthcheck:
+      test:
+        [
+          "CMD",
+          "wget",
+          "-qO-",
+          "http://127.0.0.1:10000/_/health",
+        ]
+      interval: 30s
+      timeout: 5s
+      retries: 3
+      start_period: 10s
--- a/talos/CLUSTER-BUILD.md
+++ b/talos/CLUSTER-BUILD.md
@@ -4,24 +4,24 @@ This document is the **exported TODO** for the **noble** Talos cluster (4 nodes)

 ## Current state (2026-03-28)

-Lab stack is **up** on-cluster through **Phase D**–**F** and **Phase G** (Vault **CiliumNetworkPolicy**, **`talos/runbooks/`**). **Next focus:** optional **Alertmanager** receivers (Slack/PagerDuty); tighten **RBAC** (Headlamp / cluster-admin); **Cilium** policies for other namespaces as needed; enable **Mend Renovate** for PRs; Pangolin/sample Ingress; **Velero** when S3 exists.
+Lab stack is **up** on-cluster through **Phase D**–**F** and **Phase G** (**`talos/runbooks/`**, **SOPS**-encrypted secrets in **`clusters/noble/secrets/`**). **Next focus:** optional **Alertmanager** receivers (Slack/PagerDuty); tighten **RBAC** (Headlamp / cluster-admin); **Cilium** policies for other namespaces as needed; enable **Mend Renovate** for PRs; Pangolin/sample Ingress; **Velero** backup/restore drill after S3 credentials are set (**`noble_velero_install`**).

 - **Talos** v1.12.6 (target) / **Kubernetes** as bundled — four nodes **Ready** unless upgrading; **`talosctl health`**; **`talos/kubeconfig`** is **local only** (gitignored — never commit; regenerate with `talosctl kubeconfig` per `talos/README.md`). **Image Factory (nocloud installer):** `factory.talos.dev/nocloud-installer/249d9135de54962744e917cfe654117000cba369f9152fbab9d055a00aa3664f:v1.12.6`
 - **Cilium** Helm **1.16.6** / app **1.16.6** (`clusters/noble/bootstrap/cilium/`, phase 1 values).
+- **CSI Volume Snapshot** — **external-snapshotter** **v8.5.0** CRDs + **`registry.k8s.io/sig-storage/snapshot-controller`** (`clusters/noble/bootstrap/csi-snapshot-controller/`, Ansible **`noble_csi_snapshot_controller`**).
 - **MetalLB** Helm **0.15.3** / app **v0.15.3**; **IPAddressPool** `noble-l2` + **L2Advertisement** — pool **`192.168.50.210`–`192.168.50.229`**.
 - **kube-vip** DaemonSet **3/3** on control planes; VIP **`192.168.50.230`** on **`ens18`** (`vip_subnet` **`/32`** required — bare **`32`** breaks parsing). **Verified from workstation:** `kubectl config set-cluster noble --server=https://192.168.50.230:6443` then **`kubectl get --raw /healthz`** → **`ok`** (`talos/kubeconfig`; see `talos/README.md`).
 - **metrics-server** Helm **3.13.0** / app **v0.8.0** — `clusters/noble/bootstrap/metrics-server/values.yaml` (`--kubelet-insecure-tls` for Talos); **`kubectl top nodes`** works.
 - **Longhorn** Helm **1.11.1** / app **v1.11.1** — `clusters/noble/bootstrap/longhorn/` (PSA **privileged** namespace, `defaultDataPath` `/var/mnt/longhorn`, `preUpgradeChecker` enabled); **StorageClass** `longhorn` (default); **`nodes.longhorn.io`** all **Ready**; test **PVC** `Bound` on `longhorn`.
 - **Traefik** Helm **39.0.6** / app **v3.6.11** — `clusters/noble/bootstrap/traefik/`; **`Service`** **`LoadBalancer`** **`EXTERNAL-IP` `192.168.50.211`**; **`IngressClass`** **`traefik`** (default). Point **`*.apps.noble.lab.pcenicni.dev`** at **`192.168.50.211`**. MetalLB pool verification was done before replacing the temporary nginx test with Traefik.
 - **cert-manager** Helm **v1.20.0** / app **v1.20.0** — `clusters/noble/bootstrap/cert-manager/`; **`ClusterIssuer`** **`letsencrypt-staging`** and **`letsencrypt-prod`** (**DNS-01** via **Cloudflare** for **`pcenicni.dev`**, Secret **`cloudflare-dns-api-token`** in **`cert-manager`**); ACME email **`certificates@noble.lab.pcenicni.dev`** (edit in manifests if you want a different mailbox).
- **Newt** Helm **1.2.0** / app **1.10.1** — `clusters/noble/bootstrap/newt/` (**fossorial/newt**); Pangolin site tunnel — **`newt-pangolin-auth`** Secret (**`PANGOLIN_ENDPOINT`**, **`NEWT_ID`**, **`NEWT_SECRET`**). Prefer a **SealedSecret** in git (`kubeseal` — see `clusters/noble/bootstrap/sealed-secrets/examples/`) after rotating credentials if they were exposed. **Public DNS** is **not** automated with ExternalDNS: **CNAME** records at your DNS host per Pangolin’s domain instructions, plus **Integration API** for HTTP resources/targets — see **`clusters/noble/bootstrap/newt/README.md`**. LAN access to Traefik can still use **`*.apps.noble.lab.pcenicni.dev`** → **`192.168.50.211`** (split horizon / local resolver).
- **Argo CD** Helm **9.4.17** / app **v3.3.6** — `clusters/noble/bootstrap/argocd/`; **`argocd-server`** **`LoadBalancer`** **`192.168.50.210`**; app-of-apps root syncs **`clusters/noble/apps/`** (edit **`root-application.yaml`** `repoURL` before applying).
+- **Newt** Helm **1.2.0** / app **1.10.1** — `clusters/noble/bootstrap/newt/` (**fossorial/newt**); Pangolin site tunnel — **`newt-pangolin-auth`** Secret (**`PANGOLIN_ENDPOINT`**, **`NEWT_ID`**, **`NEWT_SECRET`**). Store credentials in git with **SOPS** (`clusters/noble/secrets/newt-pangolin-auth.secret.yaml`, **`age-key.txt`**, **`.sops.yaml`**) — see **`clusters/noble/secrets/README.md`**. **Public DNS** is **not** automated with ExternalDNS: **CNAME** records at your DNS host per Pangolin’s domain instructions, plus **Integration API** for HTTP resources/targets — see **`clusters/noble/bootstrap/newt/README.md`**. LAN access to Traefik can still use **`*.apps.noble.lab.pcenicni.dev`** → **`192.168.50.211`** (split horizon / local resolver).
+- **Argo CD** Helm **9.4.17** / app **v3.3.6** — `clusters/noble/bootstrap/argocd/`; **`argocd-server`** **`LoadBalancer`** **`192.168.50.210`**; **`noble-root`** → **`clusters/noble/apps/`**; **`noble-bootstrap-root`** → **`clusters/noble/bootstrap`** (manual sync only; enable automated sync per **`argocd/README.md`** §5 once **`noble.yml`** has completed). Edit **`repoURL`** in both root **`Application`** files before applying.
 - **kube-prometheus-stack** — Helm chart **82.15.1** — `clusters/noble/bootstrap/kube-prometheus-stack/` (**namespace** `monitoring`, PSA **privileged** — **node-exporter** needs host mounts); **Longhorn** PVCs for Prometheus, Grafana, Alertmanager; **node-exporter** DaemonSet **4/4**. **Grafana Ingress:** **`https://grafana.apps.noble.lab.pcenicni.dev`** (Traefik **`ingressClassName: traefik`**, **`cert-manager.io/cluster-issuer: letsencrypt-prod`**). **Loki** datasource in Grafana: ConfigMap **`clusters/noble/bootstrap/grafana-loki-datasource/loki-datasource.yaml`** (sidecar label **`grafana_datasource: "1"`**) — not via **`grafana.additionalDataSources`** in the chart. **`helm upgrade --install` with `--wait` is silent until done** — use **`--timeout 30m`**; Grafana admin: Secret **`kube-prometheus-grafana`**, keys **`admin-user`** / **`admin-password`**.
 - **Loki** + **Fluent Bit** — **`grafana/loki` 6.55.0** SingleBinary + **filesystem** on **Longhorn** (`clusters/noble/bootstrap/loki/`); **`loki.auth_enabled: false`**; **`chunksCache.enabled: false`** (no memcached chunk cache). **`fluent/fluent-bit` 0.56.0** → **`loki-gateway.loki.svc:80`** (`clusters/noble/bootstrap/fluent-bit/`); **`logging`** PSA **privileged**. **Grafana Explore:** **`kubectl apply -f clusters/noble/bootstrap/grafana-loki-datasource/loki-datasource.yaml`** then **Explore → Loki** (e.g. `{job="fluent-bit"}`).
- **Sealed Secrets** Helm **2.18.4** / app **0.36.1** — `clusters/noble/bootstrap/sealed-secrets/` (namespace **`sealed-secrets`**); **`kubeseal`** on client should match controller minor (**README**); back up **`sealed-secrets-key`** (see README).
- **External Secrets Operator** Helm **2.2.0** / app **v2.2.0** — `clusters/noble/bootstrap/external-secrets/`; Vault **`ClusterSecretStore`** in **`examples/vault-cluster-secret-store.yaml`** (**`http://`** to match Vault listener — apply after Vault **Kubernetes auth**).
- **Vault** Helm **0.32.0** / app **1.21.2** — `clusters/noble/bootstrap/vault/` — standalone **file** storage, **Longhorn** PVC; **HTTP** listener (`global.tlsDisable`); optional **CronJob** lab unseal **`unseal-cronjob.yaml`**; **not** initialized in git — run **`vault operator init`** per **`README.md`**.
- **Still open:** **Renovate** — install **[Mend Renovate](https://github.com/apps/renovate)** (or self-host) so PRs run; optional **Alertmanager** notification channels; optional **sample Ingress + cert + Pangolin** end-to-end; **Velero** when S3 is ready; **Argo CD SSO**.
+- **SOPS** — cluster **`Secret`** manifests under **`clusters/noble/secrets/`** encrypted with **age** (see **`.sops.yaml`**, **`age-key.txt`** gitignored); **`noble.yml`** decrypt-applies when the private key is present.
+- **Velero** Helm **12.0.0** / app **v1.18.0** — `clusters/noble/bootstrap/velero/` (**Ansible** **`noble_velero`**, not Argo); **S3-compatible** backup location + **CSI** snapshots (**`EnableCSI`**); enable with **`noble_velero_install`** per **`velero/README.md`**.
+- **Still open:** **Renovate** — install **[Mend Renovate](https://github.com/apps/renovate)** (or self-host) so PRs run; optional **Alertmanager** notification channels; optional **sample Ingress + cert + Pangolin** end-to-end; **Argo CD SSO**.

 ## Inventory

@@ -44,7 +44,7 @@ Lab stack is **up** on-cluster through **Phase D**–**F** and **Phase G** (Vaul
 | Grafana (Ingress + TLS) | **`grafana.apps.noble.lab.pcenicni.dev`** — `grafana.ingress` in `clusters/noble/bootstrap/kube-prometheus-stack/values.yaml` (**`letsencrypt-prod`**) |
 | Headlamp (Ingress + TLS) | **`headlamp.apps.noble.lab.pcenicni.dev`** — chart `ingress` in `clusters/noble/bootstrap/headlamp/` (**`letsencrypt-prod`**, **`ingressClassName: traefik`**) |
 | Public DNS (Pangolin) | **Newt** tunnel + **CNAME** at registrar + **Integration API** — `clusters/noble/bootstrap/newt/` |
-| Velero | S3-compatible URL — configure later |
+| Velero | S3-compatible endpoint + bucket — **`clusters/noble/bootstrap/velero/`**, **`ansible/playbooks/noble.yml`** (**`noble_velero_install`**) |

 ## Versions

@@ -62,11 +62,9 @@ Lab stack is **up** on-cluster through **Phase D**–**F** and **Phase G** (Vaul
 - kube-prometheus-stack: **82.15.1** (Helm chart `prometheus-community/kube-prometheus-stack`; app **v0.89.x** bundle)
 - Loki: **6.55.0** (Helm chart `grafana/loki`; app **3.6.7**)
 - Fluent Bit: **0.56.0** (Helm chart `fluent/fluent-bit`; app **4.2.3**)
- Sealed Secrets: **2.18.4** (Helm chart `sealed-secrets/sealed-secrets`; app **0.36.1**)
- External Secrets Operator: **2.2.0** (Helm chart `external-secrets/external-secrets`; app **v2.2.0**)
- Vault: **0.32.0** (Helm chart `hashicorp/vault`; app **1.21.2**)
 - Kyverno: **3.7.1** (Helm chart `kyverno/kyverno`; app **v1.17.1**); **kyverno-policies** **3.7.1** — **baseline** PSS, **Audit** (`clusters/noble/bootstrap/kyverno/`)
 - Headlamp: **0.40.1** (Helm chart `headlamp/headlamp`; app matches chart — see [Artifact Hub](https://artifacthub.io/packages/helm/headlamp/headlamp))
+- Velero: **12.0.0** (Helm chart `vmware-tanzu/velero`; app **v1.18.0**) — **`clusters/noble/bootstrap/velero/`**; AWS plugin **v1.14.0**; Ansible **`noble_velero`**
 - Renovate: **hosted** (Mend **Renovate** GitHub/GitLab app — no cluster chart) **or** **self-hosted** — pin chart when added ([Helm charts](https://docs.renovatebot.com/helm-charts/), OCI `ghcr.io/renovatebot/charts/renovate`); pair **`renovate.json`** with this repo’s Helm paths under **`clusters/noble/`**

 ## Repo paths (this workspace)
@@ -74,30 +72,30 @@ Lab stack is **up** on-cluster through **Phase D**–**F** and **Phase G** (Vaul
 | Artifact | Path |
 |----------|------|
 | This checklist | `talos/CLUSTER-BUILD.md` |
-| Operational runbooks (API VIP, etcd, Longhorn, Vault) | `talos/runbooks/` |
+| Operational runbooks (API VIP, etcd, Longhorn, SOPS) | `talos/runbooks/` |
 | Talos quick start + networking + kubeconfig | `talos/README.md` |
 | talhelper source (active) | `talos/talconfig.yaml` — may be **wipe-phase** (no Longhorn volume) during disk recovery |
 | Longhorn volume restore | `talos/talconfig.with-longhorn.yaml` — copy to `talconfig.yaml` after GPT wipe (see `talos/README.md` §5) |
 | Longhorn GPT wipe automation | `talos/scripts/longhorn-gpt-recovery.sh` |
 | kube-vip (kustomize) | `clusters/noble/bootstrap/kube-vip/` (`vip_interface` e.g. `ens18`) |
 | Cilium (Helm values) | `clusters/noble/bootstrap/cilium/` — `values.yaml` (phase 1), optional `values-kpr.yaml`, `README.md` |
+| CSI Volume Snapshot (CRDs + controller) | `clusters/noble/bootstrap/csi-snapshot-controller/` — `crd/`, `controller/` kustomize; **`ansible/roles/noble_csi_snapshot_controller`** |
 | MetalLB | `clusters/noble/bootstrap/metallb/` — `namespace.yaml` (PSA **privileged**), `ip-address-pool.yaml`, `kustomization.yaml`, `README.md` |
 | Longhorn | `clusters/noble/bootstrap/longhorn/` — `values.yaml`, `namespace.yaml` (PSA **privileged**), `kustomization.yaml` |
 | metrics-server (Helm values) | `clusters/noble/bootstrap/metrics-server/values.yaml` |
 | Traefik (Helm values) | `clusters/noble/bootstrap/traefik/` — `values.yaml`, `namespace.yaml`, `README.md` |
 | cert-manager (Helm + ClusterIssuers) | `clusters/noble/bootstrap/cert-manager/` — `values.yaml`, `namespace.yaml`, `kustomization.yaml`, `README.md` |
 | Newt / Pangolin tunnel (Helm) | `clusters/noble/bootstrap/newt/` — `values.yaml`, `namespace.yaml`, `README.md` |
-| Argo CD (Helm) + optional app-of-apps | `clusters/noble/bootstrap/argocd/` — `values.yaml`, `root-application.yaml`, `README.md`; optional **`Application`** tree in **`clusters/noble/apps/`** |
+| Argo CD (Helm) + app-of-apps | `clusters/noble/bootstrap/argocd/` — `values.yaml`, `root-application.yaml`, `bootstrap-root-application.yaml`, `app-of-apps/`, `README.md`; **`noble-root`** syncs **`clusters/noble/apps/`**; **`noble-bootstrap-root`** syncs **`clusters/noble/bootstrap`** (enable automation after **`noble.yml`**) |
 | kube-prometheus-stack (Helm values) | `clusters/noble/bootstrap/kube-prometheus-stack/` — `values.yaml`, `namespace.yaml` |
 | Grafana Loki datasource (ConfigMap; no chart change) | `clusters/noble/bootstrap/grafana-loki-datasource/loki-datasource.yaml` |
 | Loki (Helm values) | `clusters/noble/bootstrap/loki/` — `values.yaml`, `namespace.yaml` |
 | Fluent Bit → Loki (Helm values) | `clusters/noble/bootstrap/fluent-bit/` — `values.yaml`, `namespace.yaml` |
-| Sealed Secrets (Helm) | `clusters/noble/bootstrap/sealed-secrets/` — `values.yaml`, `namespace.yaml`, `README.md` |
-| External Secrets Operator (Helm + Vault store example) | `clusters/noble/bootstrap/external-secrets/` — `values.yaml`, `namespace.yaml`, `README.md`, `examples/vault-cluster-secret-store.yaml` |
-| Vault (Helm + optional unseal CronJob) | `clusters/noble/bootstrap/vault/` — `values.yaml`, `namespace.yaml`, `unseal-cronjob.yaml`, `cilium-network-policy.yaml`, `configure-kubernetes-auth.sh`, `README.md` |
+| SOPS-encrypted cluster Secrets | `clusters/noble/secrets/` — `README.md`, `*.secret.yaml`; **`.sops.yaml`**, **`age-key.txt`** (gitignored) at repo root |
 | Kyverno + PSS baseline policies | `clusters/noble/bootstrap/kyverno/` — `values.yaml`, `policies-values.yaml`, `namespace.yaml`, `README.md` |
 | Headlamp (Helm + Ingress) | `clusters/noble/bootstrap/headlamp/` — `values.yaml`, `namespace.yaml`, `README.md` |
-| Renovate (repo config + optional self-hosted Helm) | **`renovate.json`** at repo root; optional self-hosted chart under **`clusters/noble/apps/`** (Argo) + token Secret (**Sealed Secrets** / **ESO** after **Phase E**) |
+| Velero (Helm + S3 BSL; CSI snapshots) | `clusters/noble/bootstrap/velero/` — `values.yaml`, `namespace.yaml`, `README.md`; **`ansible/roles/noble_velero`** |
+| Renovate (repo config + optional self-hosted Helm) | **`renovate.json`** at repo root; optional self-hosted chart under **`clusters/noble/apps/`** (Argo) + token Secret (SOPS under **`clusters/noble/secrets/`** or imperative **`kubectl create secret`**) |

 **Git vs cluster:** manifests and `talconfig` live in git; **`talhelper genconfig -o out`**, bootstrap, Helm, and `kubectl` run on your LAN. See **`talos/README.md`** for workstation reachability (lab LAN/VPN), **`talosctl kubeconfig`** vs Kubernetes `server:` (VIP vs node IP), and **`--insecure`** only in maintenance.

@@ -106,11 +104,12 @@ Lab stack is **up** on-cluster through **Phase D**–**F** and **Phase G** (Vaul
 1. **Talos** installed; **Cilium** (or chosen CNI) **before** most workloads — with `cni: none`, nodes stay **NotReady** / **network-unavailable** taint until CNI is up.
 2. **MetalLB Helm chart** (CRDs + controller) **before** `kubectl apply -k` on the pool manifests.
 3. **`clusters/noble/bootstrap/metallb/namespace.yaml`** before or merged onto `metallb-system` so Pod Security does not block speaker (see `bootstrap/metallb/README.md`).
-4. **Longhorn:** Talos user volume + extensions in `talconfig.with-longhorn.yaml` (when restored); Helm **`defaultDataPath`** in `clusters/noble/bootstrap/longhorn/values.yaml`.
-5. **Loki → Fluent Bit → Grafana datasource:** deploy **Loki** (`loki-gateway` Service) before **Fluent Bit**; apply **`clusters/noble/bootstrap/grafana-loki-datasource/loki-datasource.yaml`** after **Loki** (sidecar picks up the ConfigMap — no kube-prometheus values change for Loki).
-6. **Vault:** **Longhorn** default **StorageClass** before **`clusters/noble/bootstrap/vault/`** Helm (PVC **`data-vault-0`**); **External Secrets** **`ClusterSecretStore`** after Vault is initialized, unsealed, and **Kubernetes auth** is configured.
+4. **CSI Volume snapshots:** **`kubernetes-csi/external-snapshotter`** CRDs + **`snapshot-controller`** (`clusters/noble/bootstrap/csi-snapshot-controller/`) before relying on **Longhorn** / **Velero** volume snapshots.
+5. **Longhorn:** Talos user volume + extensions in `talconfig.with-longhorn.yaml` (when restored); Helm **`defaultDataPath`** in `clusters/noble/bootstrap/longhorn/values.yaml`.
+6. **Loki → Fluent Bit → Grafana datasource:** deploy **Loki** (`loki-gateway` Service) before **Fluent Bit**; apply **`clusters/noble/bootstrap/grafana-loki-datasource/loki-datasource.yaml`** after **Loki** (sidecar picks up the ConfigMap — no kube-prometheus values change for Loki).
 7. **Headlamp:** **Traefik** + **cert-manager** (**`letsencrypt-prod`**) before exposing **`headlamp.apps.noble.lab.pcenicni.dev`**; treat as **cluster-admin** UI — protect with network policy / SSO when hardening (**Phase G**).
-8. **Renovate:** **Git remote** + platform access (**hosted app** needs org/repo install; **self-hosted** needs **`RENOVATE_TOKEN`** and chart **`renovate.config`**). If the bot runs **in-cluster**, add the token **after** **Sealed Secrets** / **Vault** (**Phase E**) — no ingress required for the bot itself.
+8. **Renovate:** **Git remote** + platform access (**hosted app** needs org/repo install; **self-hosted** needs **`RENOVATE_TOKEN`** and chart **`renovate.config`**). If the bot runs **in-cluster**, store the token with **SOPS** or an imperative Secret — no ingress required for the bot itself.
+9. **Velero:** **S3-compatible** endpoint + bucket + **`velero/velero-cloud-credentials`** before **`ansible/playbooks/noble.yml`** with **`noble_velero_install: true`**; for **CSI** volume snapshots, label a **VolumeSnapshotClass** per **`clusters/noble/bootstrap/velero/README.md`** (e.g. Longhorn).

 ## Prerequisites (before phases)

@@ -136,9 +135,10 @@ Lab stack is **up** on-cluster through **Phase D**–**F** and **Phase G** (Vaul

 ## Phase B — Core platform

-**Install order:** **Cilium** → **metrics-server** → **Longhorn** (Talos disk + Helm) → **MetalLB** (Helm → pool manifests) → ingress / certs / DNS as planned.
+**Install order:** **Cilium** → **Volume Snapshot CRDs + snapshot-controller** (`clusters/noble/bootstrap/csi-snapshot-controller/`, Ansible **`noble_csi_snapshot_controller`**) → **metrics-server** → **Longhorn** (Talos disk + Helm) → **MetalLB** (Helm → pool manifests) → ingress / certs / DNS as planned.

 - [x] **Cilium** (Helm **1.16.6**) — **required** before MetalLB if `cni: none` (`clusters/noble/bootstrap/cilium/`)
+- [x] **CSI Volume Snapshot** — CRDs + **`snapshot-controller`** in **`kube-system`** (`clusters/noble/bootstrap/csi-snapshot-controller/`); Ansible **`noble_csi_snapshot_controller`**; verify `kubectl api-resources | grep VolumeSnapshot`
 - [x] **metrics-server** — Helm **3.13.0**; values in `clusters/noble/bootstrap/metrics-server/values.yaml`; verify `kubectl top nodes`
 - [x] **Longhorn** — Talos: user volume + kubelet mounts + extensions (`talos/README.md` §5); Helm **1.11.1**; `kubectl apply -k clusters/noble/bootstrap/longhorn`; verify **`nodes.longhorn.io`** and test PVC **`Bound`**
 - [x] **MetalLB** — chart installed; **pool + L2** from `clusters/noble/bootstrap/metallb/` applied (`192.168.50.210`–`229`)
@@ -152,7 +152,7 @@ Lab stack is **up** on-cluster through **Phase D**–**F** and **Phase G** (Vaul
 - [x] **Argo CD** bootstrap — `clusters/noble/bootstrap/argocd/` (`helm upgrade --install argocd …`) — also covered by **`ansible/playbooks/noble.yml`** (role **`noble_argocd`**)
 - [x] Argo CD server **LoadBalancer** — **`192.168.50.210`** (see `values.yaml`)
 - [x] **App-of-apps** — optional; **`clusters/noble/apps/kustomization.yaml`** is **empty** (core stack is **Ansible**-managed from **`clusters/noble/bootstrap/`**, not Argo). Set **`repoURL`** in **`root-application.yaml`** and add **`Application`** manifests only for optional GitOps workloads — see **`clusters/noble/apps/README.md`**
- [x] **Renovate** — **`renovate.json`** at repo root ([Renovate](https://docs.renovatebot.com/) — **Kubernetes** manager for **`clusters/noble/**/*.yaml`** image pins; grouped minor/patch PRs). **Activate PRs:** install **[Mend Renovate](https://github.com/apps/renovate)** on the Git repo (**Option A**), or **Option B:** self-hosted chart per [Helm charts](https://docs.renovatebot.com/helm-charts/) + token from **Sealed Secrets** / **ESO**. Helm **chart** versions pinned only in comments still need manual bumps or extra **regex** `customManagers` — extend **`renovate.json`** as needed.
+- [x] **Renovate** — **`renovate.json`** at repo root ([Renovate](https://docs.renovatebot.com/) — **Kubernetes** manager for **`clusters/noble/**/*.yaml`** image pins; grouped minor/patch PRs). **Activate PRs:** install **[Mend Renovate](https://github.com/apps/renovate)** on the Git repo (**Option A**), or **Option B:** self-hosted chart per [Helm charts](https://docs.renovatebot.com/helm-charts/) + token from **SOPS** or a one-off Secret. Helm **chart** versions pinned only in comments still need manual bumps or extra **regex** `customManagers` — extend **`renovate.json`** as needed.
 - [ ] SSO — later

 ## Phase D — Observability
@@ -163,19 +163,16 @@ Lab stack is **up** on-cluster through **Phase D**–**F** and **Phase G** (Vaul

 ## Phase E — Secrets

-- [x] **Sealed Secrets** (optional Git workflow) — `clusters/noble/bootstrap/sealed-secrets/` (Helm **2.18.4**); **`kubeseal`** + key backup per **`README.md`**
-- [x] **Vault** in-cluster on Longhorn + **auto-unseal** — `clusters/noble/bootstrap/vault/` (Helm **0.32.0**); **Longhorn** PVC; **OSS** “auto-unseal” = optional **`unseal-cronjob.yaml`** + Secret (**README**); **`configure-kubernetes-auth.sh`** for ESO (**Kubernetes auth** + KV + role)
-- [x] **External Secrets Operator** + Vault `ClusterSecretStore` — operator **`clusters/noble/bootstrap/external-secrets/`** (Helm **2.2.0**); apply **`examples/vault-cluster-secret-store.yaml`** after Vault (**`README.md`**)
+- [x] **SOPS** — encrypt **`Secret`** YAML under **`clusters/noble/secrets/`** with **age** (see **`.sops.yaml`**, **`clusters/noble/secrets/README.md`**); keep **`age-key.txt`** private (gitignored). **`ansible/playbooks/noble.yml`** decrypt-applies **`*.yaml`** when **`age-key.txt`** exists.

 ## Phase F — Policy + backups

 - [x] **Kyverno** baseline policies — `clusters/noble/bootstrap/kyverno/` (Helm **kyverno** **3.7.1** + **kyverno-policies** **3.7.1**, **baseline** / **Audit** — see **`README.md`**)
-- [ ] **Velero** when S3 is ready; backup/restore drill
+- [ ] **Velero** — manifests + Ansible **`noble_velero`** (`clusters/noble/bootstrap/velero/`); enable with **`noble_velero_install: true`** + S3 bucket/URL + **`velero/velero-cloud-credentials`** (see **`velero/README.md`**); optional backup/restore drill

 ## Phase G — Hardening

-- [x] **Cilium** — Vault **`CiliumNetworkPolicy`** (`clusters/noble/bootstrap/vault/cilium-network-policy.yaml`) — HTTP **8200** from **`external-secrets`** + **`vault`**; extend for other clients as needed
-- [x] **Runbooks** — **`talos/runbooks/`** (API VIP / kube-vip, etcd–Talos, Longhorn, Vault)
+- [x] **Runbooks** — **`talos/runbooks/`** (API VIP / kube-vip, etcd–Talos, Longhorn, SOPS)
 - [x] **RBAC** — **Headlamp** **`ClusterRoleBinding`** uses built-in **`edit`** (not **`cluster-admin`**); **Argo CD** **`policy.default: role:readonly`** with **`g, admin, role:admin`** — see **`clusters/noble/bootstrap/headlamp/values.yaml`**, **`clusters/noble/bootstrap/argocd/values.yaml`**, **`talos/runbooks/rbac.md`**
 - [ ] **Alertmanager** — add **`slack_configs`**, **`pagerduty_configs`**, or other receivers under **`kube-prometheus-stack`** `alertmanager.config` (chart defaults use **`null`** receiver)

@@ -193,11 +190,10 @@ Lab stack is **up** on-cluster through **Phase D**–**F** and **Phase G** (Vaul
 - [x] **`logging`** — **Fluent Bit** DaemonSet **Running** on all nodes (logs → **Loki**)
 - [x] **Grafana** — **Loki** datasource from **`grafana-loki-datasource`** ConfigMap (**Explore** works after apply + sidecar sync)
 - [x] **Headlamp** — Deployment **Running** in **`headlamp`**; UI at **`https://headlamp.apps.noble.lab.pcenicni.dev`** (TLS via **`letsencrypt-prod`**)
-- [x] **`sealed-secrets`** — controller **Deployment** **Running** in **`sealed-secrets`** (install + **`kubeseal`** per **`apps/sealed-secrets/README.md`**)
-- [x] **`external-secrets`** — controller + webhook + cert-controller **Running** in **`external-secrets`**; apply **`ClusterSecretStore`** after Vault **Kubernetes auth**
-- [x] **`vault`** — **StatefulSet** **Running**, **`data-vault-0`** PVC **Bound** on **longhorn**; **`vault operator init`** + unseal per **`apps/vault/README.md`**
+- [x] **SOPS secrets** — **`clusters/noble/secrets/*.yaml`** encrypted in git; **`noble.yml`** applies decrypted manifests when **`age-key.txt`** is present
 - [x] **`kyverno`** — admission / background / cleanup / reports controllers **Running** in **`kyverno`**; **ClusterPolicies** for **PSS baseline** **Ready** (**Audit**)
-- [x] **Phase G (partial)** — Vault **`CiliumNetworkPolicy`**; **`talos/runbooks/`** (incl. **RBAC**); **Headlamp**/**Argo CD** RBAC tightened — **Alertmanager** receivers still optional
+- [ ] **`velero`** — when enabled: Deployment **Running** in **`velero`**; **`BackupStorageLocation`** / **`VolumeSnapshotLocation`** **Available**; test backup per **`velero/README.md`**
+- [x] **Phase G (partial)** — **`talos/runbooks/`** (incl. **RBAC**); **Headlamp**/**Argo CD** RBAC tightened — **Alertmanager** receivers still optional

 ---

--- a/talos/README.md
+++ b/talos/README.md
@@ -1,7 +1,7 @@
 # Talos — noble lab

 - **Cluster build checklist (exported TODO):** [CLUSTER-BUILD.md](./CLUSTER-BUILD.md)
-- **Operational runbooks (API VIP, etcd, Longhorn, Vault):** [runbooks/README.md](./runbooks/README.md)
+- **Operational runbooks (API VIP, etcd, Longhorn, SOPS):** [runbooks/README.md](./runbooks/README.md)

 ## Versions

--- a/talos/runbooks/README.md
+++ b/talos/runbooks/README.md
@@ -7,5 +7,5 @@ Short recovery / triage notes for the **noble** Talos cluster. Deep procedures l
 | Kubernetes API VIP (kube-vip) | [`api-vip-kube-vip.md`](./api-vip-kube-vip.md) |
 | etcd / Talos control plane | [`etcd-talos.md`](./etcd-talos.md) |
 | Longhorn storage | [`longhorn.md`](./longhorn.md) |
-| Vault (unseal, auth, ESO) | [`vault.md`](./vault.md) |
+| SOPS (secrets in git) | [`sops.md`](./sops.md) |
 | RBAC (Headlamp, Argo CD) | [`rbac.md`](./rbac.md) |
--- a/talos/runbooks/sops.md
+++ b/talos/runbooks/sops.md
@@ -0,0 +1,13 @@
+# Runbook: SOPS secrets (git-encrypted)
+
+**Symptoms:** `sops -d` fails; `kubectl apply` after Ansible shows no secret; `noble.yml` skips apply.
+
+**Checklist**
+
+1. **Private key:** `age-key.txt` at the repository root (gitignored). Create with `age-keygen -o age-key.txt` and add the **public** key to `.sops.yaml` (see `clusters/noble/secrets/README.md`).
+2. **Environment:** `export SOPS_AGE_KEY_FILE=/absolute/path/to/home-server/age-key.txt` when editing or applying by hand.
+3. **Edit encrypted file:** `sops clusters/noble/secrets/<name>.secret.yaml`
+4. **Apply one file:** `sops -d clusters/noble/secrets/<name>.secret.yaml | kubectl apply -f -`
+5. **Ansible:** `noble_apply_sops_secrets` is true by default; the platform role applies all `*.yaml` when `age-key.txt` exists.
+
+**References:** [`clusters/noble/secrets/README.md`](../../clusters/noble/secrets/README.md), [Mozilla SOPS](https://github.com/getsops/sops).
--- a/talos/runbooks/vault.md
+++ b/talos/runbooks/vault.md
@@ -1,15 +0,0 @@
-# Runbook: Vault (in-cluster)
-
-**Symptoms:** External Secrets **not syncing**, `ClusterSecretStore` **InvalidProviderConfig**, Vault UI/API **503 sealed**, pods **CrashLoop** on auth.
-
-**Checks**
-
-1. `kubectl -n vault exec -i sts/vault -- vault status` — **Sealed** / **Initialized**.
-2. Unseal key Secret + optional CronJob: [`clusters/noble/bootstrap/vault/README.md`](../../clusters/noble/bootstrap/vault/README.md), `unseal-cronjob.yaml`.
-3. Kubernetes auth for ESO: [`clusters/noble/bootstrap/vault/configure-kubernetes-auth.sh`](../../clusters/noble/bootstrap/vault/configure-kubernetes-auth.sh) and `kubectl describe clustersecretstore vault`.
-4. **Cilium** policy: if Vault is unreachable from `external-secrets`, check [`clusters/noble/bootstrap/vault/cilium-network-policy.yaml`](../../clusters/noble/bootstrap/vault/cilium-network-policy.yaml) and extend `ingress` for new client namespaces.
-
-**Common fixes**
-
-- Sealed: `vault operator unseal` or fix auto-unseal CronJob + `vault-unseal-key` Secret.
-- **403/invalid role** on ESO: re-run Kubernetes auth setup (issuer/CA/reviewer JWT) per README.
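
The SOPS runbook added in `talos/runbooks/sops.md` describes a manual path (export `SOPS_AGE_KEY_FILE`, then `sops -d … | kubectl apply -f -`) and says the Ansible role applies all `*.yaml` only when `age-key.txt` exists. A minimal shell sketch of that same skip-or-apply flow might look like the following. The function name `apply_sops_secrets`, its repo-root argument, and the `DRY_RUN` switch are hypothetical helpers for illustration and are not part of the repository or of the Ansible role.

```shell
#!/usr/bin/env sh
# Sketch: decrypt and apply every SOPS-encrypted manifest under
# clusters/noble/secrets/, skipping quietly when the age key is absent.
# apply_sops_secrets and DRY_RUN are illustrative names, not repo code.

apply_sops_secrets() {
  repo_root="${1:-.}"
  key_file="$repo_root/age-key.txt"
  secrets_dir="$repo_root/clusters/noble/secrets"

  # Mirror the documented behavior: no private key means nothing is applied.
  if [ ! -f "$key_file" ]; then
    echo "skip: $key_file not found"
    return 0
  fi

  # sops reads the age private key from this variable (absolute path).
  SOPS_AGE_KEY_FILE="$(cd "$repo_root" && pwd)/age-key.txt"
  export SOPS_AGE_KEY_FILE

  for f in "$secrets_dir"/*.yaml; do
    [ -e "$f" ] || continue          # the glob matched nothing
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "would apply: $f"
    else
      sops -d "$f" | kubectl apply -f -
    fi
  done
}
```

Running `DRY_RUN=1 apply_sops_secrets /absolute/path/to/home-server` lists the files that would be decrypted without contacting the cluster, which can help when triaging the "`noble.yml` skips apply" symptom from the runbook.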
Author	SHA1	Message	Date
Nikholas Pcenicni	aeffc7d6dd	Remove Argo CD application configurations for Fluent Bit, Headlamp, Loki, kube-prometheus, and associated kustomization files from the noble bootstrap directory. This cleanup streamlines the project by eliminating unused resources and simplifies the deployment structure.	2026-04-01 02:14:49 -04:00
Nikholas Pcenicni	0f88a33216	Remove deprecated Argo CD application configurations for various components including cert-manager, Cilium, CSI snapshot controllers, kube-vip, and others. Update README.md to reflect the current state of leaf applications and clarify optional components. Adjust kustomization files to streamline resource management for bootstrap workloads.	2026-04-01 02:13:15 -04:00
Nikholas Pcenicni	bfb72cb519	Update Argo CD documentation and kustomization files to include additional applications and namespace resources. Enhance README.md with current leaf applications and clarify optional components. This improves deployment clarity and organization for bootstrap workloads.	2026-04-01 02:11:19 -04:00
Nikholas Pcenicni	51eb64dd9d	Add applications to Argo CD kustomization.yaml for enhanced deployment	2026-04-01 02:05:10 -04:00
Nikholas Pcenicni	f259285f6e	Enhance Argo CD integration by adding support for a bootstrap root application. Update `group_vars/all.yml` and role defaults to include `noble_argocd_apply_bootstrap_root_application`. Modify tasks to apply the bootstrap application conditionally. Revise documentation to clarify the GitOps workflow and the relationship between the core platform and optional applications. Remove outdated references and streamline the README for better user guidance.	2026-04-01 01:55:41 -04:00
Nikholas Pcenicni	c312ceeb56	Remove Eclipse Che application configurations and related documentation from the repository. This includes the deletion of application-checluster.yaml, application-devworkspace.yaml, application-operator.yaml, checluster.yaml, dwoc.yaml, kustomization.yaml, and README.md, streamlining the project by eliminating outdated resources.	2026-04-01 01:21:32 -04:00
Nikholas Pcenicni	c15bf4d708	Enhance Ansible playbooks and documentation for Debian and Proxmox management. Add new playbooks for Debian hardening, maintenance, SSH key rotation, and Proxmox cluster setup. Update README.md with quick start instructions for Debian and Proxmox operations. Modify group_vars to include Argo CD application settings, improving deployment flexibility and clarity.	2026-04-01 01:19:50 -04:00
Nikholas Pcenicni	89be30884e	Update compose.yaml for Tracearr service to change the image tag from 'latest' to 'supervised' and remove unnecessary environment variables for DATABASE_URL and REDIS_URL. This streamlines the configuration and focuses on essential settings for deployment.	2026-03-30 22:53:47 -04:00
Nikholas Pcenicni	16948c62f9	Update compose.yaml for Tracearr service to include production environment variables and database configurations. This enhances deployment settings by specifying NODE_ENV, PORT, HOST, DATABASE_URL, REDIS_URL, JWT_SECRET, COOKIE_SECRET, and CORS_ORIGIN, improving overall service configuration and security.	2026-03-30 22:49:01 -04:00
Nikholas Pcenicni	3a6e5dff5b	Update Ansible configuration to integrate SOPS for managing secrets. Enhance README.md with SOPS usage instructions and prerequisites. Remove External Secrets Operator references and related configurations from the bootstrap process, streamlining the deployment. Adjust playbooks and roles to apply SOPS-encrypted secrets automatically, improving security and clarity in secret management.	2026-03-30 22:42:52 -04:00
Nikholas Pcenicni	023ebfee5d	Enhance Eclipse Che configuration in checluster.yaml by adding externalTLSConfig for secure workspace subdomains. This change ensures cert-manager can issue TLS certificates, preventing issues with unavailable servers when opening workspaces.	2026-03-29 02:03:57 -04:00
Nikholas Pcenicni	27fb4113eb	Refactor DevWorkspaceOperatorConfig in dwoc.yaml to simplify configuration structure. This change removes the unnecessary spec.config nesting, aligning with the v1alpha1 API requirements and improving clarity for users configuring development workspaces.	2026-03-28 19:58:18 -04:00
Nikholas Pcenicni	4026591f0b	Update README.md with troubleshooting steps for Eclipse Che and enhance kustomization.yaml to include DevWorkspaceOperatorConfig. This improves guidance for users facing deployment issues and ensures proper configuration for development workspace management.	2026-03-28 19:56:07 -04:00
Nikholas Pcenicni	8a740019ad	Add Eclipse Che applications to kustomization.yaml for improved development workspace management. This update includes application-devworkspace, application-operator, and application-checluster resources, enhancing the deployment capabilities for the Noble cluster.	2026-03-28 19:53:01 -04:00
Nikholas Pcenicni	544f75b0ee	Enhance documentation and configuration for Velero integration. Update README.md to clarify Velero's lack of web UI and usage instructions for CLI. Add CSI Volume Snapshot support in playbooks and roles, and include Velero service details in noble_landing_urls. Adjust kustomization.yaml to include VolumeSnapshotClass configuration, ensuring proper setup for backups. Improve overall clarity in related documentation.	2026-03-28 19:34:43 -04:00
Nikholas Pcenicni	33a10dc7e9	Add Velero configuration to .env.sample, README.md, and Ansible playbooks. Update group_vars to include noble_velero_install variable. Enhance documentation for optional Velero installation and S3 integration, improving clarity for backup and restore processes.	2026-03-28 18:39:22 -04:00
Nikholas Pcenicni	a4b9913b7e	Update .env.sample and compose.yaml for Versity S3 Gateway to enhance WebUI and CORS configuration. Add comments clarifying the purpose of VGW_CORS_ALLOW_ORIGIN and correct usage of VGW_WEBUI_GATEWAYS, improving deployment instructions and user understanding.	2026-03-28 18:28:52 -04:00
Nikholas Pcenicni	11c62009a4	Update README.md, .env.sample, and compose.yaml for Versity S3 Gateway to clarify WebUI configuration. Enhance README with details on separate API and WebUI ports, and update .env.sample and compose.yaml to include WebUI settings for improved deployment instructions and usability.	2026-03-28 18:20:55 -04:00
Nikholas Pcenicni	03ed4e70a2	Enhance .env.sample and compose.yaml for Versity S3 Gateway by adding detailed comments on NFS metadata handling and sidecar mode. This improves documentation clarity for users configuring NFS mounts and metadata storage options.	2026-03-28 18:17:54 -04:00
Nikholas Pcenicni	7855b10982	Update compose.yaml to change volume paths for Versity S3 Gateway from named volumes to NFS mounts. This adjustment improves data persistence and accessibility by linking directly to the NFS directory structure.	2026-03-28 18:13:52 -04:00
Nikholas Pcenicni	079c11b20c	Refactor Versity S3 Gateway configuration in README.md, .env.sample, and compose.yaml. Update README to clarify environment variable usage and adjust .env.sample for local setup instructions. Modify compose.yaml to utilize environment variable interpolation, ensuring proper credential handling and enhancing deployment security.	2026-03-28 17:56:24 -04:00
Nikholas Pcenicni	bf108a37e2	Update compose.yaml to include .env file for environment variable injection, enhancing security and usability for the Versity S3 Gateway deployment. This change ensures that necessary environment variables are accessible within the container, improving the overall configuration process.	2026-03-28 17:49:43 -04:00
Nikholas Pcenicni	97b56581ed	Update README.md and .env.sample for Versity S3 Gateway configuration. Change path in README to reflect new directory structure and clarify environment variable usage for credentials. Modify .env.sample to include additional credential options and improve documentation for setting up the environment. Adjust compose.yaml to utilize pass-through environment variables, enhancing security and usability for deployment.	2026-03-28 17:46:08 -04:00
Nikholas Pcenicni	f154658d79	Add Versity S3 Gateway documentation to README.md, detailing configuration requirements and usage for shared object storage. This addition enhances clarity for users integrating S3-compatible APIs with POSIX directories.	2026-03-28 17:25:44 -04:00
Nikholas Pcenicni	90509bacc5	Update homepage values.yaml to replace external siteMonitor URLs with in-cluster service URLs for improved reliability. Enhance comments for clarity on service monitoring and Prometheus widget configurations. Adjust description for better accuracy regarding uptime checks and resource monitoring.	2026-03-28 17:13:57 -04:00
Nikholas Pcenicni	e4741ecd15	Enhance homepage values.yaml by adding support for RBAC, service account creation, and site monitoring for various services. Update widget configurations for Prometheus and introduce new widgets for datetime and Kubernetes resource monitoring. Adjust layout and styling settings for improved UI presentation.	2026-03-28 17:11:01 -04:00
Nikholas Pcenicni	f6647056be	Add homepage entry to noble_landing_urls and update kustomization.yaml to include homepage resource	2026-03-28 17:07:06 -04:00