Integrate SOPS into the Ansible configuration for managing secrets. Add SOPS usage instructions and prerequisites to README.md. Remove External Secrets Operator, Sealed Secrets, and Vault references and their configuration from the bootstrap process. Adjust playbooks and roles to decrypt and apply SOPS-encrypted secrets automatically when the age key file is present.
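
For orientation, the automatic apply step can be sketched as a standalone shell function. This is an illustration under assumptions, not the code the role ships: the role runs its own bash loop through `ansible.builtin.shell`, and `SOPS_BIN` and `KUBECTL_BIN` are hypothetical overrides introduced only for this sketch. It assumes `sops`, `age`, and `kubectl` are installed and that `SOPS_AGE_KEY_FILE` points at the age private key.

```shell
#!/bin/sh
# Sketch only: decrypt every SOPS-encrypted manifest in a directory and pipe it
# to kubectl. SOPS_BIN and KUBECTL_BIN are hypothetical overrides for this
# example (not variables used by the repository); they default to the real CLIs.
#
# Possible invocation from the repository root:
#   SOPS_AGE_KEY_FILE=age-key.txt apply_sops_secrets clusters/noble/secrets
apply_sops_secrets() (
  dir=$1
  sops_bin=${SOPS_BIN:-sops}
  kubectl_bin=${KUBECTL_BIN:-kubectl}
  for f in "$dir"/*.yaml; do
    # With no matches the pattern stays literal, so skip anything that is not a file.
    [ -f "$f" ] || continue
    # Decrypt first and stop before applying if decryption fails.
    manifest=$("$sops_bin" -d "$f") || exit 1
    printf '%s\n' "$manifest" | "$kubectl_bin" apply -f - || exit 1
  done
)
```

Decrypting into a variable before calling `kubectl` is a design choice of this sketch: a failed decryption stops the run instead of piping partial output into the cluster. The function body runs in a subshell, so its variables do not leak into the caller.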

2026-03-30 22:42:52 -04:00
parent 023ebfee5d
commit 3a6e5dff5b
44 changed files with 644 additions and 809 deletions
--- a/.sops.yaml
+++ b/.sops.yaml
@@ -0,0 +1,7 @@
+# Mozilla SOPS — encrypt/decrypt Kubernetes Secret manifests under clusters/noble/secrets/
+# Generate a key: age-keygen -o age-key.txt  (age-key.txt is gitignored)
+# Add the printed public key below (one recipient per line is supported).
+creation_rules:
+  - path_regex: clusters/noble/secrets/.*\.yaml$
+    age: >-
+      age1juym5p3ez3dkt0dxlznydgfgqvaujfnyk9hpdsssf50hsxeh3p4sjpf3gn
--- a/ansible/README.md
+++ b/ansible/README.md
@@ -24,6 +24,7 @@ Copy **`.env.sample`** to **`.env`** at the repository root (`.env` is gitignore
 ## Prerequisites
 - `talosctl` (matches node Talos version), `talhelper`, `helm`, `kubectl`.
+- **SOPS secrets:** `sops` and `age` on the control host if you use **`clusters/noble/secrets/`** with **`age-key.txt`** (see **`clusters/noble/secrets/README.md`**).
 - **Phase A:** same LAN/VPN as nodes so **Talos :50000** and **Kubernetes :6443** are reachable (see [`talos/README.md`](../talos/README.md) §3).
 - **noble.yml:** bootstrapped cluster and **`talos/kubeconfig`** (or `KUBECONFIG`).
@@ -34,7 +35,7 @@ Copy **`.env.sample`** to **`.env`** at the repository root (`.env` is gitignore
 | [`playbooks/deploy.yml`](playbooks/deploy.yml) | **Talos Phase A** then **`noble.yml`** (full automation). |
 | [`playbooks/talos_phase_a.yml`](playbooks/talos_phase_a.yml) | `genconfig` → `apply-config` → `bootstrap` → `kubeconfig` only. |
 | [`playbooks/noble.yml`](playbooks/noble.yml) | Helm + `kubectl` platform (after Phase A). |
-| [`playbooks/post_deploy.yml`](playbooks/post_deploy.yml) | Vault / ESO reminders (`noble_apply_vault_cluster_secret_store`). |
+| [`playbooks/post_deploy.yml`](playbooks/post_deploy.yml) | SOPS reminders and optional Argo root Application note. |
 | [`playbooks/talos_bootstrap.yml`](playbooks/talos_bootstrap.yml) | **`talhelper genconfig` only** (legacy shortcut; prefer **`talos_phase_a.yml`**). |
 ```bash
@@ -68,9 +69,10 @@ ansible-playbook playbooks/noble.yml --skip-tags newt
 ansible-playbook playbooks/noble.yml --tags velero -e noble_velero_install=true -e noble_velero_s3_bucket=... -e noble_velero_s3_url=...
 ```
-### Variables — `group_vars/all.yml`
+### Variables — `group_vars/all.yml` and role defaults
-- **`noble_newt_install`**, **`noble_velero_install`**, **`noble_cert_manager_require_cloudflare_secret`**, **`noble_apply_vault_cluster_secret_store`**, **`noble_k8s_api_server_override`**, **`noble_k8s_api_server_auto_fallback`**, **`noble_k8s_api_server_fallback`**, **`noble_skip_k8s_health_check`**.
+- **`group_vars/all.yml`:** **`noble_newt_install`**, **`noble_velero_install`**, **`noble_cert_manager_require_cloudflare_secret`**, **`noble_k8s_api_server_override`**, **`noble_k8s_api_server_auto_fallback`**, **`noble_k8s_api_server_fallback`**, **`noble_skip_k8s_health_check`**
+- **`roles/noble_platform/defaults/main.yml`:** **`noble_apply_sops_secrets`**, **`noble_sops_age_key_file`** (SOPS secrets under **`clusters/noble/secrets/`**)
 ## Roles
--- a/ansible/group_vars/all.yml
+++ b/ansible/group_vars/all.yml
@@ -13,14 +13,11 @@ noble_k8s_api_server_fallback: "https://192.168.50.20:6443"
 # Only if you must skip the kubectl /healthz preflight (not recommended).
 noble_skip_k8s_health_check: false
-# Pangolin / Newt — set true only after creating newt-pangolin-auth Secret (see clusters/noble/bootstrap/newt/README.md)
+# Pangolin / Newt — set true only after newt-pangolin-auth Secret exists (SOPS: clusters/noble/secrets/ or imperative — see clusters/noble/bootstrap/newt/README.md)
 noble_newt_install: false
 # cert-manager needs Secret cloudflare-dns-api-token in cert-manager namespace before ClusterIssuers work
 noble_cert_manager_require_cloudflare_secret: true
-# post_deploy.yml — apply Vault ClusterSecretStore only after Vault is initialized and K8s auth is configured
-noble_apply_vault_cluster_secret_store: false
 # Velero — set **noble_velero_install: true** plus S3 bucket/URL (and credentials — see clusters/noble/bootstrap/velero/README.md)
 noble_velero_install: false
--- a/ansible/playbooks/post_deploy.yml
+++ b/ansible/playbooks/post_deploy.yml
@@ -1,12 +1,7 @@
 ---
-# Manual follow-ups after **noble.yml**: Vault init/unseal, Kubernetes auth for Vault, ESO ClusterSecretStore.
+# Manual follow-ups after **noble.yml**: SOPS key backup, optional Argo root Application.
-# Run: ansible-playbook playbooks/post_deploy.yml
+- hosts: localhost
 - name: Noble cluster — post-install reminders
  hosts: localhost
  connection: local
  gather_facts: false
  vars:
    noble_repo_root: "{{ playbook_dir | dirname | dirname }}"
    noble_kubeconfig: "{{ lookup('env', 'KUBECONFIG') | default(noble_repo_root + '/talos/kubeconfig', true) }}"
  roles:
-    - role: noble_post_deploy
+    - noble_post_deploy
--- a/ansible/roles/helm_repos/defaults/main.yml
+++ b/ansible/roles/helm_repos/defaults/main.yml
@@ -8,9 +8,6 @@ noble_helm_repos:
  - { name: fossorial, url: "https://charts.fossorial.io" }
  - { name: argo, url: "https://argoproj.github.io/argo-helm" }
  - { name: metrics-server, url: "https://kubernetes-sigs.github.io/metrics-server/" }
-  - { name: sealed-secrets, url: "https://bitnami-labs.github.io/sealed-secrets" }
-  - { name: external-secrets, url: "https://charts.external-secrets.io" }
-  - { name: hashicorp, url: "https://helm.releases.hashicorp.com" }
  - { name: prometheus-community, url: "https://prometheus-community.github.io/helm-charts" }
  - { name: grafana, url: "https://grafana.github.io/helm-charts" }
  - { name: fluent, url: "https://fluent.github.io/helm-charts" }
--- a/ansible/roles/noble_landing_urls/defaults/main.yml
+++ b/ansible/roles/noble_landing_urls/defaults/main.yml
@@ -39,11 +39,6 @@ noble_lab_ui_entries:
    namespace: longhorn-system
    service: longhorn-frontend
    url: https://longhorn.apps.noble.lab.pcenicni.dev
-  - name: Vault
-    description: Secrets engine UI (after init/unseal)
-    namespace: vault
-    service: vault
-    url: https://vault.apps.noble.lab.pcenicni.dev
  - name: Velero
    description: Cluster backups — no web UI (velero CLI / kubectl CRDs)
    namespace: velero
--- a/ansible/roles/noble_landing_urls/templates/noble-lab-ui-urls.md.j2
+++ b/ansible/roles/noble_landing_urls/templates/noble-lab-ui-urls.md.j2
@@ -24,7 +24,6 @@ This file is **generated** by Ansible (`noble_landing_urls` role). Use it as a t
 | **Prometheus** | — | No auth in default install (lab). |
 | **Alertmanager** | — | No auth in default install (lab). |
 | **Longhorn** | — | No default login unless you enable access control in the UI settings. |
-| **Vault** | Token | Root token is only from **`vault operator init`** (not stored in git). See `clusters/noble/bootstrap/vault/README.md`. |
 ### Commands to retrieve passwords (if not filled above)
@@ -46,7 +45,7 @@ To generate this file **without** calling kubectl, run Ansible with **`-e noble_
 - **Argo CD** `argocd-initial-admin-secret` disappears after you change the admin password.
 - **Grafana** password is random unless you set `grafana.adminPassword` in chart values.
-- **Vault** UI needs **unsealed** Vault; tokens come from your chosen auth method.
 - **Prometheus / Alertmanager** UIs are unauthenticated by default — restrict when hardening (`talos/CLUSTER-BUILD.md` Phase G).
+- **SOPS:** cluster secrets in git under **`clusters/noble/secrets/`** are encrypted; decrypt with **`age-key.txt`** (not in git). See **`clusters/noble/secrets/README.md`**.
 - **Headlamp** token above expires after the configured duration; re-run Ansible or `kubectl create token` to refresh.
 - **Velero** has **no web UI** — use **`velero`** CLI or **`kubectl -n velero get backup,schedule,backupstoragelocation`**. Metrics: **`velero`** Service in **`velero`** (Prometheus scrape). See `clusters/noble/bootstrap/velero/README.md`.
--- a/ansible/roles/noble_platform/defaults/main.yml
+++ b/ansible/roles/noble_platform/defaults/main.yml
@@ -4,5 +4,6 @@ noble_platform_kubectl_request_timeout: 120s
 noble_platform_kustomize_retries: 5
 noble_platform_kustomize_delay: 20
-# Vault: injector (vault-k8s) owns MutatingWebhookConfiguration.caBundle; Helm upgrade can SSA-conflict. Delete webhook so Helm can recreate it.
+# Decrypt **clusters/noble/secrets/*.yaml** with SOPS and kubectl apply (requires **sops**, **age**, and **age-key.txt**).
-noble_vault_delete_injector_webhook_before_helm: true
+noble_apply_sops_secrets: true
+noble_sops_age_key_file: "{{ noble_repo_root }}/age-key.txt"
--- a/ansible/roles/noble_platform/tasks/main.yml
+++ b/ansible/roles/noble_platform/tasks/main.yml
@@ -1,6 +1,6 @@
 ---
 # Mirrors former **noble-platform** Argo Application: Helm releases + plain manifests under clusters/noble/bootstrap.
-- name: Apply clusters/noble/bootstrap kustomize (namespaces, Grafana Loki datasource, Vault extras)
+- name: Apply clusters/noble/bootstrap kustomize (namespaces, Grafana Loki datasource)
  ansible.builtin.command:
    argv:
      - kubectl
@@ -16,77 +16,26 @@
  until: noble_platform_kustomize.rc == 0
  changed_when: true
-- name: Install Sealed Secrets
-  ansible.builtin.command:
-    argv:
-      - helm
-      - upgrade
-      - --install
-      - sealed-secrets
-      - sealed-secrets/sealed-secrets
-      - --namespace
-      - sealed-secrets
-      - --version
-      - "2.18.4"
-      - -f
-      - "{{ noble_repo_root }}/clusters/noble/bootstrap/sealed-secrets/values.yaml"
-      - --wait
-  environment:
-    KUBECONFIG: "{{ noble_kubeconfig }}"
-  changed_when: true
-- name: Install External Secrets Operator
-  ansible.builtin.command:
-    argv:
-      - helm
-      - upgrade
-      - --install
-      - external-secrets
-      - external-secrets/external-secrets
-      - --namespace
-      - external-secrets
-      - --version
-      - "2.2.0"
-      - -f
-      - "{{ noble_repo_root }}/clusters/noble/bootstrap/external-secrets/values.yaml"
-      - --wait
+- name: Stat SOPS age private key (age-key.txt)
+  ansible.builtin.stat:
+    path: "{{ noble_sops_age_key_file }}"
+  register: noble_sops_age_key_stat
+- name: Apply SOPS-encrypted cluster secrets (clusters/noble/secrets/*.yaml)
+  ansible.builtin.shell: |
+    set -euo pipefail
+    shopt -s nullglob
+    for f in "{{ noble_repo_root }}/clusters/noble/secrets"/*.yaml; do
+      sops -d "$f" | kubectl apply -f -
+    done
+  args:
+    executable: /bin/bash
  environment:
    KUBECONFIG: "{{ noble_kubeconfig }}"
-  changed_when: true
-
-# vault-k8s patches webhook CA after install; Helm 3/4 SSA then conflicts on upgrade. Removing the MWC lets Helm re-apply cleanly; injector repopulates caBundle.
-- name: Delete Vault agent injector MutatingWebhookConfiguration before Helm (avoids caBundle field conflict)
-  ansible.builtin.command:
-    argv:
-      - kubectl
-      - delete
-      - mutatingwebhookconfiguration
-      - vault-agent-injector-cfg
-      - --ignore-not-found
-  environment:
-    KUBECONFIG: "{{ noble_kubeconfig }}"
-  register: noble_vault_mwc_delete
-  when: noble_vault_delete_injector_webhook_before_helm | default(true) | bool
-  changed_when: "'deleted' in (noble_vault_mwc_delete.stdout | default(''))"
-- name: Install Vault
-  ansible.builtin.command:
-    argv:
-      - helm
-      - upgrade
-      - --install
-      - vault
-      - hashicorp/vault
-      - --namespace
-      - vault
-      - --version
-      - "0.32.0"
-      - -f
-      - "{{ noble_repo_root }}/clusters/noble/bootstrap/vault/values.yaml"
-      - --wait
-  environment:
-    KUBECONFIG: "{{ noble_kubeconfig }}"
-    HELM_SERVER_SIDE_APPLY: "false"
-  changed_when: true
+    SOPS_AGE_KEY_FILE: "{{ noble_sops_age_key_file }}"
+  when:
+    - noble_apply_sops_secrets | default(true) | bool
+    - noble_sops_age_key_stat.stat.exists
 - name: Install kube-prometheus-stack
--- a/ansible/roles/noble_post_deploy/tasks/main.yml
+++ b/ansible/roles/noble_post_deploy/tasks/main.yml
@@ -1,24 +1,10 @@
 ---
-- name: Vault — manual steps (not automated)
+- name: SOPS secrets (workstation)
  ansible.builtin.debug:
    msg: |
-      1. kubectl -n vault get pods  (wait for Running)
-      2. kubectl -n vault exec -it vault-0 -- vault operator init  (once; save keys)
-      3. Unseal per clusters/noble/bootstrap/vault/README.md
-      4. ./clusters/noble/bootstrap/vault/configure-kubernetes-auth.sh
-      5. kubectl apply -f clusters/noble/bootstrap/external-secrets/examples/vault-cluster-secret-store.yaml
+      Encrypted Kubernetes Secrets live under clusters/noble/secrets/ (Mozilla SOPS + age).
+      Private key: age-key.txt at repo root (gitignored). See clusters/noble/secrets/README.md
+      and .sops.yaml. noble.yml decrypt-applies these when age-key.txt exists.
-- name: Optional — apply Vault ClusterSecretStore for External Secrets
-  ansible.builtin.command:
-    argv:
-      - kubectl
-      - apply
-      - -f
-      - "{{ noble_repo_root }}/clusters/noble/bootstrap/external-secrets/examples/vault-cluster-secret-store.yaml"
-  environment:
-    KUBECONFIG: "{{ noble_kubeconfig }}"
-  when: noble_apply_vault_cluster_secret_store | default(false) | bool
-  changed_when: true
 - name: Argo CD optional root Application (empty app-of-apps)
  ansible.builtin.debug:
--- a/branding/nikflix/logo.png
+++ b/branding/nikflix/logo.png
--- a/clusters/noble/apps/README.md
+++ b/clusters/noble/apps/README.md
@@ -1,6 +1,6 @@
 # Argo CD — optional applications (non-bootstrap)
-**Base cluster configuration** (CNI, MetalLB, ingress, cert-manager, storage, observability stack, policy, Vault, etc.) is installed by **`ansible/playbooks/noble.yml`** from **`clusters/noble/bootstrap/`** — not from here.
+**Base cluster configuration** (CNI, MetalLB, ingress, cert-manager, storage, observability stack, policy, SOPS secrets path, etc.) is installed by **`ansible/playbooks/noble.yml`** from **`clusters/noble/bootstrap/`** — not from here.
 **`noble-root`** (`clusters/noble/bootstrap/argocd/root-application.yaml`) points at **`clusters/noble/apps`**. Add **`Application`** manifests (and optional **`AppProject`** definitions) under this directory only for workloads that are additive and do not subsume the Ansible-managed platform.
--- a/clusters/noble/apps/homepage/values.yaml
+++ b/clusters/noble/apps/homepage/values.yaml
@@ -79,12 +79,6 @@ config:
            href: https://longhorn.apps.noble.lab.pcenicni.dev
            siteMonitor: http://longhorn-frontend.longhorn-system.svc.cluster.local:80
            description: Storage volumes, nodes, backups
-        - Vault:
-            icon: si-vault
-            href: https://vault.apps.noble.lab.pcenicni.dev
-            # Unauthenticated health (HEAD/GET) — not the redirecting UI root
-            siteMonitor: http://vault.vault.svc.cluster.local:8200/v1/sys/health?standbyok=true&sealedcode=204&uninitcode=204
-            description: Secrets engine UI (after init/unseal)
        - Velero:
            icon: mdi-backup-restore
            href: https://velero.io/docs/
--- a/clusters/noble/bootstrap/argocd/README.md
+++ b/clusters/noble/bootstrap/argocd/README.md
@@ -52,7 +52,7 @@ Use **Settings → Repositories** in the UI, or `argocd repo add` / a `Secret` o
 ## 4. App-of-apps (optional GitOps only)
-Bootstrap **platform** workloads (CNI, ingress, cert-manager, Kyverno, observability, Vault, etc.) are installed by
+Bootstrap **platform** workloads (CNI, ingress, cert-manager, Kyverno, observability, etc.) are installed by
 **`ansible/playbooks/noble.yml`** from **`clusters/noble/bootstrap/`** — not by Argo. **`clusters/noble/apps/kustomization.yaml`** is empty by default.
 1. Edit **`root-application.yaml`**: set **`repoURL`** and **`targetRevision`** to this repository. The **`resources-finalizer.argocd.argoproj.io/background`** finalizer uses Argo’s path-qualified form so **`kubectl apply`** does not warn about finalizer names.
--- a/clusters/noble/bootstrap/external-secrets/README.md
+++ b/clusters/noble/bootstrap/external-secrets/README.md
@@ -1,60 +0,0 @@
 # External Secrets Operator (noble)
 Syncs secrets from external systems into Kubernetes **Secret** objects via **ExternalSecret** / **ClusterExternalSecret** CRDs.
 - **Chart:** `external-secrets/external-secrets` **2.2.0** (app **v2.2.0**)
 - **Namespace:** `external-secrets`
 - **Helm release name:** `external-secrets` (matches the operator **ServiceAccount** name `external-secrets`)
 ## Install
 ```bash
 helm repo add external-secrets https://charts.external-secrets.io
 helm repo update
 kubectl apply -f clusters/noble/bootstrap/external-secrets/namespace.yaml
 helm upgrade --install external-secrets external-secrets/external-secrets -n external-secrets \
  --version 2.2.0 -f clusters/noble/bootstrap/external-secrets/values.yaml --wait
 ```
 Verify:
 ```bash
 kubectl -n external-secrets get deploy,pods
 kubectl get crd | grep external-secrets
 ```
 ## Vault `ClusterSecretStore` (after Vault is deployed)
 The checklist expects a **Vault**-backed store. Install Vault first (`talos/CLUSTER-BUILD.md` Phase E — Vault on Longhorn + auto-unseal), then:
 1. Enable **KV v2** secrets engine and **Kubernetes** auth in Vault; create a **role** (e.g. `external-secrets`) that maps the cluster’s **`external-secrets` / `external-secrets`** service account to a policy that can read the paths you need.
 2. Copy **`examples/vault-cluster-secret-store.yaml`**, set **`spec.provider.vault.server`** to your Vault URL. This repo’s Vault Helm values use **HTTP** on port **8200** (`global.tlsDisable: true`): **`http://vault.vault.svc.cluster.local:8200`**. Use **`https://`** if you enable TLS on the Vault listener.
 3. If Vault uses a **private TLS CA**, configure **`caProvider`** or **`caBundle`** on the Vault provider — see [HashiCorp Vault provider](https://external-secrets.io/latest/provider/hashicorp-vault/). Do not commit private CA material to public git unless intended.
 4. Apply: **`kubectl apply -f …/vault-cluster-secret-store.yaml`**
 5. Confirm the store is ready: **`kubectl describe clustersecretstore vault`**
 Example **ExternalSecret** (after the store is healthy):
 ```yaml
 apiVersion: external-secrets.io/v1
 kind: ExternalSecret
 metadata:
  name: demo
  namespace: default
 spec:
  refreshInterval: 1h
  secretStoreRef:
    name: vault
    kind: ClusterSecretStore
  target:
    name: demo-synced
  data:
    - secretKey: password
      remoteRef:
        key: secret/data/myapp
        property: password
 ```
 ## Upgrades
 Pin the chart version in `values.yaml` header comments; run the same **`helm upgrade --install`** with the new **`--version`** after reviewing [release notes](https://github.com/external-secrets/external-secrets/releases).
--- a/clusters/noble/bootstrap/external-secrets/examples/vault-cluster-secret-store.yaml
+++ b/clusters/noble/bootstrap/external-secrets/examples/vault-cluster-secret-store.yaml
@@ -1,31 +0,0 @@
 # ClusterSecretStore for HashiCorp Vault (KV v2) using Kubernetes auth.
 #
 # Do not apply until Vault is running, reachable from the cluster, and configured with:
 # - Kubernetes auth at mountPath (default: kubernetes)
 # - A role (below: external-secrets) bound to this service account:
 #     name: external-secrets
 #     namespace: external-secrets
 # - A policy allowing read on the KV path used below (e.g. secret/data/* for path "secret")
 #
 # Adjust server, mountPath, role, and path to match your Vault deployment. If Vault uses TLS
 # with a private CA, set provider.vault.caProvider or caBundle (see README).
 #
 # kubectl apply -f clusters/noble/bootstrap/external-secrets/examples/vault-cluster-secret-store.yaml
 ---
 apiVersion: external-secrets.io/v1
 kind: ClusterSecretStore
 metadata:
  name: vault
 spec:
  provider:
    vault:
      server: "http://vault.vault.svc.cluster.local:8200"
      path: secret
      version: v2
      auth:
        kubernetes:
          mountPath: kubernetes
          role: external-secrets
          serviceAccountRef:
            name: external-secrets
            namespace: external-secrets
--- a/clusters/noble/bootstrap/external-secrets/namespace.yaml
+++ b/clusters/noble/bootstrap/external-secrets/namespace.yaml
@@ -1,5 +0,0 @@
 # External Secrets Operator — apply before Helm.
 apiVersion: v1
 kind: Namespace
 metadata:
  name: external-secrets
--- a/clusters/noble/bootstrap/external-secrets/values.yaml
+++ b/clusters/noble/bootstrap/external-secrets/values.yaml
@@ -1,10 +0,0 @@
 # External Secrets Operator — noble
 #
 # helm repo add external-secrets https://charts.external-secrets.io
 # helm repo update
 # kubectl apply -f clusters/noble/bootstrap/external-secrets/namespace.yaml
 # helm upgrade --install external-secrets external-secrets/external-secrets -n external-secrets \
 #   --version 2.2.0 -f clusters/noble/bootstrap/external-secrets/values.yaml --wait
 #
 # CRDs are installed by the chart (installCRDs: true). Vault ClusterSecretStore: see README + examples/.
 commonLabels: {}
--- a/clusters/noble/bootstrap/kustomization.yaml
+++ b/clusters/noble/bootstrap/kustomization.yaml
@@ -8,13 +8,9 @@ resources:
  - kube-prometheus-stack/namespace.yaml
  - loki/namespace.yaml
  - fluent-bit/namespace.yaml
-  - sealed-secrets/namespace.yaml
+  - newt/namespace.yaml
-  - external-secrets/namespace.yaml
-  - vault/namespace.yaml
  - kyverno/namespace.yaml
  - velero/namespace.yaml
  - velero/longhorn-volumesnapshotclass.yaml
  - headlamp/namespace.yaml
  - grafana-loki-datasource/loki-datasource.yaml
-  - vault/unseal-cronjob.yaml
-  - vault/cilium-network-policy.yaml
--- a/clusters/noble/bootstrap/kyverno/policies-values.yaml
+++ b/clusters/noble/bootstrap/kyverno/policies-values.yaml
@@ -35,7 +35,6 @@ x-kyverno-exclude-infra: &kyverno_exclude_infra
          - kube-node-lease
          - argocd
          - cert-manager
-          - external-secrets
          - headlamp
          - kyverno
          - logging
@@ -44,9 +43,7 @@ x-kyverno-exclude-infra: &kyverno_exclude_infra
          - metallb-system
          - monitoring
          - newt
-          - sealed-secrets
          - traefik
-          - vault
 policyExclude:
  disallow-capabilities: *kyverno_exclude_infra
--- a/clusters/noble/bootstrap/newt/README.md
+++ b/clusters/noble/bootstrap/newt/README.md
@@ -2,26 +2,24 @@
 This is the **primary** automation path for **public** hostnames to workloads in this cluster (it **replaces** in-cluster ExternalDNS). [Newt](https://github.com/fosrl/newt) is the on-prem agent that connects your cluster to a **Pangolin** site (WireGuard tunnel). The [Fossorial Helm chart](https://github.com/fosrl/helm-charts) deploys one or more instances.
-**Secrets:** Never commit endpoint, Newt ID, or Newt secret. If credentials were pasted into chat or CI logs, **rotate them** in Pangolin and recreate the Kubernetes Secret.
+**Secrets:** Never commit endpoint, Newt ID, or Newt secret in **plain** YAML. If credentials were pasted into chat or CI logs, **rotate them** in Pangolin and recreate the Kubernetes Secret.
 ## 1. Create the Secret
 Keys must match `values.yaml` (`PANGOLIN_ENDPOINT`, `NEWT_ID`, `NEWT_SECRET`).
-### Option A — Sealed Secret (safe for GitOps)
+### Option A — SOPS (safe for GitOps)
-With the [Sealed Secrets](https://github.com/bitnami-labs/sealed-secrets) controller installed (`clusters/noble/bootstrap/sealed-secrets/`), generate a `SealedSecret` from your workstation (rotate credentials in Pangolin first if they were exposed):
+Encrypt a normal **`Secret`** with [Mozilla SOPS](https://github.com/getsops/sops) and **age** (see **`clusters/noble/secrets/README.md`** and **`.sops.yaml`**). The repo includes an encrypted example at **`clusters/noble/secrets/newt-pangolin-auth.secret.yaml`** — edit with `sops` after exporting **`SOPS_AGE_KEY_FILE`** to your **`age-key.txt`**, or create a new file and encrypt it.
 ```bash
-chmod +x clusters/noble/bootstrap/sealed-secrets/examples/kubeseal-newt-pangolin-auth.sh
-export PANGOLIN_ENDPOINT='https://pangolin.pcenicni.dev'
-export NEWT_ID='YOUR_NEWT_ID'
-export NEWT_SECRET='YOUR_NEWT_SECRET'
-./clusters/noble/bootstrap/sealed-secrets/examples/kubeseal-newt-pangolin-auth.sh > newt-pangolin-auth.sealedsecret.yaml
-kubectl apply -f newt-pangolin-auth.sealedsecret.yaml
+export SOPS_AGE_KEY_FILE=/absolute/path/to/home-server/age-key.txt
+sops clusters/noble/secrets/newt-pangolin-auth.secret.yaml
+# then:
+sops -d clusters/noble/secrets/newt-pangolin-auth.secret.yaml | kubectl apply -f -
 ```
-Commit only the `.sealedsecret.yaml` file, not plain `Secret` YAML.
+**Ansible** (`noble.yml`) applies all **`clusters/noble/secrets/*.yaml`** automatically when **`age-key.txt`** exists at the repo root.
 ### Option B — Imperative Secret (not in git)
--- a/clusters/noble/bootstrap/sealed-secrets/README.md
+++ b/clusters/noble/bootstrap/sealed-secrets/README.md
@@ -1,50 +0,0 @@
 # Sealed Secrets (noble)
 Encrypts `Secret` manifests so they can live in git; the controller decrypts **SealedSecret** resources into **Secret**s in-cluster.
 - **Chart:** `sealed-secrets/sealed-secrets` **2.18.4** (app **0.36.1**)
 - **Namespace:** `sealed-secrets`
 ## Install
 ```bash
 helm repo add sealed-secrets https://bitnami-labs.github.io/sealed-secrets
 helm repo update
 kubectl apply -f clusters/noble/bootstrap/sealed-secrets/namespace.yaml
 helm upgrade --install sealed-secrets sealed-secrets/sealed-secrets -n sealed-secrets \
  --version 2.18.4 -f clusters/noble/bootstrap/sealed-secrets/values.yaml --wait
 ```
 ## Workstation: `kubeseal`
 Install a **kubeseal** build compatible with the controller (match **app** minor, e.g. **0.36.x** for **0.36.1**). Examples:
 - **Homebrew:** `brew install kubeseal` (check `kubeseal --version` against the chart’s `image.tag` in `helm show values`).
 - **GitHub releases:** [bitnami-labs/sealed-secrets](https://github.com/bitnami-labs/sealed-secrets/releases)
 Fetch the cluster’s public seal cert (once per kube context):
 ```bash
 kubeseal --fetch-cert > /tmp/noble-sealed-secrets.pem
 ```
 Create a sealed secret from a normal secret manifest:
 ```bash
 kubectl create secret generic example --from-literal=foo=bar --dry-run=client -o yaml \
  | kubeseal --cert /tmp/noble-sealed-secrets.pem -o yaml > example-sealedsecret.yaml
 ```
 Commit `example-sealedsecret.yaml`; apply it with `kubectl apply -f`. The controller creates the **Secret** in the same namespace as the **SealedSecret**.
 **Noble example:** `examples/kubeseal-newt-pangolin-auth.sh` (Newt / Pangolin tunnel credentials).
 ## Backup the sealing key
 If the controller’s private key is lost, existing sealed files cannot be decrypted on a new cluster. Back up the key secret after install:
 ```bash
 kubectl get secret -n sealed-secrets -l sealedsecrets.bitnami.com/sealed-secrets-key=active -o yaml > sealed-secrets-key-backup.yaml
 ```
 Store `sealed-secrets-key-backup.yaml` in a safe offline location (not in public git).
--- a/clusters/noble/bootstrap/sealed-secrets/examples/kubeseal-newt-pangolin-auth.sh
+++ b/clusters/noble/bootstrap/sealed-secrets/examples/kubeseal-newt-pangolin-auth.sh
@@ -1,19 +0,0 @@
 #!/usr/bin/env bash
 # Emit a SealedSecret for newt-pangolin-auth (namespace newt).
 # Prerequisites: sealed-secrets controller running; kubeseal client (same minor as controller).
 # Rotate Pangolin/Newt credentials in the UI first if they were exposed, then set env vars and run:
 #
 #   export PANGOLIN_ENDPOINT='https://pangolin.example.com'
 #   export NEWT_ID='...'
 #   export NEWT_SECRET='...'
 #   ./kubeseal-newt-pangolin-auth.sh > newt-pangolin-auth.sealedsecret.yaml
 #   kubectl apply -f newt-pangolin-auth.sealedsecret.yaml
 #
 set -euo pipefail
 kubectl apply -f "$(dirname "$0")/../../newt/namespace.yaml" >/dev/null 2>&1 || true
 kubectl -n newt create secret generic newt-pangolin-auth \
  --dry-run=client \
  --from-literal=PANGOLIN_ENDPOINT="${PANGOLIN_ENDPOINT:?}" \
  --from-literal=NEWT_ID="${NEWT_ID:?}" \
  --from-literal=NEWT_SECRET="${NEWT_SECRET:?}" \
  -o yaml | kubeseal -o yaml
--- a/clusters/noble/bootstrap/sealed-secrets/namespace.yaml
+++ b/clusters/noble/bootstrap/sealed-secrets/namespace.yaml
@@ -1,5 +0,0 @@
 # Sealed Secrets controller — apply before Helm.
 apiVersion: v1
 kind: Namespace
 metadata:
  name: sealed-secrets
--- a/clusters/noble/bootstrap/sealed-secrets/values.yaml
+++ b/clusters/noble/bootstrap/sealed-secrets/values.yaml
@@ -1,18 +0,0 @@
 # Sealed Secrets — noble (Git-encrypted Secret workflow)
 #
 # helm repo add sealed-secrets https://bitnami-labs.github.io/sealed-secrets
 # helm repo update
 # kubectl apply -f clusters/noble/bootstrap/sealed-secrets/namespace.yaml
 # helm upgrade --install sealed-secrets sealed-secrets/sealed-secrets -n sealed-secrets \
 #   --version 2.18.4 -f clusters/noble/bootstrap/sealed-secrets/values.yaml --wait
 #
 # Client: install kubeseal (same minor as controller — see README).
 # Defaults are sufficient for the lab; override here if you need key renewal, resources, etc.
 #
 # GitOps pattern: create Secrets only via SealedSecret (or External Secrets + Vault).
 # Example (Newt): clusters/noble/bootstrap/sealed-secrets/examples/kubeseal-newt-pangolin-auth.sh
 # Backup the controller's sealing key: kubectl -n sealed-secrets get secret sealed-secrets-key -o yaml
 #
 # Talos cluster secrets (bootstrap token, cluster secret, certs) belong in talhelper talsecret /
 # SOPS — not Sealed Secrets. See talos/README.md.
 commonLabels: {}
--- a/clusters/noble/bootstrap/vault/README.md
+++ b/clusters/noble/bootstrap/vault/README.md
@@ -1,162 +0,0 @@
 # HashiCorp Vault (noble)
 Standalone Vault with **file** storage on a **Longhorn** PVC (`server.dataStorage`). The listener uses **HTTP** (`global.tlsDisable: true`) for in-cluster use; add TLS at the listener when exposing outside the cluster.
 - **Chart:** `hashicorp/vault` **0.32.0** (Vault **1.21.2**)
 - **Namespace:** `vault`
 ## Install
 ```bash
 helm repo add hashicorp https://helm.releases.hashicorp.com
 helm repo update
 kubectl apply -f clusters/noble/bootstrap/vault/namespace.yaml
 helm upgrade --install vault hashicorp/vault -n vault \
  --version 0.32.0 -f clusters/noble/bootstrap/vault/values.yaml --wait --timeout 15m
 ```
 Verify:
 ```bash
 kubectl -n vault get pods,pvc,svc
 kubectl -n vault exec -i sts/vault -- vault status
 ```
 ## Cilium network policy (Phase G)
 After **Cilium** is up, optionally restrict HTTP access to the Vault server pods (**TCP 8200**) to **`external-secrets`** and same-namespace clients:
 ```bash
 kubectl apply -f clusters/noble/bootstrap/vault/cilium-network-policy.yaml
 ```
 If you add workloads in other namespaces that call Vault, extend **`ingress`** in that manifest.
 ## Initialize and unseal (first time)
 From a workstation with `kubectl` (or `kubectl exec` into any pod with `vault` CLI):
 ```bash
 kubectl -n vault exec -i sts/vault -- vault operator init -key-shares=1 -key-threshold=1
 ```
 **Lab-only:** `-key-shares=1 -key-threshold=1` keeps a single unseal key. For stronger Shamir splits, use more shares and store them safely.
 Save the **Unseal Key** and **Root Token** offline. Then unseal once:
 ```bash
 kubectl -n vault exec -i sts/vault -- vault operator unseal
 # paste unseal key
 ```
 Or create the Secret used by the optional CronJob and apply it:
 ```bash
 kubectl -n vault create secret generic vault-unseal-key --from-literal=key='YOUR_UNSEAL_KEY'
 kubectl apply -f clusters/noble/bootstrap/vault/unseal-cronjob.yaml
 ```
 The CronJob runs every minute and unseals if Vault is sealed and the Secret is present.
 ## Auto-unseal note
 Vault **OSS** auto-unseal uses cloud KMS (AWS, GCP, Azure, OCI), **Transit** (another Vault), etc. There is no first-class “Kubernetes Secret” seal. This repo uses an optional **CronJob** as a **lab** substitute. Production clusters should use a supported seal backend.
 ## Kubernetes auth (External Secrets / ClusterSecretStore)
 **One-shot:** from the repo root, `export KUBECONFIG=talos/kubeconfig` and `export VAULT_TOKEN=…`, then run **`./clusters/noble/bootstrap/vault/configure-kubernetes-auth.sh`** (idempotent). Then **`kubectl apply -f clusters/noble/bootstrap/external-secrets/examples/vault-cluster-secret-store.yaml`** on its own line (shell comments **`# …`** on the same line are parsed as extra `kubectl` args and break `apply`). **`kubectl get clustersecretstore vault`** should show **READY=True** after a few seconds.
 Run these **from your workstation** (needs `kubectl`; no local `vault` binary required). Use a **short-lived admin token** or the root token **only in your shell** — do not paste tokens into logs or chat.
 **1. Enable the auth method** (skip if already done):
 ```bash
 kubectl -n vault exec -it sts/vault -- sh -c '
  export VAULT_ADDR=http://127.0.0.1:8200
  export VAULT_TOKEN="YOUR_ROOT_OR_ADMIN_TOKEN"
  vault auth enable kubernetes
 '
 ```
 **2. Configure `auth/kubernetes`** — the API **issuer** must match the `iss` claim on service account JWTs. With **kube-vip** / a custom API URL, discover it from the cluster (do not assume `kubernetes.default`):
 ```bash
 ISSUER=$(kubectl get --raw /.well-known/openid-configuration | jq -r .issuer)
 REVIEWER=$(kubectl -n vault create token vault --duration=8760h)
 CA_B64=$(kubectl config view --raw --minify -o jsonpath='{.clusters[0].cluster.certificate-authority-data}')
 ```
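Before writing the config, it can help to confirm that the discovered issuer really matches the `iss` claim on a service account token. The function below is a minimal sketch under stated assumptions: `jwt_issuer` is a hypothetical helper name, the token payload is assumed to be compact JSON with an unescaped `iss` string, and `base64 -d` is assumed to be available on the workstation.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the `iss` claim of the JWT passed as $1.
jwt_issuer() {
  # The payload is the second dot-separated segment, base64url-encoded.
  _payload=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # Restore the '=' padding that base64url encoding strips.
  case $(( ${#_payload} % 4 )) in
    2) _payload="${_payload}==" ;;
    3) _payload="${_payload}=" ;;
  esac
  # Extract the string value of "iss" without requiring jq.
  printf '%s' "$_payload" | base64 -d |
    sed -n 's/.*"iss":[[:space:]]*"\([^"]*\)".*/\1/p'
}
```

For example, `[ "$(jwt_issuer "$REVIEWER")" = "$ISSUER" ] && echo "issuer matches"` compares the reviewer token created above with the issuer read from the OpenID configuration.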
 Then apply config **inside** the Vault pod (environment variables are passed in with `env` so quoting stays correct):
 ```bash
 export VAULT_TOKEN="YOUR_ROOT_OR_ADMIN_TOKEN"
 export ISSUER REVIEWER CA_B64
 kubectl -n vault exec -i sts/vault -- env \
  VAULT_ADDR=http://127.0.0.1:8200 \
  VAULT_TOKEN="$VAULT_TOKEN" \
  CA_B64="$CA_B64" \
  REVIEWER="$REVIEWER" \
  ISSUER="$ISSUER" \
  sh -ec '
  echo "$CA_B64" | base64 -d > /tmp/k8s-ca.crt
  vault write auth/kubernetes/config \
    kubernetes_host="https://kubernetes.default.svc:443" \
    kubernetes_ca_cert=@/tmp/k8s-ca.crt \
    token_reviewer_jwt="$REVIEWER" \
    issuer="$ISSUER"
 '
 ```
 **3. KV v2** at path `secret` (skip if already enabled):
 ```bash
 kubectl -n vault exec -it sts/vault -- sh -c '
  export VAULT_ADDR=http://127.0.0.1:8200
  export VAULT_TOKEN="YOUR_ROOT_OR_ADMIN_TOKEN"
  vault secrets enable -path=secret kv-v2
 '
 ```
 **4. Policy + role** for the External Secrets operator SA (`external-secrets` / `external-secrets`):
 ```bash
 kubectl -n vault exec -it sts/vault -- sh -c '
  export VAULT_ADDR=http://127.0.0.1:8200
  export VAULT_TOKEN="YOUR_ROOT_OR_ADMIN_TOKEN"
  vault policy write external-secrets - <<EOF
 path "secret/data/*" {
  capabilities = ["read", "list"]
 }
 path "secret/metadata/*" {
  capabilities = ["read", "list"]
 }
 EOF
  vault write auth/kubernetes/role/external-secrets \
    bound_service_account_names=external-secrets \
    bound_service_account_namespaces=external-secrets \
    policies=external-secrets \
    ttl=24h
 '
 ```
 **5. Apply** **`clusters/noble/bootstrap/external-secrets/examples/vault-cluster-secret-store.yaml`** if you have not already, then verify:
 ```bash
 kubectl describe clustersecretstore vault
 ```
 See also [Kubernetes auth](https://developer.hashicorp.com/vault/docs/auth/kubernetes#configuration).
 ## TLS and External Secrets
 `values.yaml` disables TLS on the Vault listener. The **`ClusterSecretStore`** example uses **`http://vault.vault.svc.cluster.local:8200`**. If you enable TLS on the listener, switch the URL to **`https://`** and configure **`caBundle`** or **`caProvider`** on the store.
 ## UI
 Port-forward:
 ```bash
 kubectl -n vault port-forward svc/vault-ui 8200:8200
 ```
 Open `http://127.0.0.1:8200` and log in with the root token (rotate for production workflows).
--- a/clusters/noble/bootstrap/vault/cilium-network-policy.yaml
+++ b/clusters/noble/bootstrap/vault/cilium-network-policy.yaml
@@ -1,40 +0,0 @@
 # CiliumNetworkPolicy — restrict who may reach Vault HTTP listener (8200).
 # Apply after Cilium is healthy: kubectl apply -f clusters/noble/bootstrap/vault/cilium-network-policy.yaml
 #
 # Ingress-only policy: egress from Vault is unchanged (Kubernetes auth needs API + DNS).
 # Extend ingress rules if other namespaces must call Vault (e.g. app workloads).
 #
 # Ref: https://docs.cilium.io/en/stable/security/policy/language/
 ---
 apiVersion: cilium.io/v2
 kind: CiliumNetworkPolicy
 metadata:
  name: vault-http-ingress
  namespace: vault
 spec:
  endpointSelector:
    matchLabels:
      app.kubernetes.io/name: vault
      component: server
  ingress:
    - fromEndpoints:
        - matchLabels:
            "k8s:io.kubernetes.pod.namespace": external-secrets
      toPorts:
        - ports:
            - port: "8200"
              protocol: TCP
    - fromEndpoints:
        - matchLabels:
            "k8s:io.kubernetes.pod.namespace": traefik
      toPorts:
        - ports:
            - port: "8200"
              protocol: TCP
    - fromEndpoints:
        - matchLabels:
            "k8s:io.kubernetes.pod.namespace": vault
      toPorts:
        - ports:
            - port: "8200"
              protocol: TCP
--- a/clusters/noble/bootstrap/vault/configure-kubernetes-auth.sh
+++ b/clusters/noble/bootstrap/vault/configure-kubernetes-auth.sh
@@ -1,77 +0,0 @@
 #!/usr/bin/env bash
 # Configure Vault Kubernetes auth + KV v2 + policy/role for External Secrets Operator.
 # Requires: kubectl (cluster access), jq (reads the OpenID issuer); Vault reachable via sts/vault.
 #
 # Usage (from repo root):
 #   export KUBECONFIG=talos/kubeconfig   # or your path
 #   export VAULT_TOKEN='…'               # root or admin token — never commit
 #   ./clusters/noble/bootstrap/vault/configure-kubernetes-auth.sh
 #
 # Then: kubectl apply -f clusters/noble/bootstrap/external-secrets/examples/vault-cluster-secret-store.yaml
 # Verify: kubectl describe clustersecretstore vault
 set -euo pipefail
 : "${VAULT_TOKEN:?Set VAULT_TOKEN to your Vault root or admin token}"
 ISSUER=$(kubectl get --raw /.well-known/openid-configuration | jq -r .issuer)
 REVIEWER=$(kubectl -n vault create token vault --duration=8760h)
 CA_B64=$(kubectl config view --raw --minify -o jsonpath='{.clusters[0].cluster.certificate-authority-data}')
 kubectl -n vault exec -i sts/vault -- env \
  VAULT_ADDR=http://127.0.0.1:8200 \
  VAULT_TOKEN="$VAULT_TOKEN" \
  sh -ec '
    set -e
    vault auth list >/tmp/vauth.txt
    grep -q "^kubernetes/" /tmp/vauth.txt || vault auth enable kubernetes
  '
 kubectl -n vault exec -i sts/vault -- env \
  VAULT_ADDR=http://127.0.0.1:8200 \
  VAULT_TOKEN="$VAULT_TOKEN" \
  CA_B64="$CA_B64" \
  REVIEWER="$REVIEWER" \
  ISSUER="$ISSUER" \
  sh -ec '
    echo "$CA_B64" | base64 -d > /tmp/k8s-ca.crt
    vault write auth/kubernetes/config \
      kubernetes_host="https://kubernetes.default.svc:443" \
      kubernetes_ca_cert=@/tmp/k8s-ca.crt \
      token_reviewer_jwt="$REVIEWER" \
      issuer="$ISSUER"
  '
 kubectl -n vault exec -i sts/vault -- env \
  VAULT_ADDR=http://127.0.0.1:8200 \
  VAULT_TOKEN="$VAULT_TOKEN" \
  sh -ec '
    set -e
    vault secrets list >/tmp/vsec.txt
    grep -q "^secret/" /tmp/vsec.txt || vault secrets enable -path=secret kv-v2
  '
 kubectl -n vault exec -i sts/vault -- env \
  VAULT_ADDR=http://127.0.0.1:8200 \
  VAULT_TOKEN="$VAULT_TOKEN" \
  sh -ec '
    vault policy write external-secrets - <<EOF
 path "secret/data/*" {
  capabilities = ["read", "list"]
 }
 path "secret/metadata/*" {
  capabilities = ["read", "list"]
 }
 EOF
    vault write auth/kubernetes/role/external-secrets \
      bound_service_account_names=external-secrets \
      bound_service_account_namespaces=external-secrets \
      policies=external-secrets \
      ttl=24h
  '
 echo "Done. Issuer used: $ISSUER"
 echo ""
 echo "Next (each command on its own line — do not paste # comments after kubectl):"
 echo "  kubectl apply -f clusters/noble/bootstrap/external-secrets/examples/vault-cluster-secret-store.yaml"
 echo "  kubectl get clustersecretstore vault"
--- a/clusters/noble/bootstrap/vault/namespace.yaml
+++ b/clusters/noble/bootstrap/vault/namespace.yaml
@@ -1,5 +0,0 @@
 # HashiCorp Vault — apply before Helm.
 apiVersion: v1
 kind: Namespace
 metadata:
  name: vault
--- a/clusters/noble/bootstrap/vault/unseal-cronjob.yaml
+++ b/clusters/noble/bootstrap/vault/unseal-cronjob.yaml
@@ -1,63 +0,0 @@
 # Optional lab auto-unseal: applies after Vault is initialized and Secret `vault-unseal-key` exists.
 #
 # 1) vault operator init -key-shares=1 -key-threshold=1  (lab only — single key)
 # 2) kubectl -n vault create secret generic vault-unseal-key --from-literal=key='YOUR_UNSEAL_KEY'
 # 3) kubectl apply -f clusters/noble/bootstrap/vault/unseal-cronjob.yaml
 #
 # OSS Vault has no Kubernetes Secret seal; this CronJob runs vault operator unseal when the server is sealed.
 # Protect the Secret with RBAC; prefer cloud KMS auto-unseal for real environments.
 ---
 apiVersion: batch/v1
 kind: CronJob
 metadata:
  name: vault-auto-unseal
  namespace: vault
 spec:
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 1
  failedJobsHistoryLimit: 3
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          securityContext:
            runAsNonRoot: true
            runAsUser: 100
            runAsGroup: 1000
            seccompProfile:
              type: RuntimeDefault
          containers:
            - name: unseal
              image: hashicorp/vault:1.21.2
              imagePullPolicy: IfNotPresent
              securityContext:
                allowPrivilegeEscalation: false
                capabilities:
                  drop:
                    - ALL
              env:
                - name: VAULT_ADDR
                  value: http://vault.vault.svc:8200
              command:
                - /bin/sh
                - -ec
                - |
                  test -f /secrets/key || exit 0
                  status="$(vault status -format=json 2>/dev/null || true)"
                  echo "$status" | grep -Eq '"initialized":[[:space:]]*true' || exit 0
                  echo "$status" | grep -Eq '"sealed":[[:space:]]*false' && exit 0
                  vault operator unseal "$(cat /secrets/key)"
              volumeMounts:
                - name: unseal
                  mountPath: /secrets
                  readOnly: true
          volumes:
            - name: unseal
              secret:
                secretName: vault-unseal-key
                optional: true
                items:
                  - key: key
                    path: key
--- a/clusters/noble/bootstrap/vault/values.yaml
+++ b/clusters/noble/bootstrap/vault/values.yaml
@@ -1,62 +0,0 @@
 # HashiCorp Vault — noble (standalone, file storage on Longhorn; TLS disabled on listener for in-cluster HTTP).
 #
 # helm repo add hashicorp https://helm.releases.hashicorp.com
 # helm repo update
 # kubectl apply -f clusters/noble/bootstrap/vault/namespace.yaml
 # helm upgrade --install vault hashicorp/vault -n vault \
 #   --version 0.32.0 -f clusters/noble/bootstrap/vault/values.yaml --wait --timeout 15m
 #
 # Post-install: initialize, store unseal key in Secret, apply optional unseal CronJob — see README.md
 #
 global:
  tlsDisable: true
 injector:
  enabled: true
 server:
  enabled: true
  dataStorage:
    enabled: true
    size: 10Gi
    storageClass: longhorn
    accessMode: ReadWriteOnce
  ha:
    enabled: false
  standalone:
    enabled: true
    config: |
      ui = true
      listener "tcp" {
        tls_disable = 1
        address = "[::]:8200"
        cluster_address = "[::]:8201"
      }
      storage "file" {
        path = "/vault/data"
      }
  # Allow pod Ready before init/unseal so Helm --wait succeeds (see Vault /v1/sys/health docs).
  readinessProbe:
    enabled: true
    path: "/v1/sys/health?uninitcode=204&sealedcode=204&standbyok=true"
    port: 8200
  # LAN: TLS terminates at Traefik + cert-manager; listener stays HTTP (global.tlsDisable).
  ingress:
    enabled: true
    ingressClassName: traefik
    annotations:
      cert-manager.io/cluster-issuer: letsencrypt-prod
    hosts:
      - host: vault.apps.noble.lab.pcenicni.dev
        paths: []
    tls:
      - secretName: vault-apps-noble-tls
        hosts:
          - vault.apps.noble.lab.pcenicni.dev
 ui:
  enabled: true
--- a/clusters/noble/secrets/README.md
+++ b/clusters/noble/secrets/README.md
@@ -0,0 +1,38 @@
 # SOPS-encrypted cluster secrets (noble)
 Secrets that belong in git are stored here as **Mozilla SOPS** files encrypted with [age](https://github.com/FiloSottile/age). The matching **private** key lives in **`age-key.txt`** at the repository root (gitignored — create with `age-keygen -o age-key.txt` and add the public key to **`.sops.yaml`** if you rotate keys).
 **Migrating from an older cluster** that ran **Vault**, **Sealed Secrets**, or **External Secrets Operator:** uninstall those Helm releases (`helm uninstall vault -n vault`, etc.), delete their namespaces if empty, and export any secrets you still need into plain **`Secret`** YAML here, then encrypt with **`sops`** before committing.
 ## Prerequisites
 - [sops](https://github.com/getsops/sops) and **age** on the machine that encrypts or applies secrets.
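The public key that goes into **`.sops.yaml`** can be read back from the key file at any time. The sketch below assumes the file was written by `age-keygen -o`, which records the recipient in a `# public key:` comment line; `age_recipient` is a hypothetical helper name, not something this repo ships.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the recipient (public key) recorded in an age
# identity file. Assumes the "# public key: age1..." comment that
# `age-keygen -o` writes is still present in the file.
age_recipient() {
  sed -n 's/^# public key: //p' "$1"
}
```

Running `age_recipient age-key.txt` from the repository root prints the value to list under `age:` in `.sops.yaml`. If the comment line has been removed, `age-keygen -y age-key.txt` derives the same recipient from the secret key.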
 ## Edit or create a Secret
 ```bash
 export SOPS_AGE_KEY_FILE=/absolute/path/to/home-server/age-key.txt
 # Create a new file from a template, then encrypt:
 sops clusters/noble/secrets/example.secret.yaml
 # Or edit an existing encrypted file (opens decrypted in $EDITOR):
 sops clusters/noble/secrets/newt-pangolin-auth.secret.yaml
 ```
 ## Apply to the cluster
 ```bash
 export KUBECONFIG=/absolute/path/to/home-server/talos/kubeconfig
 export SOPS_AGE_KEY_FILE=/absolute/path/to/home-server/age-key.txt
 sops -d clusters/noble/secrets/newt-pangolin-auth.secret.yaml | kubectl apply -f -
 ```
 **Ansible** (`noble.yml`) runs the same decrypt-and-apply step for every `*.yaml` in this directory when **`age-key.txt`** exists and **`noble_apply_sops_secrets`** is true (see `ansible/group_vars/all.yml`).
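For manual runs, the same decrypt-and-apply step over every file can be sketched as a loop. This is an assumption-laden illustration rather than the Ansible role's actual implementation: `apply_sops_secrets` is a hypothetical name, and it expects `KUBECONFIG` and `SOPS_AGE_KEY_FILE` to be exported as shown above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: decrypt each *.yaml in a directory with sops and pipe
# the plain manifest to kubectl. Stops at the first file that fails to apply.
apply_sops_secrets() {
  _dir="${1:-clusters/noble/secrets}"
  for _f in "$_dir"/*.yaml; do
    [ -e "$_f" ] || continue        # the glob matched nothing
    echo "applying $_f"
    sops -d "$_f" | kubectl apply -f - || return 1
  done
}
```

Call it as `apply_sops_secrets` from the repository root, or pass another directory as the first argument. Decrypted content only travels through the pipe, so no plaintext Secret is written to disk.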
 ## Files
 | File | Purpose |
 |------|---------|
 | `newt-pangolin-auth.secret.yaml` | Pangolin tunnel credentials for [Newt](../../bootstrap/newt/README.md) (`PANGOLIN_ENDPOINT`, `NEWT_ID`, `NEWT_SECRET`). Replace placeholders and re-encrypt before relying on them. |
--- a/clusters/noble/secrets/newt-pangolin-auth.secret.yaml
+++ b/clusters/noble/secrets/newt-pangolin-auth.secret.yaml
@@ -0,0 +1,30 @@
 apiVersion: ENC[AES256_GCM,data:FaA=,iv:EsqIdZmNS4hfzwCZ0gL7Q5Czaz8Bii3jWFu60lKmgVo=,tag:tfr4yUuTiH4s+ufYW/dpCA==,type:str]
 kind: ENC[AES256_GCM,data:ozpTcG9F,iv:Q1EZ896Plhyz2qM4JJRnBf940kbVLSwyIIPUcDGBZFA=,tag:1bWEgI+I4Ni5J70MlohYdA==,type:str]
 metadata:
    name: ENC[AES256_GCM,data:moXbGuT6ZOGhgVUBNcpHeLZQ,iv:1WDtxT/Et/6lxx1Mj93CQME8o0lhzxnBMkdSqP/n3R0=,tag:v+iqfE8tzCx8ZOMUW7OyEA==,type:str]
    namespace: ENC[AES256_GCM,data:33/AMg==,iv:M0GvB/70nHh4MVR1saZy1pGY8IFFzkzGdJl4szHJbCI=,tag:0+1LX/EnkAP0FZ6ARKZNAA==,type:str]
 type: ENC[AES256_GCM,data:3io5utU1,iv:QqMDNL/R8SR7TC9mwDdDd3V6VOo+csgeiZCr2AdOZjw=,tag:/KSMy+vNz7Qj/I463eG0LQ==,type:str]
 stringData:
    PANGOLIN_ENDPOINT: ENC[AES256_GCM,data:a/2QTnGYnNXGxNm8QSVTKC6I+r88J1m1CdMmTA==,iv:L2LvLD7IRX8wdAzALAWQ2ojB2OjWDIcVKrdi/lSvZFY=,tag:ALffRF9bncxA8CExSaRmHA==,type:str]
    NEWT_ID: ENC[AES256_GCM,data:Xfe8QvBdX62CciYXYwMfJAzIE/0=,iv:tA+FJ93tsjJ29L3bSxNAEooiKPMc+5pa00EpQ2cJkho=,tag:auiR/zQjnsmyllXbSJf3KA==,type:str]
    NEWT_SECRET: ENC[AES256_GCM,data:XY8XZOkZ+GpnjljbvtaH2oGJpDoZ47fN,iv:+J5sb7saqbVwHEyemx3CUSsdKArubRdPCLGbT09sFLM=,tag:zUowv8I1CaWZH+KLYOwKYw==,type:str]
 sops:
    kms: []
    gcp_kms: []
    azure_kv: []
    hc_vault: []
    age:
        - recipient: age1juym5p3ez3dkt0dxlznydgfgqvaujfnyk9hpdsssf50hsxeh3p4sjpf3gn
          enc: |
            -----BEGIN AGE ENCRYPTED FILE-----
            YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSB0RWppdWxZUEYzc2I2TURi
            dm1pUzVaNDA4YldsWkFJODl1MWZ6MXFxWnhjCnVtU1VEQnJqbTI5M0hWM2FCaVlS
            aXprTm42bTlldUVHMmxpUUJiWEVhcXcKLS0tIGNLVnNtNDdMQ0VVeDV1N29nOW9F
            clhLa2tPdWtRMWYzc2YrR0hSQXczTlUK6hYj4HxQvu6Kqn/Ki+cYv9x5nvolyGqQ
            N4g9z+t6orT6MYseWPf0uyovC/5iOOC6z/2exVe7/0rYo7ZOFm6dYQ==
            -----END AGE ENCRYPTED FILE-----
    lastmodified: "2026-03-29T23:37:33Z"
    mac: ENC[AES256_GCM,data:uKtdqJhwE4HLCenHH+RG8O2yfVIcGbiXznL9ouAXhDLnQh/ksgeczr2fyyn9hs/JhCozAqRrF8vnYZsIdfG1DQfHjXn6Ro6gzYC0YR+gvFU8Mz9uPdVX3HYjUrzKJ5GhhBami0USZtLdGKOGgFDYmFoDsD/PmMXLUol8qJdW8Uk=,iv:rIfQI17+3vNBB1n//D7Wnl/SLWFjV0pgZDteumlS2f8=,tag:xibCfJdZQS+aB75drmY1VA==,type:str]
    pgp: []
    unencrypted_suffix: _unencrypted
    version: 3.9.3
--- a/docs/Racks.md
+++ b/docs/Racks.md
@@ -0,0 +1,169 @@
 # Physical racks — Noble lab (10")
 This page is a **logical rack layout** for the **noble** Talos lab: **three 10" (half-width) racks**, how **rack units (U)** are used, and **Ethernet** paths on **`192.168.50.0/24`**. Node names and IPs match [`talos/CLUSTER-BUILD.md`](../talos/CLUSTER-BUILD.md) and [`docs/architecture.md`](architecture.md).
 ## Legend
 | Symbol | Meaning |
 |--------|---------|
 | `█` / filled cell | Equipment occupying that **1U** |
 | `░` | Reserved / future use |
 | `·` | Empty |
 | `━━` | Copper to LAN switch |
 **Rack unit numbering:** **U increases upward** (U1 = bottom of rack, like ANSI/EIA). **Slot** in the diagrams is **top → bottom** reading order for a quick visual scan.
 ### Three racks at a glance
 Read **top → bottom** (first row = top of rack).
 | Primary (10") | Storage B (10") | Rack C (10") |
 |-----------------|-----------------|--------------|
 | Fiber ONT | Mac Mini | *empty* |
 | UniFi Fiber Gateway | NAS | *empty* |
 | Patch panel | JBOD | *empty* |
 | 2.5 GbE ×8 PoE switch | *empty* | *empty* |
 | Raspberry Pi cluster | *empty* | *empty* |
 | **helium** (Talos) | *empty* | *empty* |
 | **neon** (Talos) | *empty* | *empty* |
 | **argon** (Talos) | *empty* | *empty* |
 | **krypton** (Talos) | *empty* | *empty* |
 **Connectivity:** Primary rack gear shares **one L2** (`192.168.50.0/24`). Storage B and Rack C link the same way when cabled (e.g. **Ethernet** to the PoE switch, **VPN** or flat LAN per your design).
 ---
 ## Rack A — LAN aggregation (10" × 12U)
 Dedicated to **Layer-2 access** and cable home runs. All cluster nodes plug into this switch (or into a downstream switch that uplinks here).
 ```
  TOP OF RACK
  ┌────────────────────────────────────────┐
  │ Slot 1  ········· empty ·············· │  12U
  │ Slot 2  ········· empty ·············· │  11U
  │ Slot 3  ········· empty ·············· │  10U
  │ Slot 4  ········· empty ·············· │   9U
  │ Slot 5  ········· empty ·············· │   8U
  │ Slot 6  ········· empty ·············· │   7U
  │ Slot 7  ░░░░░░░ optional PDU ░░░░░░░░ │   6U
  │ Slot 8  █████ 1U cable manager ██████ │   5U
  │ Slot 9  █████ 1U patch panel █████████ │   4U
  │ Slot10  ███ 8-port managed switch ████ │   3U  ← LAN L2 spine
  │ Slot11  ········· empty ·············· │   2U
  │ Slot12  ········· empty ·············· │   1U
  └────────────────────────────────────────┘
  BOTTOM
 ```
 **Network role:** Every node NIC → **switch access port** → same **VLAN / flat LAN** as documented; **kube-vip** VIP **`192.168.50.230`**, **MetalLB** **`192.168.50.210`–`229`**, and **Traefik** **`192.168.50.211`** are **logical** addresses announced by the nodes themselves (no extra hardware).
 ---
 ## Rack B — Control planes (10" × 12U)
 Three **Talos control-plane** nodes (**scheduling allowed** on CPs per `talconfig.yaml`).
 ```
  TOP OF RACK
  ┌────────────────────────────────────────┐
  │ Slot 1  ········· empty ·············· │  12U
  │ Slot 2  ········· empty ·············· │  11U
  │ Slot 3  ········· empty ·············· │  10U
  │ Slot 4  ········· empty ·············· │   9U
  │ Slot 5  ········· empty ·············· │   8U
  │ Slot 6  ········· empty ·············· │   7U
  │ Slot 7  ········· empty ·············· │   6U
  │ Slot 8  █ neon  control-plane .20 ████ │   5U
  │ Slot 9  █ argon control-plane .30 ███ │   4U
  │ Slot10  █ krypton control-plane .40 ██ │   3U  (kube-vip VIP .230)
  │ Slot11  ········· empty ·············· │   2U
  │ Slot12  ········· empty ·············· │   1U
  └────────────────────────────────────────┘
  BOTTOM
 ```
 ---
 ## Rack C — Worker (10" × 12U)
 Single **worker** node; **Longhorn** data disk is **local** to each node (see `talconfig.yaml`); no separate NAS in this diagram.
 ```
  TOP OF RACK
  ┌────────────────────────────────────────┐
  │ Slot 1  ········· empty ·············· │  12U
  │ Slot 2  ········· empty ·············· │  11U
  │ Slot 3  ········· empty ·············· │  10U
  │ Slot 4  ········· empty ·············· │   9U
  │ Slot 5  ········· empty ·············· │   8U
  │ Slot 6  ········· empty ·············· │   7U
  │ Slot 7  ░░░░░░░ spare / future ░░░░░░░░ │   6U
  │ Slot 8  ········· empty ·············· │   5U
  │ Slot 9  ········· empty ·············· │   4U
  │ Slot10  ███ helium  worker  .10 █████ │   3U
  │ Slot11  ········· empty ·············· │   2U
  │ Slot12  ········· empty ·············· │   1U
  └────────────────────────────────────────┘
  BOTTOM
 ```
 ---
 ## Space summary
 | System | Rack | Approx. U | IP | Role |
 |--------|------|-----------|-----|------|
 | LAN switch | A | 1U | — | All nodes on `192.168.50.0/24` |
 | Patch / cable mgmt | A | 2× 1U | — | Physical plant |
 | **neon** | B | 1U | `192.168.50.20` | control-plane + schedulable |
 | **argon** | B | 1U | `192.168.50.30` | control-plane + schedulable |
 | **krypton** | B | 1U | `192.168.50.40` | control-plane + schedulable |
 | **helium** | C | 1U | `192.168.50.10` | worker |
 Adjust **empty vs. future** rows if your chassis are **2U** or on **shelves** — scale the `█` blocks accordingly.
 ---
 ## Network connections
 All cluster nodes are on **one flat LAN**. **kube-vip** floats **`192.168.50.230:6443`** across the three control-plane hosts on **`ens18`** (see cluster bootstrap docs).
 ```mermaid
 flowchart TB
  subgraph RACK_A["Rack A — 10\""]
    SW["Managed switch<br/>192.168.50.0/24 L2"]
    PP["Patch / cable mgmt"]
    SW --- PP
  end
  subgraph RACK_B["Rack B — 10\""]
    N["neon :20"]
    A["argon :30"]
    K["krypton :40"]
  end
  subgraph RACK_C["Rack C — 10\""]
    H["helium :10"]
  end
  subgraph LOGICAL["Logical (any node holding VIP)"]
    VIP["API VIP 192.168.50.230<br/>kube-vip → apiserver :6443"]
  end
  WAN["Internet / other LANs"] -.->|"router (out of scope)"| SW
  SW <-->|"Ethernet"| N
  SW <-->|"Ethernet"| A
  SW <-->|"Ethernet"| K
  SW <-->|"Ethernet"| H
  N --- VIP
  A --- VIP
  K --- VIP
  WK["Workstation / CI<br/>kubectl, browser"] -->|"HTTPS :6443"| VIP
  WK -->|"L2 (MetalLB .210–.211, any node)"| SW
 ```
 **Ingress path (same LAN):** clients → **`192.168.50.211`** (Traefik) or **`192.168.50.210`** (Argo CD) via **MetalLB** — still **through the same switch** to whichever node advertises the service.
 ---
 ## Related docs
 - Cluster topology and services: [`architecture.md`](architecture.md)
 - Build state and versions: [`../talos/CLUSTER-BUILD.md`](../talos/CLUSTER-BUILD.md)
--- a/docs/architecture.md
+++ b/docs/architecture.md
@@ -8,8 +8,8 @@ This document describes the **noble** Talos lab cluster: node topology, networki
 |---------------|---------|
 | **Subgraph “Cluster”** | Kubernetes cluster boundary (`noble`) |
 | **External / DNS / cloud** | Services outside the data plane (internet, registrar, Pangolin) |
-| **Data store** | Durable data (etcd, Longhorn, Loki, Vault storage) |
+| **Data store** | Durable data (etcd, Longhorn, Loki) |
-| **Secrets / policy** | Secret material, Vault, admission policy |
+| **Secrets / policy** | Secret material (SOPS in git), admission policy |
 | **LB / VIP** | Load balancer, MetalLB assignment, or API VIP |
 ---
@@ -74,7 +74,7 @@ flowchart TB
 ## Platform stack (bootstrap → workloads)
-Order: **Talos** → **Cilium** (cluster uses `cni: none` until CNI is installed) → **metrics-server**, **Longhorn**, **MetalLB** + pool manifests, **kube-vip** → **Traefik**, **cert-manager** → **Argo CD** (Helm only; optional empty app-of-apps). **Automated install:** `ansible/playbooks/noble.yml` (see `ansible/README.md`). Platform namespaces include `cert-manager`, `traefik`, `metallb-system`, `longhorn-system`, `monitoring`, `loki`, `logging`, `argocd`, `vault`, `external-secrets`, `sealed-secrets`, `kyverno`, `newt`, and others as deployed.
+Order: **Talos** → **Cilium** (cluster uses `cni: none` until CNI is installed) → **metrics-server**, **Longhorn**, **MetalLB** + pool manifests, **kube-vip** → **Traefik**, **cert-manager** → **Argo CD** (Helm only; optional empty app-of-apps). **Automated install:** `ansible/playbooks/noble.yml` (see `ansible/README.md`). Platform namespaces include `cert-manager`, `traefik`, `metallb-system`, `longhorn-system`, `monitoring`, `loki`, `logging`, `argocd`, `kyverno`, `newt`, and others as deployed.
 ```mermaid
 flowchart TB
@@ -98,7 +98,7 @@ flowchart TB
    Argo["Argo CD<br/>(optional app-of-apps; platform via Ansible)"]
  end
  subgraph L5["Platform namespaces (examples)"]
-    NS["cert-manager, traefik, metallb-system,<br/>longhorn-system, monitoring, loki, logging,<br/>argocd, vault, external-secrets, sealed-secrets,<br/>kyverno, newt, …"]
+    NS["cert-manager, traefik, metallb-system,<br/>longhorn-system, monitoring, loki, logging,<br/>argocd, kyverno, newt, …"]
  end
  Talos --> Cilium --> MS
  Cilium --> LH
@@ -149,22 +149,20 @@ flowchart LR
 ## Secrets and policy
-**Sealed Secrets** decrypts `SealedSecret` objects in-cluster. **External Secrets Operator** syncs from **Vault** using **`ClusterSecretStore`** (see [`examples/vault-cluster-secret-store.yaml`](../clusters/noble/bootstrap/external-secrets/examples/vault-cluster-secret-store.yaml)). Trust is **cluster → Vault** (ESO calls Vault; Vault does not initiate cluster trust). **Kyverno** with **kyverno-policies** enforces **PSS baseline** in **Audit**.
+**Mozilla SOPS** with **age** encrypts plain Kubernetes **`Secret`** manifests under [`clusters/noble/secrets/`](../clusters/noble/secrets/); operators decrypt at apply time (`ansible/playbooks/noble.yml` or `sops -d … | kubectl apply`). The private key is **`age-key.txt`** at the repo root (gitignored). **Kyverno** with **kyverno-policies** enforces **PSS baseline** in **Audit**.
 ```mermaid
 flowchart LR
  subgraph Git["Git repo"]
-    SSman["SealedSecret manifests<br/>(optional)"]
+    SM["SOPS-encrypted Secret YAML<br/>clusters/noble/secrets/"]
  end
+  subgraph ops["Apply path"]
+    SOPS["sops -d + kubectl apply<br/>(or Ansible noble.yml)"]
+  end
   subgraph cluster["Cluster"]
-    SSC["Sealed Secrets controller<br/>sealed-secrets"]
-    ESO["External Secrets Operator<br/>external-secrets"]
-    V["Vault<br/>vault namespace<br/>HTTP listener"]
    K["Kyverno + kyverno-policies<br/>PSS baseline Audit"]
  end
-  SSman -->|"encrypted"| SSC -->|"decrypt to Secret"| workloads["Workload Secrets"]
+  SM --> SOPS -->|"plain Secret"| workloads["Workload Secrets"]
-  ESO -->|"ClusterSecretStore →"| V
-  ESO -->|"sync ExternalSecret"| workloads
  K -.->|"admission / audit<br/>(PSS baseline)"| workloads
 ```
@@ -172,7 +170,7 @@ flowchart LR
 ## Data and storage
-**StorageClass:** **`longhorn`** (default). Talos mounts **user volume** data at **`/var/mnt/longhorn`** (bind paths for Longhorn). Stateful consumers include **Vault**, **kube-prometheus-stack** PVCs, and **Loki**.
+**StorageClass:** **`longhorn`** (default). Talos mounts **user volume** data at **`/var/mnt/longhorn`** (bind paths for Longhorn). Stateful consumers include **kube-prometheus-stack** PVCs and **Loki**.
 ```mermaid
 flowchart TB
@@ -183,12 +181,10 @@ flowchart TB
    SC["StorageClass: longhorn (default)"]
  end
  subgraph consumers["Stateful / durable consumers"]
-    V["Vault PVC data-vault-0"]
    PGL["kube-prometheus-stack PVCs"]
    L["Loki PVC"]
  end
  UD --> SC
-  SC --> V
  SC --> PGL
  SC --> L
 ```
@@ -210,7 +206,7 @@ See [`talos/CLUSTER-BUILD.md`](../talos/CLUSTER-BUILD.md) for the authoritative
 | Argo CD | 9.4.17 / app v3.3.6 |
 | kube-prometheus-stack | 82.15.1 |
 | Loki / Fluent Bit | 6.55.0 / 0.56.0 |
-| Sealed Secrets / ESO / Vault | 2.18.4 / 2.2.0 / 0.32.0 |
+| SOPS (client tooling) | see `clusters/noble/secrets/README.md` |
 | Kyverno | 3.7.1 / policies 3.7.1 |
 | Newt | 1.2.0 / app 1.10.1 |
@@ -218,7 +214,7 @@ See [`talos/CLUSTER-BUILD.md`](../talos/CLUSTER-BUILD.md) for the authoritative
 ## Narrative
-The **noble** environment is a **Talos** lab cluster on **`192.168.50.0/24`** with **three control plane nodes and one worker**, schedulable workloads on control planes enabled, and the Kubernetes API exposed through **kube-vip** at **`192.168.50.230`**. **Cilium** provides the CNI after Talos bootstrap with **`cni: none`**; **MetalLB** advertises **`192.168.50.210`–`192.168.50.229`**, pinning **Argo CD** to **`192.168.50.210`** and **Traefik** to **`192.168.50.211`** for **`*.apps.noble.lab.pcenicni.dev`**. **cert-manager** issues certificates for Traefik Ingresses; **GitOps** is **Ansible-driven Helm** for the platform (**`clusters/noble/bootstrap/`**) plus optional **Argo CD** app-of-apps (**`clusters/noble/apps/`**, **`clusters/noble/bootstrap/argocd/`**). **Observability** uses **kube-prometheus-stack** in **`monitoring`**, **Loki** and **Fluent Bit** with Grafana wired via a **ConfigMap** datasource, with **Longhorn** PVCs for Prometheus, Grafana, Alertmanager, Loki, and **Vault**. **Secrets** combine **Sealed Secrets** for git-encrypted material, **Vault** with **External Secrets** for dynamic sync, and **Kyverno** enforces **Pod Security Standards baseline** in **Audit**. **Public** access uses **Newt** to **Pangolin** with **CNAME** and Integration API steps as documented—not generic in-cluster public DNS.
+The **noble** environment is a **Talos** lab cluster on **`192.168.50.0/24`** with **three control plane nodes and one worker**, schedulable workloads on control planes enabled, and the Kubernetes API exposed through **kube-vip** at **`192.168.50.230`**. **Cilium** provides the CNI after Talos bootstrap with **`cni: none`**; **MetalLB** advertises **`192.168.50.210`–`192.168.50.229`**, pinning **Argo CD** to **`192.168.50.210`** and **Traefik** to **`192.168.50.211`** for **`*.apps.noble.lab.pcenicni.dev`**. **cert-manager** issues certificates for Traefik Ingresses; **GitOps** is **Ansible-driven Helm** for the platform (**`clusters/noble/bootstrap/`**) plus optional **Argo CD** app-of-apps (**`clusters/noble/apps/`**, **`clusters/noble/bootstrap/argocd/`**). **Observability** uses **kube-prometheus-stack** in **`monitoring`**, **Loki** and **Fluent Bit** with Grafana wired via a **ConfigMap** datasource, with **Longhorn** PVCs for Prometheus, Grafana, Alertmanager, and Loki. **Secrets** in git use **SOPS** + **age** under **`clusters/noble/secrets/`**; **Kyverno** enforces **Pod Security Standards baseline** in **Audit**. **Public** access uses **Newt** to **Pangolin** with **CNAME** and Integration API steps as documented—not generic in-cluster public DNS.
 ---
--- a/docs/homelab-network.md
+++ b/docs/homelab-network.md
@@ -0,0 +1,100 @@
 # Homelab network inventory
 Single place for **VLANs**, **static addressing**, and **hosts** beside the **noble** Talos cluster. **Proxmox** is the **hypervisor** for the VMs below; **all of those VMs are intended to run on `192.168.1.0/24`** (same broadcast domain as Pi-hole and typical home clients). **Noble** (Talos) stays on **`192.168.50.0/24`** per [`architecture.md`](architecture.md) and [`talos/CLUSTER-BUILD.md`](../talos/CLUSTER-BUILD.md) until you change that design.
 ## VLANs (logical)
 | Network | Role |
 |---------|------|
 | **`192.168.1.0/24`** | **Homelab / Proxmox LAN** — **Proxmox host(s)**, **all Proxmox VMs**, **Pi-hole**, **Mac mini**, and other servers that share this VLAN. |
 | **`192.168.50.0/24`** | **Noble Talos** cluster — physical nodes, **kube-vip**, **MetalLB**, Traefik; **not** the Proxmox VM subnet. |
 | **`192.168.60.0/24`** | **DMZ / WAN-facing** — **NPM**, **WebDAV**, **other services** that need WAN access. |
 | **`192.168.40.0/24`** | **Home Assistant** and IoT devices — isolated; record subnet and HA IP in DHCP/router. |
 **Routing / DNS:** Clients and VMs on **`192.168.1.0/24`** reach **noble** services on **`192.168.50.0/24`** via **L3** (router/firewall). **NFS** from OMV (`192.168.1.105`) to **noble** pods uses the **OMV data IP** as the NFS server address from the cluster’s perspective.
 Firewall rules between VLANs are **out of scope** here; document them where you keep runbooks.
 ---
 ## `192.168.50.0/24` — reservations (noble only)
 Do not assign **unrelated** static services on **this** VLAN without checking overlap with MetalLB and kube-vip.
 | Use | Addresses |
 |-----|-----------|
 | Talos nodes | `.10`–`.40` (see [`talos/talconfig.yaml`](../talos/talconfig.yaml)) |
 | MetalLB L2 pool | `.210`–`.229` |
 | Traefik (ingress) | `.211` (typical) |
 | Argo CD | `.210` (typical) |
 | Kubernetes API (kube-vip) | **`.230`** — **must not** be a VM |
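The MetalLB row above could correspond to a pool definition like the following. This is a minimal sketch, not the repo's actual manifest: the resource name `noble-l2` and the `metallb-system` namespace are assumptions, and L2 mode also needs an `L2Advertisement` that references the pool.

```yaml
# Hypothetical name and namespace; only the address range comes from the table above.
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: noble-l2
  namespace: metallb-system
spec:
  addresses:
    # Ends at .229 so the kube-vip API address (.230) stays outside the pool.
    - 192.168.50.210-192.168.50.229
```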
 ---
 ## Proxmox VMs (`192.168.1.0/24`)
 All run on **Proxmox**; addresses below use **`192.168.1.0/24`** (same host octet as your earlier `.50.x` / `.60.x` plan, moved into the homelab VLAN). Adjust if your router uses a different numbering scheme.
 Most are **Docker hosts** with multiple apps; treat the **IP** as the **host**, not individual containers.
 | VM ID | Name | IP | Notes |
 |-------|------|-----|--------|
 | 666 | nginxproxymanager | `192.168.1.20` | NPM (edge / WAN-facing role — firewall as you design). |
 | 777 | nginxproxymanager-Lan | `192.168.1.60` | NPM on **internal** homelab LAN. |
 | 100 | Openmediavault | `192.168.1.105` | **NFS** exports for *arr / media paths. |
 | 110 | Monitor | `192.168.1.110` | Uptime Kuma, Peekaping, Tracearr → cluster candidates. |
 | 120 | arr | `192.168.1.120` | *arr stack; media via **NFS** from OMV — see [migration](#arr-stack-nfs-and-kubernetes). |
 | 130 | Automate | `192.168.1.130` | Low use — **candidate to remove** or consolidate. |
 | 140 | general-purpose | `192.168.1.140` | IT tools, Mealie, Open WebUI, SparkyFitness, … |
 | 150 | Media-server | `192.168.1.150` | Jellyfin (test, **NFS** media), ebook server. |
 | 160 | s3 | `192.168.1.170` | Object storage; **merge** into **central S3** on noble per [`shared-data-services.md`](shared-data-services.md) when ready. |
 | 190 | Auth | `192.168.1.190` | **Authentik** → **noble (K8s)** for HA. |
 | 300 | gitea | `192.168.1.203` | On **`.1`**, no overlap with noble **MetalLB `.210`–`.229`** on **`.50`**. |
 | 310 | gitea-nsfw | `192.168.1.204` | |
 | 500 | AMP | `192.168.1.47` | |
 ### Workload detail (what runs where)
 **Auth (190)** — **Authentik** is the main service; moving it to **Kubernetes (noble)** gives you **HA**, rolling upgrades, and backups via your cluster patterns (PVCs, Velero, etc.). Plan **OIDC redirect URLs** and **outposts** (if used) when the **ingress hostname** and paths to **`.50`** services change.
 **Monitor (110)** — **Uptime Kuma**, **Peekaping**, and **Tracearr** are a good fit for the cluster: small state (SQLite or small DBs), **Ingress** via Traefik, and **Longhorn** or a small DB PVC. Migrate **one app at a time** and keep the old VM until DNS and alerts are verified.
 **arr (120)** — **Lidarr, Sonarr, Radarr**, and related *arr* apps; libraries and download paths point at **NFS** from **Openmediavault (100)** at **`192.168.1.105`**. The hard part is keeping **paths**, **permissions (UID/GID)**, and **download client** wiring intact while pods move.
 **Automate (130)** — Tools are **barely used**; **decommission**, merge into **general-purpose (140)**, or replace with a **CronJob** / one-shot on the cluster only if something still needs scheduling.
 **general-purpose (140)** — “Daily driver” stack: **IT tools**, **Mealie**, **Open WebUI**, **SparkyFitness**, and similar. **Candidates for gradual moves** to noble; group by **data sensitivity** and **persistence** (Postgres vs SQLite) when you pick order.
 **Media-server (150)** — **Jellyfin** (testing) with libraries on **NFS**; **ebook** server. Treat **Jellyfin** like *arr* for storage: same NFS export and **transcoding** needs (CPU on worker nodes or GPU if you add it). Ebook stack depends on what you run (e.g. Kavita, Audiobookshelf) — note **metadata paths** before moving.
 ### Arr stack, NFS, and Kubernetes
 You do **not** have to move NFS into the cluster: **Openmediavault** on **`192.168.1.105`** can stay the **NFS server** while the *arr* apps run as **Deployments** with **ReadWriteMany** volumes. Noble nodes on **`192.168.50.0/24`** mount NFS using **that IP** (ensure **firewall** allows **NFS** from node IPs to OMV).
 1. **Keep OMV as the single source of exports** — same **export path** (e.g. `/export/media`) from the cluster’s perspective as from the current VM.
 2. **Mount NFS in Kubernetes** — use an **NFS provisioner** (e.g. **csi-driver-nfs**, or **nfs-subdir-external-provisioner**, which is an external provisioner rather than a CSI driver) so each app gets a **PVC** backed by a **subdirectory** of the export, **or** one shared RWX PVC for a common tree if your layout needs it.
 3. **Match POSIX ownership** — set **supplemental groups** or **fsGroup** / **runAsUser** on the pods so Sonarr/Radarr see the same **UID/GID** as today’s Docker setup; fix **squash** settings on OMV if you use `root_squash`.
 4. **Config and DB** — back up each app’s **config volume** (or SQLite files), redeploy with the same **environment**; point **download clients** and **NFS media roots** to the **same logical paths** inside the container.
 5. **Low-risk path** — run **one** *arr* app on the cluster while the rest stay on **VM 120** until imports and downloads behave; then cut the **DNS** records and **NPM** proxy hosts over.
 If you prefer **no** NFS from pods, the alternative is **large ReadWriteOnce** disks on Longhorn and **sync** from OMV — usually **more** moving parts than **RWX NFS** for this workload class.
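The ownership and mount steps in the numbered list above might look roughly like this for a single app. It is a sketch under stated assumptions: the names `sonarr`, `arr`, and `media-rwx`, the image reference, and UID/GID `1000` are all placeholders, and the PVC is assumed to exist already as an RWX claim backed by the OMV export.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sonarr                # hypothetical app name
  namespace: arr              # hypothetical namespace
spec:
  replicas: 1
  strategy:
    type: Recreate            # never run old and new pods against the same config at once
  selector:
    matchLabels:
      app: sonarr
  template:
    metadata:
      labels:
        app: sonarr
    spec:
      securityContext:
        runAsUser: 1000       # assumption: replace with the UID the Docker container uses today
        runAsGroup: 1000      # assumption: replace with the matching GID
        supplementalGroups: [1000]   # extra group(s) that own the media tree on OMV
      containers:
        - name: sonarr
          image: registry.example/sonarr:replace-me   # placeholder: use the image:tag from VM 120
          volumeMounts:
            - name: media
              mountPath: /media   # keep the same in-container path as on the VM
      volumes:
        - name: media
          persistentVolumeClaim:
            claimName: media-rwx  # hypothetical RWX PVC backed by the NFS export
```

Some images manage their own user switching through environment variables; in that case the `securityContext` values above may need to be dropped in favor of the image's own settings.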
 ---
 ## Other hosts
 | Host | IP | VLAN / network | Notes |
 |------|-----|----------------|--------|
 | **Pi-hole** | `192.168.1.127` | `192.168.1.0/24` | DNS; same VLAN as Proxmox VMs. |
 | **Home Assistant** | *TBD* | **IoT VLAN** | Add reservation when fixed. |
 | **Mac mini** | `192.168.1.155` | `192.168.1.0/24` | Align with **Storage B** in [`Racks.md`](Racks.md) if the same machine. |
 ---
 ## Related docs
 - **Shared Postgres + S3 (centralized):** [`shared-data-services.md`](shared-data-services.md)
 - **VM → noble migration plan:** [`migration-vm-to-noble.md`](migration-vm-to-noble.md)
 - Noble cluster topology and ingress: [`architecture.md`](architecture.md)
 - Physical racks (Primary / Storage B / Rack C): [`Racks.md`](Racks.md)
 - Cluster checklist: [`../talos/CLUSTER-BUILD.md`](../talos/CLUSTER-BUILD.md)
--- a/docs/migration-vm-to-noble.md
+++ b/docs/migration-vm-to-noble.md
@@ -0,0 +1,121 @@
 # Migration plan: Proxmox VMs → noble (Kubernetes)
 This document is the **default playbook** for moving workloads from **Proxmox VMs** on **`192.168.1.0/24`** into the **noble** Talos cluster on **`192.168.50.0/24`**. Source inventory and per-VM notes: [`homelab-network.md`](homelab-network.md). Cluster facts: [`architecture.md`](architecture.md), [`talos/CLUSTER-BUILD.md`](../talos/CLUSTER-BUILD.md).
 ---
 ## 1. Scope and principles
 | Principle | Detail |
 |-----------|--------|
 | **One service at a time** | Run the new workload on **noble** while the **VM** stays up; cut over **DNS / NPM** only after checks pass. |
 | **Same container image** | Prefer the **same** upstream image and major version as Docker on the VM to reduce surprises. |
 | **Data moves with a plan** | **Backup** VM volumes or export DB dumps **before** the first deploy to the cluster. |
 | **Ingress on noble** | Internal apps use **Traefik** + **`*.apps.noble.lab.pcenicni.dev`** (or your chosen hostnames) and **MetalLB** (e.g. **`192.168.50.211`**) per [`architecture.md`](architecture.md). |
 | **Cross-VLAN** | Clients on **`.1`** reach services on **`.50`** via **routing**; **firewall** must allow **NFS** from **Talos node IPs** to **OMV `192.168.1.105`** when pods mount NFS. |
 **Not everything must move.** Keep **Openmediavault** (and optionally **NPM**) on VMs if you prefer; the cluster consumes **NFS** and **HTTP** from them.
 ---
 ## 2. Prerequisites (before wave 1)
 1. **Cluster healthy** — `kubectl get nodes`; [`talos/CLUSTER-BUILD.md`](../talos/CLUSTER-BUILD.md) checklist through ingress and cert-manager as needed.
 2. **Ingress + TLS** — **Traefik** + **cert-manager** working; you can hit a **test Ingress** on the MetalLB IP.
 3. **GitOps / deploy path** — Decide per app: **Helm** under `clusters/noble/apps/`, **Argo CD**, or **Ansible**-applied manifests (match how you manage the rest of noble).
 4. **Secrets** — Plan **Kubernetes Secrets**; for git-stored material, align with **SOPS** (`clusters/noble/secrets/`, `.sops.yaml`).
 5. **Storage** — **Longhorn** default for **ReadWriteOnce** state; for **NFS** (*arr*, Jellyfin), install a **CSI NFS** driver and test a **small RWX PVC** before migrating data-heavy apps.
 6. **Shared data tier (recommended)** — Deploy **centralized PostgreSQL** and **S3-compatible storage** on noble so apps do not each ship their own DB/object store; see [`shared-data-services.md`](shared-data-services.md).
 7. **Firewall** — Rules: **workstation → `192.168.50.230:6443`**; **nodes → OMV NFS ports**; **clients → `192.168.50.211`** (or split-horizon DNS) as you design.
 8. **DNS** — Split-horizon or Pi-hole records for **`*.apps.noble.lab.pcenicni.dev`** → **Traefik** IP **`192.168.50.211`** for LAN clients.
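For prerequisite 5, a small RWX claim is enough to confirm that NFS provisioning works before any data moves. This is a sketch: the StorageClass name `nfs-csi` and the claim name are assumptions and depend on how the NFS driver is installed.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rwx-smoke-test        # hypothetical; delete after the test
  namespace: default
spec:
  accessModes:
    - ReadWriteMany           # the mode the *arr* and Jellyfin media mounts need
  storageClassName: nfs-csi   # assumption: whatever class your NFS provisioner exposes
  resources:
    requests:
      storage: 1Gi
```

If the claim reaches `Bound` and two pods on different nodes can both write to it, the firewall path from the Talos nodes to OMV is working.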
 ---
 ## 3. Standard migration procedure (repeat per app)
 Use this checklist for **each** application (or small group, e.g. one Helm release).
 | Step | Action |
 |------|--------|
 | **A. Discover** | Document **image:tag**, **ports**, **volumes** (host paths), **env vars**, **depends_on** (DB, Redis, NFS path). Export **docker inspect** / **compose** from the VM. |
 | **B. Backup** | Snapshot **Proxmox VM** or backup **volume** / **SQLite** / **DB dump** to offline storage. |
 | **C. Namespace** | Create a **dedicated namespace** (e.g. `monitoring-tools`, `authentik`) or use your house standard. |
 | **D. Deploy** | Add **Deployment** (or **StatefulSet**), **Service**, **Ingress** (class **traefik**), **PVCs**; wire **secrets** from **Secrets** (not literals in git). |
 | **E. Storage** | **Longhorn** PVC for local state; **NFS CSI** PVC for shared media/config paths that must match the VM (see [`homelab-network.md`](homelab-network.md) *arr* section). Prefer **shared Postgres** / **shared S3** per [`shared-data-services.md`](shared-data-services.md) instead of new embedded databases. Match **UID/GID** with `securityContext`. |
 | **F. Smoke test** | `kubectl port-forward` or temporary **Ingress** hostname; log in, run one critical workflow (login, playback, sync). |
 | **G. DNS cutover** | Point **internal DNS** or **NPM** upstream from the **VM IP** to the **new hostname** (Traefik) or **MetalLB IP** + Host header. |
 | **H. Observe** | 24–72 hours: logs, alerts, **Uptime Kuma** (once migrated), backups. |
 | **I. Decommission** | Stop the **container** on the VM; leave the VM itself running until **every** service on it has moved. |
 | **J. VM off** | When **no** services remain on that VM, **power off** and archive or delete the VM. |
 **Rollback:** Re-enable the VM service, revert **DNS/NPM** to the old IP, delete or scale the cluster deployment to zero.
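Step D's Ingress could be sketched as below. The `traefik` class and the `letsencrypt-prod` issuer are the ones listed in `talos/CLUSTER-BUILD.md`; the app name, namespace, hostname, TLS secret name, and backend port are illustrative assumptions.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: uptime-kuma                  # hypothetical app
  namespace: monitoring-tools        # example namespace from step C
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: traefik
  tls:
    - hosts:
        - uptime.apps.noble.lab.pcenicni.dev   # hypothetical hostname under the wildcard
      secretName: uptime-kuma-tls              # cert-manager writes the certificate here
  rules:
    - host: uptime.apps.noble.lab.pcenicni.dev
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: uptime-kuma    # assumption: Service created alongside the Deployment
                port:
                  number: 3001       # assumption: the port the app listens on
```

Using the staging issuer for the first attempt avoids burning production rate limits while the hostname and DNS-01 setup are still being verified.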
 ---
 ## 4. Recommended migration order (phases)
 Order balances **risk**, **dependencies**, and **learning curve**.
 | Phase | Target | Rationale |
 |-------|--------|-----------|
 | **0 — Optional** | **Automate (130)** | Low use: **retire** or replace with **CronJobs**; skip if nothing valuable runs. |
 | **0b — Platform** | **Shared Postgres + S3** on noble | Run **before** or alongside early waves so new deploys use **one DSN** and **one object endpoint**; retire **VM 160** when empty. See [`shared-data-services.md`](shared-data-services.md). |
 | **1 — Observability** | **Monitor (110)** — Uptime Kuma, Peekaping, Tracearr | Small state, validates **Ingress**, **PVCs**, and **alert paths** before auth and media. |
 | **2 — Git** | **gitea (300)**, **gitea-nsfw (310)** | Point at **shared Postgres** + **S3** for attachments; move **repos** with **PVC** + backup restore if needed. |
 | **3 — Object / misc** | **s3 (160)**, **AMP (500)** | **Migrate data** into **central** S3 on cluster, then **decommission** duplicate MinIO on VM **160** if applicable. |
 | **4 — Auth** | **Auth (190)** — **Authentik** | Use **shared Postgres**; update **all OIDC clients** (Gitea, apps, NPM) with **new issuer URLs**; schedule a **maintenance window**. |
 | **5 — Daily apps** | **general-purpose (140)** | Move **one app per release** (Mealie, Open WebUI, …); each app gets its **own database** (and bucket if needed) on the **shared** tiers — not a new Postgres pod per app. |
 | **6 — Media / *arr*** | **arr (120)**, **Media-server (150)** | **NFS** from **OMV**, download clients, **transcoding** — migrate **one *arr*** then Jellyfin/ebook; see NFS bullets in [`homelab-network.md`](homelab-network.md). |
 | **7 — Edge** | **NPM (666/777)** | Often **last**: either keep on Proxmox or replace with **Traefik** + **IngressRoutes** / **Gateway API**; many people keep a **dedicated** reverse proxy VM until parity is proven. |
 **Openmediavault (100)** — Typically **stays** as **NFS** (and maybe backup target) for the cluster; no need to “migrate” the whole NAS into Kubernetes.
 ---
 ## 5. Ingress and reverse proxy
 | Approach | When to use |
 |----------|-------------|
 | **Traefik Ingress** on noble | Default for **internal** HTTPS apps; **cert-manager** for public names you control. |
 | **NPM (VM)** as front door | Point the **proxy host** at the **Traefik MetalLB IP** (or at a hostname, if you add internal DNS). This adds a second proxy hop, so **terminate TLS** in one place only. |
 | **Newt / Pangolin** | Public reachability per [`clusters/noble/bootstrap/newt/README.md`](../clusters/noble/bootstrap/newt/README.md); not automatic ExternalDNS. |
 Avoid **two** TLS terminations for the same hostname unless you intend **SSL passthrough** end-to-end.
 ---
 ## 6. Authentik-specific (Auth VM → cluster)
 1. **Backup** Authentik **PostgreSQL** (or embedded DB) and **media** volume from the VM.
 2. Deploy **Helm** (official chart) with **same** Authentik version if possible.
 3. **Restore** DB into **shared cluster Postgres** (recommended) or chart-managed DB — see [`shared-data-services.md`](shared-data-services.md).
 4. Update **issuer URL** in every **OIDC/OAuth** client (Gitea, Grafana, etc.).
 5. Re-test **outposts** (if any) and **redirect URIs** from both **`.1`** and **`.50`** client perspectives.
 6. **Cut over DNS**; then **decommission** VM **190**.
 ---
 ## 7. *arr* and Jellyfin-specific
 Follow the **numbered list** under **“Arr stack, NFS, and Kubernetes”** in [`homelab-network.md`](homelab-network.md). In short: **OMV stays**; **NFS provisioner** + **RWX**; **match permissions**; migrate **one app** first; verify the **download client** can reach the app at its new **Service / Ingress hostname** from your download host (pod IPs change on restart and are not a stable target).
 ---
 ## 8. Validation checklist (per wave)
 - Pods **Ready**, **Ingress** returns **200** / login page.
 - **TLS** valid for chosen hostname.
 - **Persistent data** present (new uploads, DB writes survive pod restart).
 - **Backups** (Velero or app-level) defined for the new location.
 - **Monitoring** / alerts updated (targets, not old VM IP).
 - **Documentation** in [`homelab-network.md`](homelab-network.md) updated (VM retired or marked migrated).
 ---
 ## Related docs
 - **Shared Postgres + S3:** [`shared-data-services.md`](shared-data-services.md)
 - VM inventory and NFS notes: [`homelab-network.md`](homelab-network.md)
 - Noble topology, MetalLB, Traefik: [`architecture.md`](architecture.md)
 - Bootstrap and versions: [`talos/CLUSTER-BUILD.md`](../talos/CLUSTER-BUILD.md)
 - Apps layout: [`clusters/noble/apps/README.md`](../clusters/noble/apps/README.md)
--- a/docs/shared-data-services.md
+++ b/docs/shared-data-services.md
@@ -0,0 +1,90 @@
 # Centralized PostgreSQL and S3-compatible storage
 Goal: **one shared PostgreSQL** and **one S3-compatible object store** on **noble**, instead of every app bundling its own database or MinIO. Apps keep **logical isolation** via **per-app databases** / **users** and **per-app buckets** (or prefixes), not separate clusters.
 See also: [`migration-vm-to-noble.md`](migration-vm-to-noble.md), [`homelab-network.md`](homelab-network.md) (VM **160** `s3` today), [`talos/CLUSTER-BUILD.md`](../talos/CLUSTER-BUILD.md) (Velero + S3).
 ---
 ## 1. Why centralize
 | Benefit | Detail |
 |--------|--------|
 | **Operations** | One backup/restore story, one upgrade cadence, one place to tune **IOPS** and **retention**. |
 | **Security** | **Least privilege**: each app gets its own **DB user** and **S3 credentials** scoped to one database or bucket. |
 | **Resources** | Fewer duplicate **Postgres** or **MinIO** sidecars; better use of **Longhorn** or dedicated PVCs for the shared tiers. |
 **Tradeoff:** Shared tiers are **blast-radius** targets — use **backups**, **PITR** where you care, and **NetworkPolicies** so only expected namespaces talk to Postgres/S3.
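The NetworkPolicy idea in the tradeoff note could be expressed as follows. This is a sketch only: the `platform` namespace, the `app: postgres-platform` label, and the two client namespaces are assumptions, and it relies on the CNI enforcing standard `NetworkPolicy` objects.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: postgres-allow-known-clients   # hypothetical
  namespace: platform                  # assumption: namespace of the shared Postgres
spec:
  podSelector:
    matchLabels:
      app: postgres-platform           # assumption: label on the Postgres pods
  policyTypes:
    - Ingress
  ingress:
    - from:
        # Namespaces are matched by the label Kubernetes sets automatically.
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: gitea
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: authentik
      ports:
        - protocol: TCP
          port: 5432
```

Once a policy selects the Postgres pods, any ingress not listed is dropped, so backup jobs and monitoring exporters need their own entries.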
 ---
 ## 2. PostgreSQL — recommended pattern
 1. **Run Postgres on noble** — Operators such as **CloudNativePG**, **Zalando Postgres operator**, or a well-maintained **Helm** chart with **replicas** + **persistent volumes** (Longhorn).
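 2. **One cluster instance, many databases** — For each app: `CREATE DATABASE appname;` owned by a **dedicated role** (not superuser). PostgreSQL grants `CONNECT` to `PUBLIC` by default, so also `REVOKE CONNECT ... FROM PUBLIC` on each database if roles must not reach each other's data.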
 3. **Connection from apps** — Use a **Kubernetes Service** (e.g. `postgres-platform.platform.svc.cluster.local:5432`) and pass **credentials** via **Secrets** (ideally **SOPS**-encrypted in git).
 4. **Migrations** — Run app **migration** jobs or init containers against the **same** DSN after DB exists.
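For step 3, the per-app credentials might be carried in a Secret shaped like this before SOPS encryption. The Secret name, namespace, key name `DATABASE_URL`, and role name are hypothetical; the host is the example Service name from the list above.

```yaml
# Plaintext form for illustration only: encrypt with SOPS before committing.
apiVersion: v1
kind: Secret
metadata:
  name: gitea-postgres        # hypothetical
  namespace: gitea            # hypothetical app namespace
type: Opaque
stringData:
  # One role and one database per app on the shared server.
  DATABASE_URL: postgres://gitea:REPLACE_ME@postgres-platform.platform.svc.cluster.local:5432/gitea
```

A file like this would sit under `clusters/noble/secrets/` so the `.sops.yaml` creation rule applies to it.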
 **Migrating off SQLite / embedded Postgres**
 - **SQLite → Postgres:** export/import per app (native tools, or **pgloader** where appropriate).
 - **Docker Postgres volume:** `pg_dumpall` or per-DB `pg_dump` → restore into a **new** database on the shared server; **freeze writes** during cutover.
 ---
 ## 3. S3-compatible object storage — recommended pattern
 1. **Run one S3 API on noble** — **MinIO** (common), **Garage**, or **SeaweedFS** S3 layer — with **PVC(s)** or host path for data; **erasure coding** / replicas if the chart supports it and you want durability across nodes.
 2. **Buckets per concern** — e.g. `gitea-attachments`, `velero`, `loki-archive` — not one global bucket unless you enforce **prefix** IAM policies.
 3. **Credentials** — **IAM-style** users limited to **one bucket** (or prefix); **Secrets** reference **access key** / **secret**; never commit keys in plain text.
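 4. **Endpoint for pods** — In-cluster: `http://minio.platform.svc.cluster.local:9000` (or TLS inside mesh). Against a Service DNS name like this, apps usually need **path-style** addressing; **virtual-hosted** style only works if bucket subdomains resolve and the server is configured for them.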
 ### NFS as backing store for S3 on noble
 **Yes.** You can run MinIO (or another S3-compatible server) with its **data directory** on a **ReadWriteMany** volume that is **NFS** — for example the same **Openmediavault** export you already use, mounted via your **NFS CSI** driver (see [`homelab-network.md`](homelab-network.md)).
 | Consideration | Detail |
 |---------------|--------|
 | **Works for homelab** | MinIO stores objects as files under a path; **POSIX** on NFS is enough for many setups. |
 | **Performance** | NFS adds **latency** and shared bandwidth; fine for moderate use, less ideal for heavy multi-tenant throughput. |
 | **Availability** | The **NFS server** (OMV) becomes part of the availability story for object data — plan **backups** and **OMV** health like any dependency. |
 | **Locking / semantics** | Prefer **NFSv4.x**; do not expect **local SSD** behavior from **NFS** (very chatty small writes are the usual pain point). If you see odd behavior, **Longhorn** (block) on a node is the usual next step. |
 | **Layering** | You are stacking **S3 API → file layout → NFS → disk**; that is normal for a lab, just **monitor** space and exports on OMV. |
 **Summary:** NFS-backed PVC for MinIO is **valid** on noble; use **Longhorn** (or local disk) when you need **better IOPS** or want object data **inside** the cluster’s storage domain without depending on OMV for that tier.
 **Migrating off VM 160 (`s3`) or per-app MinIO**
 - **MinIO → MinIO:** `mc mirror` between aliases, or **replication** if you configure it.
 - **Same API:** Any tool speaking **S3** can **sync** buckets before you point apps at the new endpoint.
 **Velero** — Point the **backup location** at the **central** bucket (see cluster Velero docs); avoid a second ad-hoc object store for backups if one cluster bucket is enough.
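Pointing Velero at the central store could end up as a backup location like the one below. This is a sketch of the resulting object, not the repo's Helm values: the bucket name and endpoint reuse the examples from this document, and the `velero` namespace and `default` name are assumptions.

```yaml
apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
  name: default
  namespace: velero           # assumption: namespace where Velero is installed
spec:
  provider: aws               # the AWS plugin speaks to any S3-compatible API
  objectStorage:
    bucket: velero            # example bucket name from the list above
  config:
    region: us-east-1         # placeholder region string accepted by MinIO
    s3ForcePathStyle: "true"  # path-style addressing for an in-cluster endpoint
    s3Url: http://minio.platform.svc.cluster.local:9000
```

If the object store itself runs on noble, backups stored there do not survive losing the cluster, so an off-cluster copy of that bucket is worth planning.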
 ---
 ## 4. Ordering relative to app migrations
 | When | What |
 |------|------|
 | **Early** | Stand up **Postgres** + **S3** with **empty** DBs/buckets; test with **one** non-critical app (e.g. a throwaway deployment). |
 | **Before auth / Git** | **Gitea** and **Authentik** benefit from **managed Postgres** early — plan **DSN** and **bucket** for attachments **before** cutover. |
 | **Ongoing** | New apps **must not** ship embedded **Postgres/MinIO** unless the workload truly requires it (e.g. vendor appliance). |
 ---
 ## 5. Checklist (platform team)
 - [ ] Postgres **Service** DNS name and **TLS** (optional in-cluster) documented.
 - [ ] S3 **endpoint**, **region** string (can be `us-east-1` for MinIO), **TLS** for Ingress if clients are outside the cluster.
 - [ ] **Backup:** scheduled **logical dumps** (Postgres) and **bucket replication** or **object versioning** where needed.
 - [ ] **SOPS** / **External Secrets** pattern for **rotation** without editing app manifests by hand.
 - [ ] **homelab-network.md** updated when **VM 160** is retired or repurposed.
 ---
 ## Related docs
 - VM → cluster migration: [`migration-vm-to-noble.md`](migration-vm-to-noble.md)
 - Inventory (s3 VM): [`homelab-network.md`](homelab-network.md)
 - Longhorn / storage runbook: [`../talos/runbooks/longhorn.md`](../talos/runbooks/longhorn.md)
 - Velero (S3 backup target): [`../clusters/noble/bootstrap/velero/`](../clusters/noble/bootstrap/velero/) (if present)
--- a/komodo/monitor/tracearr/compose.yaml
+++ b/komodo/monitor/tracearr/compose.yaml
@@ -7,7 +7,7 @@
 services:
  tracearr:
-    image: ghcr.io/connorgallopo/tracearr:supervised-nightly
+    image: ghcr.io/connorgallopo/tracearr:latest
    shm_size: 256mb  # Required for PostgreSQL shared memory
    ports:
      - "${PORT:-3000}:3000"
--- a/talos/CLUSTER-BUILD.md
+++ b/talos/CLUSTER-BUILD.md
@@ -4,7 +4,7 @@ This document is the **exported TODO** for the **noble** Talos cluster (4 nodes)
 ## Current state (2026-03-28)
-Lab stack is **up** on-cluster through **Phase D**–**F** and **Phase G** (Vault **CiliumNetworkPolicy**, **`talos/runbooks/`**). **Next focus:** optional **Alertmanager** receivers (Slack/PagerDuty); tighten **RBAC** (Headlamp / cluster-admin); **Cilium** policies for other namespaces as needed; enable **Mend Renovate** for PRs; Pangolin/sample Ingress; **Velero** backup/restore drill after S3 credentials are set (**`noble_velero_install`**).
+Lab stack is **up** on-cluster through **Phase D**–**F** and **Phase G** (**`talos/runbooks/`**, **SOPS**-encrypted secrets in **`clusters/noble/secrets/`**). **Next focus:** optional **Alertmanager** receivers (Slack/PagerDuty); tighten **RBAC** (Headlamp / cluster-admin); **Cilium** policies for other namespaces as needed; enable **Mend Renovate** for PRs; Pangolin/sample Ingress; **Velero** backup/restore drill after S3 credentials are set (**`noble_velero_install`**).
 - **Talos** v1.12.6 (target) / **Kubernetes** as bundled — four nodes **Ready** unless upgrading; **`talosctl health`**; **`talos/kubeconfig`** is **local only** (gitignored — never commit; regenerate with `talosctl kubeconfig` per `talos/README.md`). **Image Factory (nocloud installer):** `factory.talos.dev/nocloud-installer/249d9135de54962744e917cfe654117000cba369f9152fbab9d055a00aa3664f:v1.12.6`
 - **Cilium** Helm **1.16.6** / app **1.16.6** (`clusters/noble/bootstrap/cilium/`, phase 1 values).
@@ -15,13 +15,11 @@ Lab stack is **up** on-cluster through **Phase D**–**F** and **Phase G** (Vaul
 - **Longhorn** Helm **1.11.1** / app **v1.11.1** — `clusters/noble/bootstrap/longhorn/` (PSA **privileged** namespace, `defaultDataPath` `/var/mnt/longhorn`, `preUpgradeChecker` enabled); **StorageClass** `longhorn` (default); **`nodes.longhorn.io`** all **Ready**; test **PVC** `Bound` on `longhorn`.
 - **Traefik** Helm **39.0.6** / app **v3.6.11** — `clusters/noble/bootstrap/traefik/`; **`Service`** **`LoadBalancer`** **`EXTERNAL-IP` `192.168.50.211`**; **`IngressClass`** **`traefik`** (default). Point **`*.apps.noble.lab.pcenicni.dev`** at **`192.168.50.211`**. MetalLB pool verification was done before replacing the temporary nginx test with Traefik.
 - **cert-manager** Helm **v1.20.0** / app **v1.20.0** — `clusters/noble/bootstrap/cert-manager/`; **`ClusterIssuer`** **`letsencrypt-staging`** and **`letsencrypt-prod`** (**DNS-01** via **Cloudflare** for **`pcenicni.dev`**, Secret **`cloudflare-dns-api-token`** in **`cert-manager`**); ACME email **`certificates@noble.lab.pcenicni.dev`** (edit in manifests if you want a different mailbox).
- **Newt** Helm **1.2.0** / app **1.10.1** — `clusters/noble/bootstrap/newt/` (**fossorial/newt**); Pangolin site tunnel — **`newt-pangolin-auth`** Secret (**`PANGOLIN_ENDPOINT`**, **`NEWT_ID`**, **`NEWT_SECRET`**). Prefer a **SealedSecret** in git (`kubeseal` — see `clusters/noble/bootstrap/sealed-secrets/examples/`) after rotating credentials if they were exposed. **Public DNS** is **not** automated with ExternalDNS: **CNAME** records at your DNS host per Pangolin’s domain instructions, plus **Integration API** for HTTP resources/targets — see **`clusters/noble/bootstrap/newt/README.md`**. LAN access to Traefik can still use **`*.apps.noble.lab.pcenicni.dev`** → **`192.168.50.211`** (split horizon / local resolver).
+- **Newt** Helm **1.2.0** / app **1.10.1** — `clusters/noble/bootstrap/newt/` (**fossorial/newt**); Pangolin site tunnel — **`newt-pangolin-auth`** Secret (**`PANGOLIN_ENDPOINT`**, **`NEWT_ID`**, **`NEWT_SECRET`**). Store credentials in git with **SOPS** (`clusters/noble/secrets/newt-pangolin-auth.secret.yaml`, **`age-key.txt`**, **`.sops.yaml`**) — see **`clusters/noble/secrets/README.md`**. **Public DNS** is **not** automated with ExternalDNS: **CNAME** records at your DNS host per Pangolin’s domain instructions, plus **Integration API** for HTTP resources/targets — see **`clusters/noble/bootstrap/newt/README.md`**. LAN access to Traefik can still use **`*.apps.noble.lab.pcenicni.dev`** → **`192.168.50.211`** (split horizon / local resolver).
 - **Argo CD** Helm **9.4.17** / app **v3.3.6** — `clusters/noble/bootstrap/argocd/`; **`argocd-server`** **`LoadBalancer`** **`192.168.50.210`**; app-of-apps root syncs **`clusters/noble/apps/`** (edit **`root-application.yaml`** `repoURL` before applying).
 - **kube-prometheus-stack** — Helm chart **82.15.1** — `clusters/noble/bootstrap/kube-prometheus-stack/` (**namespace** `monitoring`, PSA **privileged** — **node-exporter** needs host mounts); **Longhorn** PVCs for Prometheus, Grafana, Alertmanager; **node-exporter** DaemonSet **4/4**. **Grafana Ingress:** **`https://grafana.apps.noble.lab.pcenicni.dev`** (Traefik **`ingressClassName: traefik`**, **`cert-manager.io/cluster-issuer: letsencrypt-prod`**). **Loki** datasource in Grafana: ConfigMap **`clusters/noble/bootstrap/grafana-loki-datasource/loki-datasource.yaml`** (sidecar label **`grafana_datasource: "1"`**) — not via **`grafana.additionalDataSources`** in the chart. **`helm upgrade --install` with `--wait` is silent until done** — use **`--timeout 30m`**; Grafana admin: Secret **`kube-prometheus-grafana`**, keys **`admin-user`** / **`admin-password`**.
 - **Loki** + **Fluent Bit** — **`grafana/loki` 6.55.0** SingleBinary + **filesystem** on **Longhorn** (`clusters/noble/bootstrap/loki/`); **`loki.auth_enabled: false`**; **`chunksCache.enabled: false`** (no memcached chunk cache). **`fluent/fluent-bit` 0.56.0** → **`loki-gateway.loki.svc:80`** (`clusters/noble/bootstrap/fluent-bit/`); **`logging`** PSA **privileged**. **Grafana Explore:** **`kubectl apply -f clusters/noble/bootstrap/grafana-loki-datasource/loki-datasource.yaml`** then **Explore → Loki** (e.g. `{job="fluent-bit"}`).
- **Sealed Secrets** Helm **2.18.4** / app **0.36.1** — `clusters/noble/bootstrap/sealed-secrets/` (namespace **`sealed-secrets`**); **`kubeseal`** on client should match controller minor (**README**); back up **`sealed-secrets-key`** (see README).
+- **SOPS** — cluster **`Secret`** manifests under **`clusters/noble/secrets/`** encrypted with **age** (see **`.sops.yaml`**, **`age-key.txt`** gitignored); **`noble.yml`** decrypt-applies when the private key is present.
 - **External Secrets Operator** Helm **2.2.0** / app **v2.2.0** — `clusters/noble/bootstrap/external-secrets/`; Vault **`ClusterSecretStore`** in **`examples/vault-cluster-secret-store.yaml`** (**`http://`** to match Vault listener — apply after Vault **Kubernetes auth**).
 - **Vault** Helm **0.32.0** / app **1.21.2** — `clusters/noble/bootstrap/vault/` — standalone **file** storage, **Longhorn** PVC; **HTTP** listener (`global.tlsDisable`); optional **CronJob** lab unseal **`unseal-cronjob.yaml`**; **not** initialized in git — run **`vault operator init`** per **`README.md`**.
 - **Velero** Helm **12.0.0** / app **v1.18.0** — `clusters/noble/bootstrap/velero/` (**Ansible** **`noble_velero`**, not Argo); **S3-compatible** backup location + **CSI** snapshots (**`EnableCSI`**); enable with **`noble_velero_install`** per **`velero/README.md`**.
 - **Still open:** **Renovate** — install **[Mend Renovate](https://github.com/apps/renovate)** (or self-host) so PRs run; optional **Alertmanager** notification channels; optional **sample Ingress + cert + Pangolin** end-to-end; **Argo CD SSO**.
@@ -64,9 +62,6 @@ Lab stack is **up** on-cluster through **Phase D**–**F** and **Phase G** (Vaul
 - kube-prometheus-stack: **82.15.1** (Helm chart `prometheus-community/kube-prometheus-stack`; app **v0.89.x** bundle)
 - Loki: **6.55.0** (Helm chart `grafana/loki`; app **3.6.7**)
 - Fluent Bit: **0.56.0** (Helm chart `fluent/fluent-bit`; app **4.2.3**)
 - Sealed Secrets: **2.18.4** (Helm chart `sealed-secrets/sealed-secrets`; app **0.36.1**)
 - External Secrets Operator: **2.2.0** (Helm chart `external-secrets/external-secrets`; app **v2.2.0**)
 - Vault: **0.32.0** (Helm chart `hashicorp/vault`; app **1.21.2**)
 - Kyverno: **3.7.1** (Helm chart `kyverno/kyverno`; app **v1.17.1**); **kyverno-policies** **3.7.1** — **baseline** PSS, **Audit** (`clusters/noble/bootstrap/kyverno/`)
 - Headlamp: **0.40.1** (Helm chart `headlamp/headlamp`; app matches chart — see [Artifact Hub](https://artifacthub.io/packages/helm/headlamp/headlamp))
 - Velero: **12.0.0** (Helm chart `vmware-tanzu/velero`; app **v1.18.0**) — **`clusters/noble/bootstrap/velero/`**; AWS plugin **v1.14.0**; Ansible **`noble_velero`**
@@ -77,7 +72,7 @@ Lab stack is **up** on-cluster through **Phase D**–**F** and **Phase G** (Vaul
 | Artifact | Path |
 |----------|------|
 | This checklist | `talos/CLUSTER-BUILD.md` |
-| Operational runbooks (API VIP, etcd, Longhorn, Vault) | `talos/runbooks/` |
+| Operational runbooks (API VIP, etcd, Longhorn, SOPS) | `talos/runbooks/` |
 | Talos quick start + networking + kubeconfig | `talos/README.md` |
 | talhelper source (active) | `talos/talconfig.yaml` — may be **wipe-phase** (no Longhorn volume) during disk recovery |
 | Longhorn volume restore | `talos/talconfig.with-longhorn.yaml` — copy to `talconfig.yaml` after GPT wipe (see `talos/README.md` §5) |
@@ -96,13 +91,11 @@ Lab stack is **up** on-cluster through **Phase D**–**F** and **Phase G** (Vaul
 | Grafana Loki datasource (ConfigMap; no chart change) | `clusters/noble/bootstrap/grafana-loki-datasource/loki-datasource.yaml` |
 | Loki (Helm values) | `clusters/noble/bootstrap/loki/` — `values.yaml`, `namespace.yaml` |
 | Fluent Bit → Loki (Helm values) | `clusters/noble/bootstrap/fluent-bit/` — `values.yaml`, `namespace.yaml` |
-| Sealed Secrets (Helm) | `clusters/noble/bootstrap/sealed-secrets/` — `values.yaml`, `namespace.yaml`, `README.md` |
+| SOPS-encrypted cluster Secrets | `clusters/noble/secrets/` — `README.md`, `*.secret.yaml`; **`.sops.yaml`**, **`age-key.txt`** (gitignored) at repo root |
 | External Secrets Operator (Helm + Vault store example) | `clusters/noble/bootstrap/external-secrets/` — `values.yaml`, `namespace.yaml`, `README.md`, `examples/vault-cluster-secret-store.yaml` |
 | Vault (Helm + optional unseal CronJob) | `clusters/noble/bootstrap/vault/` — `values.yaml`, `namespace.yaml`, `unseal-cronjob.yaml`, `cilium-network-policy.yaml`, `configure-kubernetes-auth.sh`, `README.md` |
 | Kyverno + PSS baseline policies | `clusters/noble/bootstrap/kyverno/` — `values.yaml`, `policies-values.yaml`, `namespace.yaml`, `README.md` |
 | Headlamp (Helm + Ingress) | `clusters/noble/bootstrap/headlamp/` — `values.yaml`, `namespace.yaml`, `README.md` |
 | Velero (Helm + S3 BSL; CSI snapshots) | `clusters/noble/bootstrap/velero/` — `values.yaml`, `namespace.yaml`, `README.md`; **`ansible/roles/noble_velero`** |
-| Renovate (repo config + optional self-hosted Helm) | **`renovate.json`** at repo root; optional self-hosted chart under **`clusters/noble/apps/`** (Argo) + token Secret (**Sealed Secrets** / **ESO** after **Phase E**) |
+| Renovate (repo config + optional self-hosted Helm) | **`renovate.json`** at repo root; optional self-hosted chart under **`clusters/noble/apps/`** (Argo) + token Secret (SOPS under **`clusters/noble/secrets/`** or imperative **`kubectl create secret`**) |
 **Git vs cluster:** manifests and `talconfig` live in git; **`talhelper genconfig -o out`**, bootstrap, Helm, and `kubectl` run on your LAN. See **`talos/README.md`** for workstation reachability (lab LAN/VPN), **`talosctl kubeconfig`** vs Kubernetes `server:` (VIP vs node IP), and **`--insecure`** only in maintenance.
@@ -114,10 +107,9 @@ Lab stack is **up** on-cluster through **Phase D**–**F** and **Phase G** (Vaul
 4. **CSI Volume snapshots:** **`kubernetes-csi/external-snapshotter`** CRDs + **`snapshot-controller`** (`clusters/noble/bootstrap/csi-snapshot-controller/`) before relying on **Longhorn** / **Velero** volume snapshots.
 5. **Longhorn:** Talos user volume + extensions in `talconfig.with-longhorn.yaml` (when restored); Helm **`defaultDataPath`** in `clusters/noble/bootstrap/longhorn/values.yaml`.
 6. **Loki → Fluent Bit → Grafana datasource:** deploy **Loki** (`loki-gateway` Service) before **Fluent Bit**; apply **`clusters/noble/bootstrap/grafana-loki-datasource/loki-datasource.yaml`** after **Loki** (sidecar picks up the ConfigMap — no kube-prometheus values change for Loki).
-7. **Vault:** **Longhorn** default **StorageClass** before **`clusters/noble/bootstrap/vault/`** Helm (PVC **`data-vault-0`**); **External Secrets** **`ClusterSecretStore`** after Vault is initialized, unsealed, and **Kubernetes auth** is configured.
+7. **Headlamp:** **Traefik** + **cert-manager** (**`letsencrypt-prod`**) before exposing **`headlamp.apps.noble.lab.pcenicni.dev`**; treat as **cluster-admin** UI — protect with network policy / SSO when hardening (**Phase G**).
-8. **Headlamp:** **Traefik** + **cert-manager** (**`letsencrypt-prod`**) before exposing **`headlamp.apps.noble.lab.pcenicni.dev`**; treat as **cluster-admin** UI — protect with network policy / SSO when hardening (**Phase G**).
+8. **Renovate:** **Git remote** + platform access (**hosted app** needs org/repo install; **self-hosted** needs **`RENOVATE_TOKEN`** and chart **`renovate.config`**). If the bot runs **in-cluster**, store the token with **SOPS** or an imperative Secret — no ingress required for the bot itself.
-9. **Renovate:** **Git remote** + platform access (**hosted app** needs org/repo install; **self-hosted** needs **`RENOVATE_TOKEN`** and chart **`renovate.config`**). If the bot runs **in-cluster**, add the token **after** **Sealed Secrets** / **Vault** (**Phase E**) — no ingress required for the bot itself.
+9. **Velero:** **S3-compatible** endpoint + bucket + **`velero/velero-cloud-credentials`** before **`ansible/playbooks/noble.yml`** with **`noble_velero_install: true`**; for **CSI** volume snapshots, label a **VolumeSnapshotClass** per **`clusters/noble/bootstrap/velero/README.md`** (e.g. Longhorn).
 10. **Velero:** **S3-compatible** endpoint + bucket + **`velero/velero-cloud-credentials`** before **`ansible/playbooks/noble.yml`** with **`noble_velero_install: true`**; for **CSI** volume snapshots, label a **VolumeSnapshotClass** per **`clusters/noble/bootstrap/velero/README.md`** (e.g. Longhorn).
 ## Prerequisites (before phases)
@@ -160,7 +152,7 @@ Lab stack is **up** on-cluster through **Phase D**–**F** and **Phase G** (Vaul
 - [x] **Argo CD** bootstrap — `clusters/noble/bootstrap/argocd/` (`helm upgrade --install argocd …`) — also covered by **`ansible/playbooks/noble.yml`** (role **`noble_argocd`**)
 - [x] Argo CD server **LoadBalancer** — **`192.168.50.210`** (see `values.yaml`)
 - [x] **App-of-apps** — optional; **`clusters/noble/apps/kustomization.yaml`** is **empty** (core stack is **Ansible**-managed from **`clusters/noble/bootstrap/`**, not Argo). Set **`repoURL`** in **`root-application.yaml`** and add **`Application`** manifests only for optional GitOps workloads — see **`clusters/noble/apps/README.md`**
- [x] **Renovate** — **`renovate.json`** at repo root ([Renovate](https://docs.renovatebot.com/) — **Kubernetes** manager for **`clusters/noble/**/*.yaml`** image pins; grouped minor/patch PRs). **Activate PRs:** install **[Mend Renovate](https://github.com/apps/renovate)** on the Git repo (**Option A**), or **Option B:** self-hosted chart per [Helm charts](https://docs.renovatebot.com/helm-charts/) + token from **Sealed Secrets** / **ESO**. Helm **chart** versions pinned only in comments still need manual bumps or extra **regex** `customManagers` — extend **`renovate.json`** as needed.
+- [x] **Renovate** — **`renovate.json`** at repo root ([Renovate](https://docs.renovatebot.com/) — **Kubernetes** manager for **`clusters/noble/**/*.yaml`** image pins; grouped minor/patch PRs). **Activate PRs:** install **[Mend Renovate](https://github.com/apps/renovate)** on the Git repo (**Option A**), or **Option B:** self-hosted chart per [Helm charts](https://docs.renovatebot.com/helm-charts/) + token from **SOPS** or a one-off Secret. Helm **chart** versions pinned only in comments still need manual bumps or extra **regex** `customManagers` — extend **`renovate.json`** as needed.
 - [ ] SSO — later
 ## Phase D — Observability
@@ -171,9 +163,7 @@ Lab stack is **up** on-cluster through **Phase D**–**F** and **Phase G** (Vaul
 ## Phase E — Secrets
- [x] **Sealed Secrets** (optional Git workflow) — `clusters/noble/bootstrap/sealed-secrets/` (Helm **2.18.4**); **`kubeseal`** + key backup per **`README.md`**
+- [x] **SOPS** — encrypt **`Secret`** YAML under **`clusters/noble/secrets/`** with **age** (see **`.sops.yaml`**, **`clusters/noble/secrets/README.md`**); keep **`age-key.txt`** private (gitignored). **`ansible/playbooks/noble.yml`** decrypt-applies **`*.yaml`** when **`age-key.txt`** exists.
 - [x] **Vault** in-cluster on Longhorn + **auto-unseal** — `clusters/noble/bootstrap/vault/` (Helm **0.32.0**); **Longhorn** PVC; **OSS** “auto-unseal” = optional **`unseal-cronjob.yaml`** + Secret (**README**); **`configure-kubernetes-auth.sh`** for ESO (**Kubernetes auth** + KV + role)
 - [x] **External Secrets Operator** + Vault `ClusterSecretStore` — operator **`clusters/noble/bootstrap/external-secrets/`** (Helm **2.2.0**); apply **`examples/vault-cluster-secret-store.yaml`** after Vault (**`README.md`**)
 ## Phase F — Policy + backups
@@ -182,8 +172,7 @@ Lab stack is **up** on-cluster through **Phase D**–**F** and **Phase G** (Vaul
 ## Phase G — Hardening
- [x] **Cilium** — Vault **`CiliumNetworkPolicy`** (`clusters/noble/bootstrap/vault/cilium-network-policy.yaml`) — HTTP **8200** from **`external-secrets`** + **`vault`**; extend for other clients as needed
+- [x] **Runbooks** — **`talos/runbooks/`** (API VIP / kube-vip, etcd–Talos, Longhorn, SOPS)
 - [x] **Runbooks** — **`talos/runbooks/`** (API VIP / kube-vip, etcd–Talos, Longhorn, Vault)
 - [x] **RBAC** — **Headlamp** **`ClusterRoleBinding`** uses built-in **`edit`** (not **`cluster-admin`**); **Argo CD** **`policy.default: role:readonly`** with **`g, admin, role:admin`** — see **`clusters/noble/bootstrap/headlamp/values.yaml`**, **`clusters/noble/bootstrap/argocd/values.yaml`**, **`talos/runbooks/rbac.md`**
 - [ ] **Alertmanager** — add **`slack_configs`**, **`pagerduty_configs`**, or other receivers under **`kube-prometheus-stack`** `alertmanager.config` (chart defaults use **`null`** receiver)
@@ -201,12 +190,10 @@ Lab stack is **up** on-cluster through **Phase D**–**F** and **Phase G** (Vaul
 - [x] **`logging`** — **Fluent Bit** DaemonSet **Running** on all nodes (logs → **Loki**)
 - [x] **Grafana** — **Loki** datasource from **`grafana-loki-datasource`** ConfigMap (**Explore** works after apply + sidecar sync)
 - [x] **Headlamp** — Deployment **Running** in **`headlamp`**; UI at **`https://headlamp.apps.noble.lab.pcenicni.dev`** (TLS via **`letsencrypt-prod`**)
- [x] **`sealed-secrets`** — controller **Deployment** **Running** in **`sealed-secrets`** (install + **`kubeseal`** per **`apps/sealed-secrets/README.md`**)
+- [x] **SOPS secrets** — **`clusters/noble/secrets/*.yaml`** encrypted in git; **`noble.yml`** applies decrypted manifests when **`age-key.txt`** is present
 - [x] **`external-secrets`** — controller + webhook + cert-controller **Running** in **`external-secrets`**; apply **`ClusterSecretStore`** after Vault **Kubernetes auth**
 - [x] **`vault`** — **StatefulSet** **Running**, **`data-vault-0`** PVC **Bound** on **longhorn**; **`vault operator init`** + unseal per **`apps/vault/README.md`**
 - [x] **`kyverno`** — admission / background / cleanup / reports controllers **Running** in **`kyverno`**; **ClusterPolicies** for **PSS baseline** **Ready** (**Audit**)
 - [ ] **`velero`** — when enabled: Deployment **Running** in **`velero`**; **`BackupStorageLocation`** / **`VolumeSnapshotLocation`** **Available**; test backup per **`velero/README.md`**
- [x] **Phase G (partial)** — Vault **`CiliumNetworkPolicy`**; **`talos/runbooks/`** (incl. **RBAC**); **Headlamp**/**Argo CD** RBAC tightened — **Alertmanager** receivers still optional
+- [x] **Phase G (partial)** — **`talos/runbooks/`** (incl. **RBAC**); **Headlamp**/**Argo CD** RBAC tightened — **Alertmanager** receivers still optional
 ---
--- a/talos/README.md
+++ b/talos/README.md
@@ -1,7 +1,7 @@
 # Talos — noble lab
 - **Cluster build checklist (exported TODO):** [CLUSTER-BUILD.md](./CLUSTER-BUILD.md)
- **Operational runbooks (API VIP, etcd, Longhorn, Vault):** [runbooks/README.md](./runbooks/README.md)
+- **Operational runbooks (API VIP, etcd, Longhorn, SOPS):** [runbooks/README.md](./runbooks/README.md)
 ## Versions
--- a/talos/runbooks/README.md
+++ b/talos/runbooks/README.md
@@ -7,5 +7,5 @@ Short recovery / triage notes for the **noble** Talos cluster. Deep procedures l
 | Kubernetes API VIP (kube-vip) | [`api-vip-kube-vip.md`](./api-vip-kube-vip.md) |
 | etcd / Talos control plane | [`etcd-talos.md`](./etcd-talos.md) |
 | Longhorn storage | [`longhorn.md`](./longhorn.md) |
-| Vault (unseal, auth, ESO) | [`vault.md`](./vault.md) |
+| SOPS (secrets in git) | [`sops.md`](./sops.md) |
 | RBAC (Headlamp, Argo CD) | [`rbac.md`](./rbac.md) |
--- a/talos/runbooks/sops.md
+++ b/talos/runbooks/sops.md
@@ -0,0 +1,13 @@
 # Runbook: SOPS secrets (git-encrypted)
 **Symptoms:** `sops -d` fails; the expected Secret is missing from the cluster after an Ansible run; `noble.yml` skips the SOPS apply step.
 **Checklist**
 1. **Private key:** `age-key.txt` at the repository root (gitignored). Create with `age-keygen -o age-key.txt` and add the **public** key to `.sops.yaml` (see `clusters/noble/secrets/README.md`).
 2. **Environment:** `export SOPS_AGE_KEY_FILE=/absolute/path/to/home-server/age-key.txt` when editing or applying by hand.
 3. **Edit encrypted file:** `sops clusters/noble/secrets/<name>.secret.yaml`
 4. **Apply one file:** `sops -d clusters/noble/secrets/<name>.secret.yaml | kubectl apply -f -`
 5. **Ansible:** `noble_apply_sops_secrets` defaults to `true`; the platform role decrypts and applies every `*.yaml` under `clusters/noble/secrets/` when `age-key.txt` exists at the repository root.
 **References:** [`clusters/noble/secrets/README.md`](../../clusters/noble/secrets/README.md), [SOPS](https://github.com/getsops/sops).
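
The checklist steps above can be combined into one manual loop. The following is a minimal sketch, not the actual Ansible role: the function name `apply_sops_secrets`, the `DRY_RUN` variable, and the printed messages are hypothetical and exist only for illustration. It assumes `sops` and `kubectl` are on `PATH` and that `age-key.txt` and `clusters/noble/secrets/` sit under the repository root passed as the first argument.

```shell
#!/usr/bin/env bash
# Sketch only: decrypt and apply every SOPS-encrypted manifest by hand.
# apply_sops_secrets and DRY_RUN are hypothetical names, not part of the repo.

apply_sops_secrets() {
  local repo_root="$1"
  local key_file="${repo_root}/age-key.txt"
  local secrets_dir="${repo_root}/clusters/noble/secrets"

  # Without the private key nothing can be decrypted, so skip quietly.
  if [ ! -f "$key_file" ]; then
    echo "skip: ${key_file} not found"
    return 0
  fi

  # Checklist step 2: point sops at the age private key.
  export SOPS_AGE_KEY_FILE="$key_file"

  local f
  for f in "$secrets_dir"/*.yaml; do
    # If the glob matched nothing, the literal pattern is left behind.
    [ -e "$f" ] || continue
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "would apply: ${f}"
    else
      # Checklist step 4: decrypt to stdout and pipe straight into kubectl.
      sops -d "$f" | kubectl apply -f - || return 1
    fi
  done
}
```

After sourcing the file, running `DRY_RUN=1 apply_sops_secrets "$(pwd)"` from the repository root lists the files that would be applied without contacting the cluster. Piping the decrypted output directly into `kubectl` keeps plaintext Secrets off disk.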
--- a/talos/runbooks/vault.md
+++ b/talos/runbooks/vault.md
@@ -1,15 +0,0 @@
 # Runbook: Vault (in-cluster)
 **Symptoms:** External Secrets **not syncing**, `ClusterSecretStore` **InvalidProviderConfig**, Vault UI/API **503 sealed**, pods **CrashLoop** on auth.
 **Checks**
 1. `kubectl -n vault exec -i sts/vault -- vault status` — **Sealed** / **Initialized**.
 2. Unseal key Secret + optional CronJob: [`clusters/noble/bootstrap/vault/README.md`](../../clusters/noble/bootstrap/vault/README.md), `unseal-cronjob.yaml`.
 3. Kubernetes auth for ESO: [`clusters/noble/bootstrap/vault/configure-kubernetes-auth.sh`](../../clusters/noble/bootstrap/vault/configure-kubernetes-auth.sh) and `kubectl describe clustersecretstore vault`.
 4. **Cilium** policy: if Vault is unreachable from `external-secrets`, check [`clusters/noble/bootstrap/vault/cilium-network-policy.yaml`](../../clusters/noble/bootstrap/vault/cilium-network-policy.yaml) and extend `ingress` for new client namespaces.
 **Common fixes**
 - Sealed: `vault operator unseal` or fix auto-unseal CronJob + `vault-unseal-key` Secret.
 - **403/invalid role** on ESO: re-run Kubernetes auth setup (issuer/CA/reviewer JWT) per README.