Update Ansible configuration to integrate SOPS for managing secrets. Enhance README.md with SOPS usage instructions and prerequisites. Remove External Secrets Operator references and related configurations from the bootstrap process, streamlining the deployment. Adjust playbooks and roles to apply SOPS-encrypted secrets automatically, improving security and clarity in secret management.
@@ -4,7 +4,7 @@ This document is the **exported TODO** for the **noble** Talos cluster (4 nodes)
## Current state (2026-03-28)
Lab stack is **up** on-cluster through **Phase D**–**F** and **Phase G** (Vault **CiliumNetworkPolicy**, **`talos/runbooks/`**). **Next focus:** optional **Alertmanager** receivers (Slack/PagerDuty); tighten **RBAC** (Headlamp / cluster-admin); **Cilium** policies for other namespaces as needed; enable **Mend Renovate** for PRs; Pangolin/sample Ingress; **Velero** backup/restore drill after S3 credentials are set (**`noble_velero_install`**).
Lab stack is **up** on-cluster through **Phase D**–**F** and **Phase G** (**`talos/runbooks/`**, **SOPS**-encrypted secrets in **`clusters/noble/secrets/`**). **Next focus:** optional **Alertmanager** receivers (Slack/PagerDuty); tighten **RBAC** (Headlamp / cluster-admin); **Cilium** policies for other namespaces as needed; enable **Mend Renovate** for PRs; Pangolin/sample Ingress; **Velero** backup/restore drill after S3 credentials are set (**`noble_velero_install`**).
- **Talos** v1.12.6 (target) / **Kubernetes** as bundled — four nodes **Ready** unless upgrading; **`talosctl health`**; **`talos/kubeconfig`** is **local only** (gitignored — never commit; regenerate with `talosctl kubeconfig` per `talos/README.md`). **Image Factory (nocloud installer):**`factory.talos.dev/nocloud-installer/249d9135de54962744e917cfe654117000cba369f9152fbab9d055a00aa3664f:v1.12.6`
@@ -15,13 +15,11 @@ Lab stack is **up** on-cluster through **Phase D**–**F** and **Phase G** (Vaul
- **Longhorn** Helm **1.11.1** / app **v1.11.1** — `clusters/noble/bootstrap/longhorn/` (PSA **privileged** namespace, `defaultDataPath``/var/mnt/longhorn`, `preUpgradeChecker` enabled); **StorageClass**`longhorn` (default); **`nodes.longhorn.io`** all **Ready**; test **PVC**`Bound` on `longhorn`.
- **Traefik** Helm **39.0.6** / app **v3.6.11** — `clusters/noble/bootstrap/traefik/`; **`Service`** **`LoadBalancer`** **`EXTERNAL-IP``192.168.50.211`**; **`IngressClass`** **`traefik`** (default). Point **`*.apps.noble.lab.pcenicni.dev`** at **`192.168.50.211`**. MetalLB pool verification was done before replacing the temporary nginx test with Traefik.
- **cert-manager** Helm **v1.20.0** / app **v1.20.0** — `clusters/noble/bootstrap/cert-manager/`; **`ClusterIssuer`** **`letsencrypt-staging`** and **`letsencrypt-prod`** (**DNS-01** via **Cloudflare** for **`pcenicni.dev`**, Secret **`cloudflare-dns-api-token`** in **`cert-manager`**); ACME email **`certificates@noble.lab.pcenicni.dev`** (edit in manifests if you want a different mailbox).
- **Newt** Helm **1.2.0** / app **1.10.1** — `clusters/noble/bootstrap/newt/` (**fossorial/newt**); Pangolin site tunnel — **`newt-pangolin-auth`** Secret (**`PANGOLIN_ENDPOINT`**, **`NEWT_ID`**, **`NEWT_SECRET`**). Prefer a **SealedSecret** in git (`kubeseal` — see `clusters/noble/bootstrap/sealed-secrets/examples/`) after rotating credentials if they were exposed. **Public DNS** is **not** automated with ExternalDNS: **CNAME** records at your DNS host per Pangolin’s domain instructions, plus **Integration API** for HTTP resources/targets — see **`clusters/noble/bootstrap/newt/README.md`**. LAN access to Traefik can still use **`*.apps.noble.lab.pcenicni.dev`** → **`192.168.50.211`** (split horizon / local resolver).
- **Newt** Helm **1.2.0** / app **1.10.1** — `clusters/noble/bootstrap/newt/` (**fossorial/newt**); Pangolin site tunnel — **`newt-pangolin-auth`** Secret (**`PANGOLIN_ENDPOINT`**, **`NEWT_ID`**, **`NEWT_SECRET`**). Store credentials in git with **SOPS** (`clusters/noble/secrets/newt-pangolin-auth.secret.yaml`, **`age-key.txt`**, **`.sops.yaml`**) — see **`clusters/noble/secrets/README.md`**. **Public DNS** is **not** automated with ExternalDNS: **CNAME** records at your DNS host per Pangolin’s domain instructions, plus **Integration API** for HTTP resources/targets — see **`clusters/noble/bootstrap/newt/README.md`**. LAN access to Traefik can still use **`*.apps.noble.lab.pcenicni.dev`** → **`192.168.50.211`** (split horizon / local resolver).
- **Sealed Secrets** Helm **2.18.4** / app **0.36.1** — `clusters/noble/bootstrap/sealed-secrets/` (namespace **`sealed-secrets`**);**`kubeseal`** on client should match controller minor (**README**); back up **`sealed-secrets-key`** (see README).
- **External Secrets Operator** Helm **2.2.0** / app **v2.2.0** — `clusters/noble/bootstrap/external-secrets/`; Vault **`ClusterSecretStore`** in **`examples/vault-cluster-secret-store.yaml`** (**`http://`** to match Vault listener — apply after Vault **Kubernetes auth**).
- **SOPS** — cluster **`Secret`** manifests under **`clusters/noble/secrets/`** encrypted with **age** (see **`.sops.yaml`**,**`age-key.txt`** gitignored); **`noble.yml`** decrypt-applies when the private key is present.
- **Velero** Helm **12.0.0** / app **v1.18.0** — `clusters/noble/bootstrap/velero/` (**Ansible** **`noble_velero`**, not Argo); **S3-compatible** backup location + **CSI** snapshots (**`EnableCSI`**); enable with **`noble_velero_install`** per **`velero/README.md`**.
| Renovate (repo config + optional self-hosted Helm) | **`renovate.json`** at repo root; optional self-hosted chart under **`clusters/noble/apps/`** (Argo) + token Secret (**Sealed Secrets** / **ESO** after **Phase E**) |
| Renovate (repo config + optional self-hosted Helm) | **`renovate.json`** at repo root; optional self-hosted chart under **`clusters/noble/apps/`** (Argo) + token Secret (SOPS under **`clusters/noble/secrets/`** or imperative **`kubectl create secret`**) |
**Git vs cluster:** manifests and `talconfig` live in git; **`talhelper genconfig -o out`**, bootstrap, Helm, and `kubectl` run on your LAN. See **`talos/README.md`** for workstation reachability (lab LAN/VPN), **`talosctl kubeconfig`** vs Kubernetes `server:` (VIP vs node IP), and **`--insecure`** only in maintenance.
@@ -114,10 +107,9 @@ Lab stack is **up** on-cluster through **Phase D**–**F** and **Phase G** (Vaul
4.**CSI Volume snapshots:****`kubernetes-csi/external-snapshotter`** CRDs + **`snapshot-controller`** (`clusters/noble/bootstrap/csi-snapshot-controller/`) before relying on **Longhorn** / **Velero** volume snapshots.
5.**Longhorn:** Talos user volume + extensions in `talconfig.with-longhorn.yaml` (when restored); Helm **`defaultDataPath`** in `clusters/noble/bootstrap/longhorn/values.yaml`.
6.**Loki → Fluent Bit → Grafana datasource:** deploy **Loki** (`loki-gateway` Service) before **Fluent Bit**; apply **`clusters/noble/bootstrap/grafana-loki-datasource/loki-datasource.yaml`** after **Loki** (sidecar picks up the ConfigMap — no kube-prometheus values change for Loki).
7.**Vault:****Longhorn** default **StorageClass** before **`clusters/noble/bootstrap/vault/`** Helm (PVC **`data-vault-0`**); **External Secrets****`ClusterSecretStore`** after Vault is initialized, unsealed, and **Kubernetes auth** is configured.
8.**Headlamp:****Traefik** + **cert-manager** (**`letsencrypt-prod`**) before exposing **`headlamp.apps.noble.lab.pcenicni.dev`**; treat as **cluster-admin** UI — protect with network policy / SSO when hardening (**Phase G**).
9.**Renovate:****Git remote** + platform access (**hosted app** needs org/repo install; **self-hosted** needs **`RENOVATE_TOKEN`** and chart **`renovate.config`**). If the bot runs **in-cluster**, add the token **after****Sealed Secrets** / **Vault** (**Phase E**) — no ingress required for the bot itself.
10.**Velero:****S3-compatible** endpoint + bucket + **`velero/velero-cloud-credentials`** before **`ansible/playbooks/noble.yml`** with **`noble_velero_install: true`**; for **CSI** volume snapshots, label a **VolumeSnapshotClass** per **`clusters/noble/bootstrap/velero/README.md`** (e.g. Longhorn).
7.**Headlamp:****Traefik** + **cert-manager** (**`letsencrypt-prod`**) before exposing **`headlamp.apps.noble.lab.pcenicni.dev`**; treat as **cluster-admin** UI — protect with network policy / SSO when hardening (**Phase G**).
8.**Renovate:****Git remote** + platform access (**hosted app** needs org/repo install; **self-hosted** needs **`RENOVATE_TOKEN`** and chart **`renovate.config`**). If the bot runs **in-cluster**, store the token with **SOPS** or an imperative Secret — no ingress required for the bot itself.
9.**Velero:****S3-compatible** endpoint + bucket + **`velero/velero-cloud-credentials`** before **`ansible/playbooks/noble.yml`** with **`noble_velero_install: true`**; for **CSI** volume snapshots, label a **VolumeSnapshotClass** per **`clusters/noble/bootstrap/velero/README.md`** (e.g. Longhorn).
## Prerequisites (before phases)
@@ -160,7 +152,7 @@ Lab stack is **up** on-cluster through **Phase D**–**F** and **Phase G** (Vaul
- [x]**Argo CD** bootstrap — `clusters/noble/bootstrap/argocd/` (`helm upgrade --install argocd …`) — also covered by **`ansible/playbooks/noble.yml`** (role **`noble_argocd`**)
- [x] Argo CD server **LoadBalancer** — **`192.168.50.210`** (see `values.yaml`)
- [x]**App-of-apps** — optional; **`clusters/noble/apps/kustomization.yaml`** is **empty** (core stack is **Ansible**-managed from **`clusters/noble/bootstrap/`**, not Argo). Set **`repoURL`** in **`root-application.yaml`** and add **`Application`** manifests only for optional GitOps workloads — see **`clusters/noble/apps/README.md`**
- [x]**Renovate** — **`renovate.json`** at repo root ([Renovate](https://docs.renovatebot.com/) — **Kubernetes** manager for **`clusters/noble/**/*.yaml`** image pins; grouped minor/patch PRs). **Activate PRs:** install **[Mend Renovate](https://github.com/apps/renovate)** on the Git repo (**Option A**), or **Option B:** self-hosted chart per [Helm charts](https://docs.renovatebot.com/helm-charts/) + token from **Sealed Secrets** / **ESO**. Helm **chart** versions pinned only in comments still need manual bumps or extra **regex**`customManagers` — extend **`renovate.json`** as needed.
- [x]**Renovate** — **`renovate.json`** at repo root ([Renovate](https://docs.renovatebot.com/) — **Kubernetes** manager for **`clusters/noble/**/*.yaml`** image pins; grouped minor/patch PRs). **Activate PRs:** install **[Mend Renovate](https://github.com/apps/renovate)** on the Git repo (**Option A**), or **Option B:** self-hosted chart per [Helm charts](https://docs.renovatebot.com/helm-charts/) + token from **SOPS** or a one-off Secret. Helm **chart** versions pinned only in comments still need manual bumps or extra **regex**`customManagers` — extend **`renovate.json`** as needed.
- [ ] SSO — later
## Phase D — Observability
@@ -171,9 +163,7 @@ Lab stack is **up** on-cluster through **Phase D**–**F** and **Phase G** (Vaul
- [x]**SOPS** — encrypt **`Secret`** YAML under **`clusters/noble/secrets/`** with **age** (see **`.sops.yaml`**, **`clusters/noble/secrets/README.md`**); keep **`age-key.txt`** private (gitignored). **`ansible/playbooks/noble.yml`** decrypt-applies **`*.yaml`** when **`age-key.txt`** exists.
## Phase F — Policy + backups
@@ -182,8 +172,7 @@ Lab stack is **up** on-cluster through **Phase D**–**F** and **Phase G** (Vaul
## Phase G — Hardening
- [x]**Cilium** — Vault**`CiliumNetworkPolicy`** (`clusters/noble/bootstrap/vault/cilium-network-policy.yaml`) — HTTP **8200** from **`external-secrets`** + **`vault`**; extend for other clients as needed
- [ ]**Alertmanager** — add **`slack_configs`**, **`pagerduty_configs`**, or other receivers under **`kube-prometheus-stack`** `alertmanager.config` (chart defaults use **`null`** receiver)
@@ -201,12 +190,10 @@ Lab stack is **up** on-cluster through **Phase D**–**F** and **Phase G** (Vaul
- [x]**`logging`** — **Fluent Bit** DaemonSet **Running** on all nodes (logs → **Loki**)
- [x]**Grafana** — **Loki** datasource from **`grafana-loki-datasource`** ConfigMap (**Explore** works after apply + sidecar sync)
- [x]**Headlamp** — Deployment **Running** in **`headlamp`**; UI at **`https://headlamp.apps.noble.lab.pcenicni.dev`** (TLS via **`letsencrypt-prod`**)
- [x]**`sealed-secrets`** — controller **Deployment****Running** in **`sealed-secrets`** (install + **`kubeseal`** per **`apps/sealed-secrets/README.md`**)
- [x]**`external-secrets`** — controller + webhook + cert-controller **Running** in **`external-secrets`**; apply **`ClusterSecretStore`** after Vault **Kubernetes auth**
- [x]**`vault`** — **StatefulSet****Running**, **`data-vault-0`** PVC **Bound** on **longhorn**; **`vault operator init`** + unseal per **`apps/vault/README.md`**
- [x]**SOPS secrets** — **`clusters/noble/secrets/*.yaml`** encrypted in git; **`noble.yml`** applies decrypted manifests when **`age-key.txt`** is present
- [x]**`kyverno`** — admission / background / cleanup / reports controllers **Running** in **`kyverno`**; **ClusterPolicies** for **PSS baseline****Ready** (**Audit**)
- [ ]**`velero`** — when enabled: Deployment **Running** in **`velero`**; **`BackupStorageLocation`** / **`VolumeSnapshotLocation`** **Available**; test backup per **`velero/README.md`**
- [x]**Phase G (partial)** — Vault **`CiliumNetworkPolicy`**;**`talos/runbooks/`** (incl. **RBAC**); **Headlamp**/**Argo CD** RBAC tightened — **Alertmanager** receivers still optional
- [x]**Phase G (partial)** — **`talos/runbooks/`** (incl. **RBAC**); **Headlamp**/**Argo CD** RBAC tightened — **Alertmanager** receivers still optional
**Symptoms:**`sops -d` fails; `kubectl apply` after Ansible shows no secret; `noble.yml` skips apply.
**Checklist**
1.**Private key:**`age-key.txt` at the repository root (gitignored). Create with `age-keygen -o age-key.txt` and add the **public** key to `.sops.yaml` (see `clusters/noble/secrets/README.md`).
2.**Environment:**`export SOPS_AGE_KEY_FILE=/absolute/path/to/home-server/age-key.txt` when editing or applying by hand.
3. Kubernetes auth for ESO: [`clusters/noble/bootstrap/vault/configure-kubernetes-auth.sh`](../../clusters/noble/bootstrap/vault/configure-kubernetes-auth.sh) and `kubectl describe clustersecretstore vault`.
4.**Cilium** policy: if Vault is unreachable from `external-secrets`, check [`clusters/noble/bootstrap/vault/cilium-network-policy.yaml`](../../clusters/noble/bootstrap/vault/cilium-network-policy.yaml) and extend `ingress` for new client namespaces.
- **403/invalid role** on ESO: re-run Kubernetes auth setup (issuer/CA/reviewer JWT) per README.
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.