Update Ansible configuration to integrate SOPS for managing secrets. Enhance README.md with SOPS usage instructions and prerequisites. Remove External Secrets Operator references and related configurations from the bootstrap process, streamlining the deployment. Adjust playbooks and roles to apply SOPS-encrypted secrets automatically, improving security and clarity in secret management.

Nikholas Pcenicni
2026-03-30 22:42:52 -04:00
parent 023ebfee5d
commit 3a6e5dff5b
44 changed files with 644 additions and 809 deletions

@@ -8,8 +8,8 @@ This document describes the **noble** Talos lab cluster: node topology, networki
|---------------|---------|
| **Subgraph “Cluster”** | Kubernetes cluster boundary (`noble`) |
| **External / DNS / cloud** | Services outside the data plane (internet, registrar, Pangolin) |
| **Data store** | Durable data (etcd, Longhorn, Loki, Vault storage) |
| **Secrets / policy** | Secret material, Vault, admission policy |
| **Data store** | Durable data (etcd, Longhorn, Loki) |
| **Secrets / policy** | Secret material (SOPS in git), admission policy |
| **LB / VIP** | Load balancer, MetalLB assignment, or API VIP |
---
@@ -74,7 +74,7 @@ flowchart TB
## Platform stack (bootstrap → workloads)
Order: **Talos** → **Cilium** (cluster uses `cni: none` until CNI is installed) → **metrics-server**, **Longhorn**, **MetalLB** + pool manifests, **kube-vip** → **Traefik**, **cert-manager** → **Argo CD** (Helm only; optional empty app-of-apps). **Automated install:** `ansible/playbooks/noble.yml` (see `ansible/README.md`). Platform namespaces include `cert-manager`, `traefik`, `metallb-system`, `longhorn-system`, `monitoring`, `loki`, `logging`, `argocd`, `vault`, `external-secrets`, `sealed-secrets`, `kyverno`, `newt`, and others as deployed.
Order: **Talos** → **Cilium** (cluster uses `cni: none` until CNI is installed) → **metrics-server**, **Longhorn**, **MetalLB** + pool manifests, **kube-vip** → **Traefik**, **cert-manager** → **Argo CD** (Helm only; optional empty app-of-apps). **Automated install:** `ansible/playbooks/noble.yml` (see `ansible/README.md`). Platform namespaces include `cert-manager`, `traefik`, `metallb-system`, `longhorn-system`, `monitoring`, `loki`, `logging`, `argocd`, `kyverno`, `newt`, and others as deployed.
```mermaid
flowchart TB
@@ -98,7 +98,7 @@ flowchart TB
Argo["Argo CD<br/>(optional app-of-apps; platform via Ansible)"]
end
subgraph L5["Platform namespaces (examples)"]
NS["cert-manager, traefik, metallb-system,<br/>longhorn-system, monitoring, loki, logging,<br/>argocd, vault, external-secrets, sealed-secrets,<br/>kyverno, newt, …"]
NS["cert-manager, traefik, metallb-system,<br/>longhorn-system, monitoring, loki, logging,<br/>argocd, kyverno, newt, …"]
end
Talos --> Cilium --> MS
Cilium --> LH
@@ -149,22 +149,20 @@ flowchart LR
## Secrets and policy
**Sealed Secrets** decrypts `SealedSecret` objects in-cluster. **External Secrets Operator** syncs from **Vault** using **`ClusterSecretStore`** (see [`examples/vault-cluster-secret-store.yaml`](../clusters/noble/bootstrap/external-secrets/examples/vault-cluster-secret-store.yaml)). Trust is **cluster → Vault** (ESO calls Vault; Vault does not initiate cluster trust). **Kyverno** with **kyverno-policies** enforces **PSS baseline** in **Audit**.
**Mozilla SOPS** with **age** encrypts plain Kubernetes **`Secret`** manifests under [`clusters/noble/secrets/`](../clusters/noble/secrets/); operators decrypt at apply time (`ansible/playbooks/noble.yml` or `sops -d … | kubectl apply`). The private key is **`age-key.txt`** at the repo root (gitignored). **Kyverno** with **kyverno-policies** enforces **PSS baseline** in **Audit**.
```mermaid
flowchart LR
subgraph Git["Git repo"]
SSman["SealedSecret manifests<br/>(optional)"]
SM["SOPS-encrypted Secret YAML<br/>clusters/noble/secrets/"]
end
subgraph ops["Apply path"]
SOPS["sops -d + kubectl apply<br/>(or Ansible noble.yml)"]
end
subgraph cluster["Cluster"]
SSC["Sealed Secrets controller<br/>sealed-secrets"]
ESO["External Secrets Operator<br/>external-secrets"]
V["Vault<br/>vault namespace<br/>HTTP listener"]
K["Kyverno + kyverno-policies<br/>PSS baseline Audit"]
end
SSman -->|"encrypted"| SSC -->|"decrypt to Secret"| workloads["Workload Secrets"]
ESO -->|"ClusterSecretStore →"| V
ESO -->|"sync ExternalSecret"| workloads
SM --> SOPS -->|"plain Secret"| workloads["Workload Secrets"]
K -.->|"admission / audit<br/>(PSS baseline)"| workloads
```
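The apply path in the diagram above (decrypt with `sops -d`, then pipe the plain `Secret` into `kubectl apply`) can be sketched as a small script. This is a minimal illustration, not the logic of `ansible/playbooks/noble.yml`: the function names, the `.yaml`/`.yml` file filter, and the use of the `SOPS_AGE_KEY_FILE` environment variable to point SOPS at `age-key.txt` are assumptions made for the example.

```python
import os
import subprocess
from pathlib import Path

# Paths taken from the document; both are relative to the repo root.
SECRETS_DIR = Path("clusters/noble/secrets")
AGE_KEY_FILE = Path("age-key.txt")


def find_secret_files(root: Path) -> list[Path]:
    """Return candidate encrypted manifests under root, sorted by path.

    Assumption: manifests end in .yaml or .yml. Dotfiles such as a
    SOPS config and non-YAML files such as README.md are skipped.
    """
    if not root.is_dir():
        return []
    return sorted(
        p
        for p in root.rglob("*")
        if p.is_file()
        and p.suffix in {".yaml", ".yml"}
        and not p.name.startswith(".")
    )


def build_pipeline(path: Path) -> tuple[list[str], list[str]]:
    """Return the two commands equivalent to: sops -d FILE | kubectl apply -f -"""
    return ["sops", "-d", str(path)], ["kubectl", "apply", "-f", "-"]


def apply_secret(path: Path, dry_run: bool = True) -> None:
    """Decrypt one manifest and apply it; with dry_run, only print the pipeline."""
    decrypt_cmd, apply_cmd = build_pipeline(path)
    if dry_run:
        print(" ".join(decrypt_cmd), "|", " ".join(apply_cmd))
        return
    # Point SOPS at the age private key without writing plaintext to disk.
    env = dict(os.environ, SOPS_AGE_KEY_FILE=str(AGE_KEY_FILE.resolve()))
    plain = subprocess.run(
        decrypt_cmd, env=env, check=True, capture_output=True
    ).stdout
    subprocess.run(apply_cmd, input=plain, check=True)


if __name__ == "__main__":
    # Dry run by default: prints one pipeline per manifest, touches nothing.
    for manifest in find_secret_files(SECRETS_DIR):
        apply_secret(manifest, dry_run=True)
```

Run from the repo root, the script only prints the pipelines it would execute. Passing `dry_run=False` requires `sops`, `kubectl`, a valid kubeconfig, and the age key to be present. Decrypted output stays in memory and is piped straight to `kubectl`, so no plaintext `Secret` is written to the working tree.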
@@ -172,7 +170,7 @@ flowchart LR
## Data and storage
**StorageClass:** **`longhorn`** (default). Talos mounts **user volume** data at **`/var/mnt/longhorn`** (bind paths for Longhorn). Stateful consumers include **Vault**, **kube-prometheus-stack** PVCs, and **Loki**.
**StorageClass:** **`longhorn`** (default). Talos mounts **user volume** data at **`/var/mnt/longhorn`** (bind paths for Longhorn). Stateful consumers include **kube-prometheus-stack** PVCs and **Loki**.
```mermaid
flowchart TB
@@ -183,12 +181,10 @@ flowchart TB
SC["StorageClass: longhorn (default)"]
end
subgraph consumers["Stateful / durable consumers"]
V["Vault PVC data-vault-0"]
PGL["kube-prometheus-stack PVCs"]
L["Loki PVC"]
end
UD --> SC
SC --> V
SC --> PGL
SC --> L
```
@@ -210,7 +206,7 @@ See [`talos/CLUSTER-BUILD.md`](../talos/CLUSTER-BUILD.md) for the authoritative
| Argo CD | 9.4.17 / app v3.3.6 |
| kube-prometheus-stack | 82.15.1 |
| Loki / Fluent Bit | 6.55.0 / 0.56.0 |
| Sealed Secrets / ESO / Vault | 2.18.4 / 2.2.0 / 0.32.0 |
| SOPS (client tooling) | see `clusters/noble/secrets/README.md` |
| Kyverno | 3.7.1 / policies 3.7.1 |
| Newt | 1.2.0 / app 1.10.1 |
@@ -218,7 +214,7 @@ See [`talos/CLUSTER-BUILD.md`](../talos/CLUSTER-BUILD.md) for the authoritative
## Narrative
The **noble** environment is a **Talos** lab cluster on **`192.168.50.0/24`** with **three control plane nodes and one worker**, schedulable workloads on control planes enabled, and the Kubernetes API exposed through **kube-vip** at **`192.168.50.230`**. **Cilium** provides the CNI after Talos bootstrap with **`cni: none`**; **MetalLB** advertises **`192.168.50.210`–`192.168.50.229`**, pinning **Argo CD** to **`192.168.50.210`** and **Traefik** to **`192.168.50.211`** for **`*.apps.noble.lab.pcenicni.dev`**. **cert-manager** issues certificates for Traefik Ingresses; **GitOps** is **Ansible-driven Helm** for the platform (**`clusters/noble/bootstrap/`**) plus optional **Argo CD** app-of-apps (**`clusters/noble/apps/`**, **`clusters/noble/bootstrap/argocd/`**). **Observability** uses **kube-prometheus-stack** in **`monitoring`**, **Loki** and **Fluent Bit** with Grafana wired via a **ConfigMap** datasource, with **Longhorn** PVCs for Prometheus, Grafana, Alertmanager, Loki, and **Vault**. **Secrets** combine **Sealed Secrets** for git-encrypted material and **Vault** with **External Secrets** for dynamic sync; **Kyverno** enforces **Pod Security Standards baseline** in **Audit**. **Public** access uses **Newt** to **Pangolin** with **CNAME** and Integration API steps as documented, not generic in-cluster public DNS.
The **noble** environment is a **Talos** lab cluster on **`192.168.50.0/24`** with **three control plane nodes and one worker**, schedulable workloads on control planes enabled, and the Kubernetes API exposed through **kube-vip** at **`192.168.50.230`**. **Cilium** provides the CNI after Talos bootstrap with **`cni: none`**; **MetalLB** advertises **`192.168.50.210`–`192.168.50.229`**, pinning **Argo CD** to **`192.168.50.210`** and **Traefik** to **`192.168.50.211`** for **`*.apps.noble.lab.pcenicni.dev`**. **cert-manager** issues certificates for Traefik Ingresses; **GitOps** is **Ansible-driven Helm** for the platform (**`clusters/noble/bootstrap/`**) plus optional **Argo CD** app-of-apps (**`clusters/noble/apps/`**, **`clusters/noble/bootstrap/argocd/`**). **Observability** uses **kube-prometheus-stack** in **`monitoring`**, **Loki** and **Fluent Bit** with Grafana wired via a **ConfigMap** datasource, with **Longhorn** PVCs for Prometheus, Grafana, Alertmanager, and Loki. **Secrets** in git use **SOPS** + **age** under **`clusters/noble/secrets/`**; **Kyverno** enforces **Pod Security Standards baseline** in **Audit**. **Public** access uses **Newt** to **Pangolin** with **CNAME** and Integration API steps as documented, not generic in-cluster public DNS.
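The address plan in the narrative can be sanity-checked with a few lines of arithmetic. The sketch below uses only the addresses stated above; the helper names and the specific checks (pool inside the subnet, pinned services inside the pool, API VIP outside the pool) are illustrative assumptions, not rules taken from the MetalLB or kube-vip manifests.

```python
import ipaddress

# Values copied from the narrative.
SUBNET = ipaddress.ip_network("192.168.50.0/24")
POOL_FIRST = ipaddress.ip_address("192.168.50.210")
POOL_LAST = ipaddress.ip_address("192.168.50.229")
API_VIP = ipaddress.ip_address("192.168.50.230")
PINNED = {
    "Argo CD": ipaddress.ip_address("192.168.50.210"),
    "Traefik": ipaddress.ip_address("192.168.50.211"),
}


def pool_size(first, last) -> int:
    """Number of addresses in the inclusive range first..last."""
    return int(last) - int(first) + 1


def in_pool(addr, first=POOL_FIRST, last=POOL_LAST) -> bool:
    """True if addr falls inside the inclusive MetalLB range."""
    return first <= addr <= last


def check_plan() -> list[str]:
    """Return a list of human-readable problems; empty means the plan is consistent."""
    problems = []
    for name, addr in (("pool start", POOL_FIRST), ("pool end", POOL_LAST), ("API VIP", API_VIP)):
        if addr not in SUBNET:
            problems.append(f"{name} {addr} is outside {SUBNET}")
    for service, addr in PINNED.items():
        if not in_pool(addr):
            problems.append(f"{service} pin {addr} is outside the MetalLB pool")
    if len(set(PINNED.values())) != len(PINNED):
        problems.append("two services are pinned to the same address")
    if in_pool(API_VIP):
        problems.append(f"API VIP {API_VIP} overlaps the MetalLB pool")
    return problems


if __name__ == "__main__":
    print("pool size:", pool_size(POOL_FIRST, POOL_LAST))  # 20 addresses
    print("problems:", check_plan())
```

With the documented values the pool holds 20 addresses, two of which are pinned, and the kube-vip address sits one above the pool's upper bound, so the two mechanisms do not compete for the same IP.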
---