Update README.md and CLUSTER-BUILD.md to enhance documentation for Vault Kubernetes auth and ClusterSecretStore integration. Add one-shot configuration instructions for Kubernetes auth in README.md, and update CLUSTER-BUILD.md to reflect the current state of the Talos cluster, including new components like Headlamp and Renovate, along with their deployment details and next steps.

This commit is contained in:
Nikholas Pcenicni
2026-03-28 01:41:52 -04:00
parent a65b553252
commit d5f38bd766
11 changed files with 454 additions and 5 deletions

View File

@@ -4,7 +4,7 @@ This document is the **exported TODO** for the **noble** Talos cluster (4 nodes)
## Current state (2026-03-28)
Lab stack is **up** on-cluster for bootstrap through **Phase D** (observability) and **Phase E** (Sealed Secrets, External Secrets, **Vault** Helm install), with manifests matching this repo. **Next focus:** **Vault** `operator init` / unseal, optional **`unseal-cronjob.yaml`**, Kubernetes auth + **`ClusterSecretStore`**, optional Pangolin/sample Ingress validation, Velero when S3 exists.
Lab stack is **up** on-cluster through **Phase D** (observability), **Phase E** (Sealed Secrets, External Secrets, **Vault** + **`ClusterSecretStore`**), and **Phase F** (**Kyverno** **baseline** PSS **Audit**), with manifests matching this repo. **Next focus:** optional **Headlamp** (Ingress + TLS), **Renovate** (dependency PRs for Helm/manifests), Pangolin/sample Ingress validation, **Phase G**, **Velero** when S3 exists.
- **Talos** v1.12.6 (target) / **Kubernetes** as bundled — four nodes **Ready** unless upgrading; **`talosctl health`**; **`talos/kubeconfig`** is **local only** (gitignored — never commit; regenerate with `talosctl kubeconfig` per `talos/README.md`). **Image Factory (nocloud installer):** `factory.talos.dev/nocloud-installer/249d9135de54962744e917cfe654117000cba369f9152fbab9d055a00aa3664f:v1.12.6`
- **Cilium** Helm **1.16.6** / app **1.16.6** (`clusters/noble/apps/cilium/`, phase 1 values).
@@ -21,7 +21,7 @@ Lab stack is **up** on-cluster for bootstrap through **Phase D** (observability)
- **Sealed Secrets** Helm **2.18.4** / app **0.36.1**`clusters/noble/apps/sealed-secrets/` (namespace **`sealed-secrets`**); **`kubeseal`** on client should match controller minor (**README**); back up **`sealed-secrets-key`** (see README).
- **External Secrets Operator** Helm **2.2.0** / app **v2.2.0**`clusters/noble/apps/external-secrets/`; Vault **`ClusterSecretStore`** in **`examples/vault-cluster-secret-store.yaml`** (**`http://`** to match Vault listener — apply after Vault **Kubernetes auth**).
- **Vault** Helm **0.32.0** / app **1.21.2**`clusters/noble/apps/vault/` — standalone **file** storage, **Longhorn** PVC; **HTTP** listener (`global.tlsDisable`); optional **CronJob** lab unseal **`unseal-cronjob.yaml`**; **not** initialized in git — run **`vault operator init`** per **`README.md`**.
- **Still open:** Vault **Kubernetes auth** + **`ClusterSecretStore`** apply + KV for ESO; **Phase FG**; optional **sample Ingress + cert + Pangolin** end-to-end; **Velero** when S3 is ready; **Argo CD SSO**.
- **Still open:** **Headlamp** (Helm + Traefik Ingress + **`letsencrypt-prod`**); **Renovate** ([Renovate](https://docs.renovatebot.com/) — dependency bot; hosted app **or** self-hosted on-cluster); **Phase G**; optional **sample Ingress + cert + Pangolin** end-to-end; **Velero** when S3 is ready; **Argo CD SSO**.
## Inventory
@@ -42,6 +42,7 @@ Lab stack is **up** on-cluster for bootstrap through **Phase D** (observability)
| Traefik (apps ingress) | `192.168.50.211`**`metallb.io/loadBalancerIPs`** in `clusters/noble/apps/traefik/values.yaml` |
| Apps ingress (LAN / split horizon) | `*.apps.noble.lab.pcenicni.dev` → Traefik LB |
| Grafana (Ingress + TLS) | **`grafana.apps.noble.lab.pcenicni.dev`** — `grafana.ingress` in `clusters/noble/apps/kube-prometheus-stack/values.yaml` (**`letsencrypt-prod`**) |
| Headlamp (Ingress + TLS) | **`headlamp.apps.noble.lab.pcenicni.dev`** — chart `ingress` in `clusters/noble/apps/headlamp/` (**`letsencrypt-prod`**, **`ingressClassName: traefik`**) |
| Public DNS (Pangolin) | **Newt** tunnel + **CNAME** at registrar + **Integration API**`clusters/noble/apps/newt/` |
| Velero | S3-compatible URL — configure later |
@@ -64,6 +65,9 @@ Lab stack is **up** on-cluster for bootstrap through **Phase D** (observability)
- Sealed Secrets: **2.18.4** (Helm chart `sealed-secrets/sealed-secrets`; app **0.36.1**)
- External Secrets Operator: **2.2.0** (Helm chart `external-secrets/external-secrets`; app **v2.2.0**)
- Vault: **0.32.0** (Helm chart `hashicorp/vault`; app **1.21.2**)
- Kyverno: **3.7.1** (Helm chart `kyverno/kyverno`; app **v1.17.1**); **kyverno-policies** **3.7.1****baseline** PSS, **Audit** (`clusters/noble/apps/kyverno/`)
- Headlamp: **0.40.1** (Helm chart `headlamp/headlamp`; app matches chart — see [Artifact Hub](https://artifacthub.io/packages/helm/headlamp/headlamp))
- Renovate: **hosted** (Mend **Renovate** GitHub/GitLab app — no cluster chart) **or** **self-hosted** — pin chart when added ([Helm charts](https://docs.renovatebot.com/helm-charts/), OCI `ghcr.io/renovatebot/charts/renovate`); pair **`renovate.json`** with this repos Helm paths under **`clusters/noble/`**
## Repo paths (this workspace)
@@ -89,7 +93,10 @@ Lab stack is **up** on-cluster for bootstrap through **Phase D** (observability)
| Fluent Bit → Loki (Helm values) | `clusters/noble/apps/fluent-bit/``values.yaml`, `namespace.yaml` |
| Sealed Secrets (Helm) | `clusters/noble/apps/sealed-secrets/``values.yaml`, `namespace.yaml`, `README.md` |
| External Secrets Operator (Helm + Vault store example) | `clusters/noble/apps/external-secrets/``values.yaml`, `namespace.yaml`, `README.md`, `examples/vault-cluster-secret-store.yaml` |
| Vault (Helm + optional unseal CronJob) | `clusters/noble/apps/vault/``values.yaml`, `namespace.yaml`, `unseal-cronjob.yaml`, `README.md` |
| Vault (Helm + optional unseal CronJob) | `clusters/noble/apps/vault/``values.yaml`, `namespace.yaml`, `unseal-cronjob.yaml`, `configure-kubernetes-auth.sh`, `README.md` |
| Kyverno + PSS baseline policies | `clusters/noble/apps/kyverno/``values.yaml`, `policies-values.yaml`, `namespace.yaml`, `README.md` |
| Headlamp (Helm + Ingress) | `clusters/noble/apps/headlamp/``values.yaml`, `namespace.yaml` (planned — `helm repo add headlamp https://kubernetes-sigs.github.io/headlamp/`) |
| Renovate (repo config + optional self-hosted Helm) | `renovate.json` or `renovate.json5` at repo root (see [Renovate docs](https://docs.renovatebot.com/)); optional `clusters/noble/apps/renovate/` for self-hosted chart + token Secret (**Sealed Secrets** / **ESO** after **Phase E**) |
**Git vs cluster:** manifests and `talconfig` live in git; **`talhelper genconfig -o out`**, bootstrap, Helm, and `kubectl` run on your LAN. See **`talos/README.md`** for workstation reachability (lab LAN/VPN), **`talosctl kubeconfig`** vs Kubernetes `server:` (VIP vs node IP), and **`--insecure`** only in maintenance.
@@ -101,6 +108,8 @@ Lab stack is **up** on-cluster for bootstrap through **Phase D** (observability)
4. **Longhorn:** Talos user volume + extensions in `talconfig.with-longhorn.yaml` (when restored); Helm **`defaultDataPath`** in `clusters/noble/apps/longhorn/values.yaml`.
5. **Loki → Fluent Bit → Grafana datasource:** deploy **Loki** (`loki-gateway` Service) before **Fluent Bit**; apply **`clusters/noble/apps/grafana-loki-datasource/loki-datasource.yaml`** after **Loki** (sidecar picks up the ConfigMap — no kube-prometheus values change for Loki).
6. **Vault:** **Longhorn** default **StorageClass** before **`clusters/noble/apps/vault/`** Helm (PVC **`data-vault-0`**); **External Secrets** **`ClusterSecretStore`** after Vault is initialized, unsealed, and **Kubernetes auth** is configured.
7. **Headlamp:** **Traefik** + **cert-manager** (**`letsencrypt-prod`**) before exposing **`headlamp.apps.noble.lab.pcenicni.dev`**; treat as **cluster-admin** UI — protect with network policy / SSO when hardening (**Phase G**).
8. **Renovate:** **Git remote** + platform access (**hosted app** needs org/repo install; **self-hosted** needs **`RENOVATE_TOKEN`** and chart **`renovate.config`**). If the bot runs **in-cluster**, add the token **after** **Sealed Secrets** / **Vault** (**Phase E**) — no ingress required for the bot itself.
## Prerequisites (before phases)
@@ -141,22 +150,24 @@ Lab stack is **up** on-cluster for bootstrap through **Phase D** (observability)
- [x] **Argo CD** bootstrap — `clusters/noble/bootstrap/argocd/` (`helm upgrade --install argocd …`)
- [x] Argo CD server **LoadBalancer****`192.168.50.210`** (see `values.yaml`)
- [X] **App-of-apps** — set **`repoURL`** in **`root-application.yaml`**, add **`Application`** manifests under **`bootstrap/argocd/apps/`**, apply **`root-application.yaml`**
- [ ] **Renovate** — [Renovate](https://docs.renovatebot.com/) opens PRs for Helm charts, Docker tags, and related bumps. **Option A:** install the **Mend Renovate** app on **GitHub** / **GitLab** for this repo (no cluster). **Option B:** self-hosted — **`helm repo add renovate https://docs.renovatebot.com/helm-charts`** or OCI per [Helm charts](https://docs.renovatebot.com/helm-charts/); **`renovate.config`** with token from **Sealed Secrets** / **ESO** (**`clusters/noble/apps/renovate/`** when added). Add **`renovate.json`** (or **`renovate.json5`**) at repo root with **`packageRules`**, **`kubernetes`** / **`helm-values`** file patterns covering **`clusters/noble/`** (Helm **`values.yaml`**, manifests). Verify a dry run or first dependency PR.
- [ ] SSO — later
## Phase D — Observability
- [x] **kube-prometheus-stack**`kubectl apply -f clusters/noble/apps/kube-prometheus-stack/namespace.yaml` then **`helm upgrade --install`** as in `clusters/noble/apps/kube-prometheus-stack/values.yaml` (chart **82.15.1**); PVCs **`longhorn`**; **`--wait --timeout 30m`** recommended; verify **`kubectl -n monitoring get pods,pvc`**
- [x] **Loki** + **Fluent Bit** + **Grafana Loki datasource****order:** **`kubectl apply -f clusters/noble/apps/loki/namespace.yaml`** → **`helm upgrade --install loki`** `grafana/loki` **6.55.0** `-f clusters/noble/apps/loki/values.yaml`**`kubectl apply -f clusters/noble/apps/fluent-bit/namespace.yaml`** → **`helm upgrade --install fluent-bit`** `fluent/fluent-bit` **0.56.0** `-f clusters/noble/apps/fluent-bit/values.yaml`**`kubectl apply -f clusters/noble/apps/grafana-loki-datasource/loki-datasource.yaml`**. Verify **Explore → Loki** in Grafana; **`kubectl -n loki get pods,pvc`**, **`kubectl -n logging get pods`**
- [ ] **Headlamp** — Kubernetes web UI ([Headlamp](https://headlamp.dev/)); **`helm repo add headlamp https://kubernetes-sigs.github.io/headlamp/`**; **`kubectl apply -f clusters/noble/apps/headlamp/namespace.yaml`** → **`helm upgrade --install headlamp headlamp/headlamp --version 0.40.1 -n headlamp -f clusters/noble/apps/headlamp/values.yaml`**; **Ingress** **`https://headlamp.apps.noble.lab.pcenicni.dev`** (**`ingressClassName: traefik`**, **`cert-manager.io/cluster-issuer: letsencrypt-prod`**). **RBAC:** chart defaults are permissive — tighten before LAN-wide exposure; align with **Phase G** hardening.
## Phase E — Secrets
- [x] **Sealed Secrets** (optional Git workflow) — `clusters/noble/apps/sealed-secrets/` (Helm **2.18.4**); **`kubeseal`** + key backup per **`README.md`**
- [x] **Vault** in-cluster on Longhorn + **auto-unseal**`clusters/noble/apps/vault/` (Helm **0.32.0**); **Longhorn** PVC; **OSS** “auto-unseal” = optional **`unseal-cronjob.yaml`** + Secret (**README**); init/unseal/Kubernetes auth for ESO still **to do** on cluster
- [x] **Vault** in-cluster on Longhorn + **auto-unseal**`clusters/noble/apps/vault/` (Helm **0.32.0**); **Longhorn** PVC; **OSS** “auto-unseal” = optional **`unseal-cronjob.yaml`** + Secret (**README**); **`configure-kubernetes-auth.sh`** for ESO (**Kubernetes auth** + KV + role)
- [x] **External Secrets Operator** + Vault `ClusterSecretStore` — operator **`clusters/noble/apps/external-secrets/`** (Helm **2.2.0**); apply **`examples/vault-cluster-secret-store.yaml`** after Vault (**`README.md`**)
## Phase F — Policy + backups
- [ ] **Kyverno** baseline policies
- [x] **Kyverno** baseline policies`clusters/noble/apps/kyverno/` (Helm **kyverno** **3.7.1** + **kyverno-policies** **3.7.1**, **baseline** / **Audit** — see **`README.md`**)
- [ ] **Velero** when S3 is ready; backup/restore drill
## Phase G — Hardening
@@ -170,15 +181,18 @@ Lab stack is **up** on-cluster for bootstrap through **Phase D** (observability)
- [x] API via VIP `:6443`**`kubectl get --raw /healthz`** → **`ok`** with kubeconfig **`server:`** `https://192.168.50.230:6443`
- [x] Ingress **`LoadBalancer`** in pool `210``229` (**Traefik** → **`192.168.50.211`**)
- [x] **Argo CD** UI — **`argocd-server`** **`LoadBalancer`** **`192.168.50.210`** (initial **`admin`** password from **`argocd-initial-admin-secret`**)
- [ ] **Renovate** — hosted app enabled for this repo **or** self-hosted workload **Running** + PRs updating **`clusters/noble/`** manifests as configured
- [ ] Sample Ingress + cert (cert-manager ready) + Pangolin resource + CNAME
- [x] PVC **`Bound`** on **Longhorn** (`storageClassName: longhorn`); Prometheus/Loki durable when configured
- [x] **`monitoring`** — **kube-prometheus-stack** core workloads **Running** (Prometheus, Grafana, Alertmanager, operator, kube-state-metrics, node-exporter); PVCs **Bound** on **longhorn**
- [x] **`loki`** — **Loki** SingleBinary + **gateway** **Running**; **`loki`** PVC **Bound** on **longhorn** (no chunks-cache by design)
- [x] **`logging`** — **Fluent Bit** DaemonSet **Running** on all nodes (logs → **Loki**)
- [x] **Grafana****Loki** datasource from **`grafana-loki-datasource`** ConfigMap (**Explore** works after apply + sidecar sync)
- [ ] **Headlamp** — Deployment **Running** in **`headlamp`**; UI at **`https://headlamp.apps.noble.lab.pcenicni.dev`** (TLS via **`letsencrypt-prod`**)
- [x] **`sealed-secrets`** — controller **Deployment** **Running** in **`sealed-secrets`** (install + **`kubeseal`** per **`apps/sealed-secrets/README.md`**)
- [x] **`external-secrets`** — controller + webhook + cert-controller **Running** in **`external-secrets`**; apply **`ClusterSecretStore`** after Vault **Kubernetes auth**
- [x] **`vault`** — **StatefulSet** **Running**, **`data-vault-0`** PVC **Bound** on **longhorn**; **`vault operator init`** + unseal per **`apps/vault/README.md`**
- [x] **`kyverno`** — admission / background / cleanup / reports controllers **Running** in **`kyverno`**; **ClusterPolicies** for **PSS baseline** **Ready** (**Audit**)
---