Update CLUSTER-BUILD.md to reflect the current state of the Talos cluster, detailing progress through Phase D (observability) and advancements in Phase E (secrets). Include updates on Sealed Secrets, External Secrets Operator, and Vault configurations, along with deployment instructions and next steps for Kubernetes auth and ClusterSecretStore integration. Mark relevant tasks as completed and outline remaining objectives for future phases.
This commit is contained in:
@@ -4,7 +4,7 @@ This document is the **exported TODO** for the **noble** Talos cluster (4 nodes)
|
||||
|
||||
## Current state (2026-03-28)
|
||||
|
||||
Lab stack is **up** on-cluster for bootstrap through **Phase D** (observability), with manifests matching this repo. **Next focus:** Phase E (secrets), optional Pangolin/sample Ingress validation, Velero when S3 exists.
|
||||
Lab stack is **up** on-cluster for bootstrap through **Phase D** (observability) and **Phase E** (Sealed Secrets, External Secrets, **Vault** Helm install), with manifests matching this repo. **Next focus:** **Vault** `operator init` / unseal, optional **`unseal-cronjob.yaml`**, Kubernetes auth + **`ClusterSecretStore`**, optional Pangolin/sample Ingress validation, Velero when S3 exists.
|
||||
|
||||
- **Talos** v1.12.6 (target) / **Kubernetes** as bundled — four nodes **Ready** unless upgrading; **`talosctl health`**; **`talos/kubeconfig`** for `kubectl` (root `kubeconfig` may still be a stub). **Image Factory (nocloud installer):** `factory.talos.dev/nocloud-installer/249d9135de54962744e917cfe654117000cba369f9152fbab9d055a00aa3664f:v1.12.6`
|
||||
- **Cilium** Helm **1.16.6** / app **1.16.6** (`clusters/noble/apps/cilium/`, phase 1 values).
|
||||
@@ -18,7 +18,10 @@ Lab stack is **up** on-cluster for bootstrap through **Phase D** (observability)
|
||||
- **Argo CD** Helm **9.4.17** / app **v3.3.6** — `clusters/noble/bootstrap/argocd/`; **`argocd-server`** **`LoadBalancer`** **`192.168.50.210`**; app-of-apps scaffold under **`bootstrap/argocd/apps/`** (edit **`root-application.yaml`** `repoURL` before applying).
|
||||
- **kube-prometheus-stack** — Helm chart **82.15.1** — `clusters/noble/apps/kube-prometheus-stack/` (**namespace** `monitoring`, PSA **privileged** — **node-exporter** needs host mounts); **Longhorn** PVCs for Prometheus, Grafana, Alertmanager; **node-exporter** DaemonSet **4/4**. **Grafana Ingress:** **`https://grafana.apps.noble.lab.pcenicni.dev`** (Traefik **`ingressClassName: traefik`**, **`cert-manager.io/cluster-issuer: letsencrypt-prod`**). **Loki** datasource in Grafana: ConfigMap **`clusters/noble/apps/grafana-loki-datasource/loki-datasource.yaml`** (sidecar label **`grafana_datasource: "1"`**) — not via **`grafana.additionalDataSources`** in the chart. **`helm upgrade --install` with `--wait` is silent until done** — use **`--timeout 30m`**; Grafana admin: Secret **`kube-prometheus-grafana`**, keys **`admin-user`** / **`admin-password`**.
|
||||
- **Loki** + **Fluent Bit** — **`grafana/loki` 6.55.0** SingleBinary + **filesystem** on **Longhorn** (`clusters/noble/apps/loki/`); **`loki.auth_enabled: false`**; **`chunksCache.enabled: false`** (no memcached chunk cache). **`fluent/fluent-bit` 0.56.0** → **`loki-gateway.loki.svc:80`** (`clusters/noble/apps/fluent-bit/`); **`logging`** PSA **privileged**. **Grafana Explore:** **`kubectl apply -f clusters/noble/apps/grafana-loki-datasource/loki-datasource.yaml`** then **Explore → Loki** (e.g. `{job="fluent-bit"}`).
|
||||
- **Still open:** **Phase E** (Sealed Secrets / Vault / ESO); **Phase F–G**; optional **sample Ingress + cert + Pangolin** end-to-end; **Velero** when S3 is ready; **Argo CD SSO**.
|
||||
- **Sealed Secrets** Helm **2.18.4** / app **0.36.1** — `clusters/noble/apps/sealed-secrets/` (namespace **`sealed-secrets`**); **`kubeseal`** on client should match controller minor (**README**); back up **`sealed-secrets-key`** (see README).
|
||||
- **External Secrets Operator** Helm **2.2.0** / app **v2.2.0** — `clusters/noble/apps/external-secrets/`; Vault **`ClusterSecretStore`** in **`examples/vault-cluster-secret-store.yaml`** (**`http://`** to match Vault listener — apply after Vault **Kubernetes auth**).
|
||||
- **Vault** Helm **0.32.0** / app **1.21.2** — `clusters/noble/apps/vault/` — standalone **file** storage, **Longhorn** PVC; **HTTP** listener (`global.tlsDisable`); optional **CronJob** lab unseal **`unseal-cronjob.yaml`**; **not** initialized in git — run **`vault operator init`** per **`README.md`**.
|
||||
- **Still open:** Vault **Kubernetes auth** + **`ClusterSecretStore`** apply + KV for ESO; **Phase F–G**; optional **sample Ingress + cert + Pangolin** end-to-end; **Velero** when S3 is ready; **Argo CD SSO**.
|
||||
|
||||
## Inventory
|
||||
|
||||
@@ -58,6 +61,9 @@ Lab stack is **up** on-cluster for bootstrap through **Phase D** (observability)
|
||||
- kube-prometheus-stack: **82.15.1** (Helm chart `prometheus-community/kube-prometheus-stack`; app **v0.89.x** bundle)
|
||||
- Loki: **6.55.0** (Helm chart `grafana/loki`; app **3.6.7**)
|
||||
- Fluent Bit: **0.56.0** (Helm chart `fluent/fluent-bit`; app **4.2.3**)
|
||||
- Sealed Secrets: **2.18.4** (Helm chart `sealed-secrets/sealed-secrets`; app **0.36.1**)
|
||||
- External Secrets Operator: **2.2.0** (Helm chart `external-secrets/external-secrets`; app **v2.2.0**)
|
||||
- Vault: **0.32.0** (Helm chart `hashicorp/vault`; app **1.21.2**)
|
||||
|
||||
## Repo paths (this workspace)
|
||||
|
||||
@@ -81,6 +87,9 @@ Lab stack is **up** on-cluster for bootstrap through **Phase D** (observability)
|
||||
| Grafana Loki datasource (ConfigMap; no chart change) | `clusters/noble/apps/grafana-loki-datasource/loki-datasource.yaml` |
|
||||
| Loki (Helm values) | `clusters/noble/apps/loki/` — `values.yaml`, `namespace.yaml` |
|
||||
| Fluent Bit → Loki (Helm values) | `clusters/noble/apps/fluent-bit/` — `values.yaml`, `namespace.yaml` |
|
||||
| Sealed Secrets (Helm) | `clusters/noble/apps/sealed-secrets/` — `values.yaml`, `namespace.yaml`, `README.md` |
|
||||
| External Secrets Operator (Helm + Vault store example) | `clusters/noble/apps/external-secrets/` — `values.yaml`, `namespace.yaml`, `README.md`, `examples/vault-cluster-secret-store.yaml` |
|
||||
| Vault (Helm + optional unseal CronJob) | `clusters/noble/apps/vault/` — `values.yaml`, `namespace.yaml`, `unseal-cronjob.yaml`, `README.md` |
|
||||
|
||||
**Git vs cluster:** manifests and `talconfig` live in git; **`talhelper genconfig -o out`**, bootstrap, Helm, and `kubectl` run on your LAN. See **`talos/README.md`** for workstation reachability (lab LAN/VPN), **`talosctl kubeconfig`** vs Kubernetes `server:` (VIP vs node IP), and **`--insecure`** only in maintenance.
|
||||
|
||||
@@ -91,6 +100,7 @@ Lab stack is **up** on-cluster for bootstrap through **Phase D** (observability)
|
||||
3. **`clusters/noble/apps/metallb/namespace.yaml`** before or merged onto `metallb-system` so Pod Security does not block speaker (see `apps/metallb/README.md`).
|
||||
4. **Longhorn:** Talos user volume + extensions in `talconfig.with-longhorn.yaml` (when restored); Helm **`defaultDataPath`** in `clusters/noble/apps/longhorn/values.yaml`.
|
||||
5. **Loki → Fluent Bit → Grafana datasource:** deploy **Loki** (`loki-gateway` Service) before **Fluent Bit**; apply **`clusters/noble/apps/grafana-loki-datasource/loki-datasource.yaml`** after **Loki** (sidecar picks up the ConfigMap — no kube-prometheus values change for Loki).
|
||||
6. **Vault:** **Longhorn** default **StorageClass** before **`clusters/noble/apps/vault/`** Helm (PVC **`data-vault-0`**); **External Secrets** **`ClusterSecretStore`** after Vault is initialized, unsealed, and **Kubernetes auth** is configured.
|
||||
|
||||
## Prerequisites (before phases)
|
||||
|
||||
@@ -140,9 +150,9 @@ Lab stack is **up** on-cluster for bootstrap through **Phase D** (observability)
|
||||
|
||||
## Phase E — Secrets
|
||||
|
||||
- [ ] **Sealed Secrets** (optional Git workflow)
|
||||
- [ ] **Vault** in-cluster on Longhorn + **auto-unseal**
|
||||
- [ ] **External Secrets Operator** + Vault `ClusterSecretStore`
|
||||
- [x] **Sealed Secrets** (optional Git workflow) — `clusters/noble/apps/sealed-secrets/` (Helm **2.18.4**); **`kubeseal`** + key backup per **`README.md`**
|
||||
- [x] **Vault** in-cluster on Longhorn + **auto-unseal** — `clusters/noble/apps/vault/` (Helm **0.32.0**); **Longhorn** PVC; **OSS** “auto-unseal” = optional **`unseal-cronjob.yaml`** + Secret (**README**); init/unseal/Kubernetes auth for ESO still **to do** on cluster
|
||||
- [x] **External Secrets Operator** + Vault `ClusterSecretStore` — operator **`clusters/noble/apps/external-secrets/`** (Helm **2.2.0**); apply **`examples/vault-cluster-secret-store.yaml`** after Vault (**`README.md`**)
|
||||
|
||||
## Phase F — Policy + backups
|
||||
|
||||
@@ -166,6 +176,9 @@ Lab stack is **up** on-cluster for bootstrap through **Phase D** (observability)
|
||||
- [x] **`loki`** — **Loki** SingleBinary + **gateway** **Running**; **`loki`** PVC **Bound** on **longhorn** (no chunks-cache by design)
|
||||
- [x] **`logging`** — **Fluent Bit** DaemonSet **Running** on all nodes (logs → **Loki**)
|
||||
- [x] **Grafana** — **Loki** datasource from **`grafana-loki-datasource`** ConfigMap (**Explore** works after apply + sidecar sync)
|
||||
- [x] **`sealed-secrets`** — controller **Deployment** **Running** in **`sealed-secrets`** (install + **`kubeseal`** per **`apps/sealed-secrets/README.md`**)
|
||||
- [x] **`external-secrets`** — controller + webhook + cert-controller **Running** in **`external-secrets`**; apply **`ClusterSecretStore`** after Vault **Kubernetes auth**
|
||||
- [x] **`vault`** — **StatefulSet** **Running**, **`data-vault-0`** PVC **Bound** on **longhorn**; **`vault operator init`** + unseal per **`apps/vault/README.md`**
|
||||
|
||||
---
|
||||
|
||||
|
||||
Reference in New Issue
Block a user