Update README.md and CLUSTER-BUILD.md to enhance documentation for Vault Kubernetes auth and ClusterSecretStore integration. Add one-shot configuration instructions for Kubernetes auth in README.md, and update CLUSTER-BUILD.md to reflect the current state of the Talos cluster, including new components like Headlamp and Renovate, along with their deployment details and next steps.

Nikholas Pcenicni
2026-03-28 01:41:52 -04:00
parent a65b553252
commit d5f38bd766
11 changed files with 454 additions and 5 deletions


@@ -0,0 +1,18 @@
# Headlamp (noble)
[Headlamp](https://headlamp.dev/) web UI for the cluster. Exposed on **`https://headlamp.apps.noble.lab.pcenicni.dev`** via **Traefik** + **cert-manager** (`letsencrypt-prod`), same pattern as Grafana.
- **Chart:** `headlamp/headlamp` **0.40.1**
- **Namespace:** `headlamp`
## Install
```bash
helm repo add headlamp https://kubernetes-sigs.github.io/headlamp/
helm repo update
kubectl apply -f clusters/noble/apps/headlamp/namespace.yaml
helm upgrade --install headlamp headlamp/headlamp -n headlamp \
--version 0.40.1 -f clusters/noble/apps/headlamp/values.yaml --wait --timeout 10m
```
Sign-in uses a **ServiceAccount token** (Headlamp docs: create a limited SA for day-to-day use). The chart's default **ClusterRole** is powerful; tighten RBAC and/or add **OIDC** in **`values.yaml`** under **`config.oidc`** when hardening (**Phase G**).
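The limited ServiceAccount suggested above can be sketched as a small shell function that prints the manifests for review before anything is applied. This assumes Kubernetes' built-in `view` ClusterRole (read-only, excludes Secrets) is enough for day-to-day browsing. The function name `headlamp_viewer_manifest` and the ServiceAccount name `headlamp-viewer` are hypothetical.

```bash
# Hypothetical helper: print a read-only ServiceAccount for Headlamp sign-in.
# Assumes the built-in "view" ClusterRole (read-only, no Secrets) fits day-to-day use.
headlamp_viewer_manifest() {
  local sa="${1:-headlamp-viewer}" ns="${2:-headlamp}"
  cat <<EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  name: ${sa}
  namespace: ${ns}
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: ${sa}-view
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: view
subjects:
  - kind: ServiceAccount
    name: ${sa}
    namespace: ${ns}
EOF
}
```

With the defaults, apply it using `headlamp_viewer_manifest | kubectl apply -f -`, then sign in with the output of `kubectl -n headlamp create token headlamp-viewer`. Tokens from `kubectl create token` are short-lived, which avoids keeping a long-lived token Secret around.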


@@ -0,0 +1,10 @@
# Headlamp — apply before Helm.
# Chart pods do not satisfy PSA "restricted" (see install warnings); align with other UIs.
apiVersion: v1
kind: Namespace
metadata:
name: headlamp
labels:
pod-security.kubernetes.io/enforce: privileged
pod-security.kubernetes.io/audit: privileged
pod-security.kubernetes.io/warn: privileged


@@ -0,0 +1,25 @@
# Headlamp — noble (Kubernetes web UI)
#
# helm repo add headlamp https://kubernetes-sigs.github.io/headlamp/
# helm repo update
# kubectl apply -f clusters/noble/apps/headlamp/namespace.yaml
# helm upgrade --install headlamp headlamp/headlamp -n headlamp \
# --version 0.40.1 -f clusters/noble/apps/headlamp/values.yaml --wait --timeout 10m
#
# DNS: headlamp.apps.noble.lab.pcenicni.dev → Traefik LB (see talos/CLUSTER-BUILD.md).
# Default chart RBAC is broad — restrict for production (Phase G).
ingress:
enabled: true
ingressClassName: traefik
annotations:
cert-manager.io/cluster-issuer: letsencrypt-prod
hosts:
- host: headlamp.apps.noble.lab.pcenicni.dev
paths:
- path: /
type: Prefix
tls:
- secretName: headlamp-apps-noble-tls
hosts:
- headlamp.apps.noble.lab.pcenicni.dev


@@ -0,0 +1,31 @@
# Kyverno (noble)
Admission policies using [Kyverno](https://kyverno.io/). The main chart installs controllers and CRDs; **`kyverno-policies`** installs **Pod Security Standard** rules matching the **`baseline`** profile in **`Audit`** mode (violations are visible in policy reports; workloads are not denied).
- **Charts:** `kyverno/kyverno` **3.7.1** (app **v1.17.1**), `kyverno/kyverno-policies` **3.7.1**
- **Namespace:** `kyverno`
## Install
```bash
helm repo add kyverno https://kyverno.github.io/kyverno/
helm repo update
kubectl apply -f clusters/noble/apps/kyverno/namespace.yaml
helm upgrade --install kyverno kyverno/kyverno -n kyverno \
--version 3.7.1 -f clusters/noble/apps/kyverno/values.yaml --wait --timeout 15m
helm upgrade --install kyverno-policies kyverno/kyverno-policies -n kyverno \
--version 3.7.1 -f clusters/noble/apps/kyverno/policies-values.yaml --wait --timeout 10m
```
Verify:
```bash
kubectl -n kyverno get pods
kubectl get clusterpolicy | head
```
## Notes
- **`validationFailureAction: Audit`** in `policies-values.yaml` avoids breaking namespaces that need **privileged** behavior (Longhorn, monitoring node-exporter, etc.). Switch specific policies or namespaces to **`Enforce`** when you are ready.
- To use **`restricted`** instead of **`baseline`**, change **`podSecurityStandard`** in `policies-values.yaml` and reconcile expectations for host mounts and capabilities.
- Upgrade: bump **`--version`** on both charts together; read [Kyverno release notes](https://github.com/kyverno/kyverno/releases) for breaking changes.
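The upgrade note above (bump both charts together) can be enforced with a wrapper that takes one version for both releases. This is only a sketch: `kyverno_upgrade` and the `HELM` override are hypothetical. The chart names, values files, and timeouts are copied from the install block.

```bash
# Hypothetical wrapper: upgrade kyverno and kyverno-policies to ONE chart version.
# HELM can be overridden (e.g. HELM=echo) to print the commands instead of running them.
kyverno_upgrade() {
  local version="${1:?usage: kyverno_upgrade CHART_VERSION}"
  local helm="${HELM:-helm}"
  local dir="clusters/noble/apps/kyverno"
  "$helm" upgrade --install kyverno kyverno/kyverno -n kyverno \
    --version "$version" -f "$dir/values.yaml" --wait --timeout 15m || return 1
  "$helm" upgrade --install kyverno-policies kyverno/kyverno-policies -n kyverno \
    --version "$version" -f "$dir/policies-values.yaml" --wait --timeout 10m
}
```

`HELM=echo kyverno_upgrade 3.7.1` prints the two commands without running them. Drop the override to perform the upgrade. The policies chart only runs if the main chart upgrade succeeded.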


@@ -0,0 +1,5 @@
# Kyverno — apply before Helm.
apiVersion: v1
kind: Namespace
metadata:
name: kyverno


@@ -0,0 +1,16 @@
# kyverno/kyverno-policies — Pod Security Standards as Kyverno ClusterPolicies
#
# helm upgrade --install kyverno-policies kyverno/kyverno-policies -n kyverno \
# --version 3.7.1 -f clusters/noble/apps/kyverno/policies-values.yaml --wait --timeout 10m
#
# Default profile is baseline; validationFailureAction is Audit so existing privileged
# workloads (monitoring, longhorn, etc.) are reported, not blocked. Tighten per policy or
# namespace when ready (see README).
#
policyKind: ClusterPolicy
policyType: ClusterPolicy
podSecurityStandard: baseline
podSecuritySeverity: medium
validationFailureAction: Audit
failurePolicy: Fail
validationAllowExistingViolations: true


@@ -0,0 +1,10 @@
# Kyverno — noble (policy engine)
#
# helm repo add kyverno https://kyverno.github.io/kyverno/
# helm repo update
# kubectl apply -f clusters/noble/apps/kyverno/namespace.yaml
# helm upgrade --install kyverno kyverno/kyverno -n kyverno \
# --version 3.7.1 -f clusters/noble/apps/kyverno/values.yaml --wait --timeout 15m
#
# Baseline Pod Security policies (separate chart): see policies-values.yaml + README.md
#


@@ -54,6 +54,8 @@ Vault **OSS** auto-unseal uses cloud KMS (AWS, GCP, Azure, OCI), **Transit** (an
## Kubernetes auth (External Secrets / ClusterSecretStore)
**One-shot:** from the repo root, `export KUBECONFIG=talos/kubeconfig` and `export VAULT_TOKEN=…`, then run **`./clusters/noble/apps/vault/configure-kubernetes-auth.sh`** (idempotent). Then run **`kubectl apply -f clusters/noble/apps/external-secrets/examples/vault-cluster-secret-store.yaml`** on its own line. Some shells do not treat `#` as a comment at the interactive prompt (for example **zsh** without `setopt interactivecomments`). In those shells, a trailing **`# …`** is passed to `kubectl` as extra arguments and breaks `apply`. **`kubectl get clustersecretstore vault`** should show **READY=True** after a few seconds.
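Instead of re-running `kubectl get` until **READY** flips, you can use `kubectl wait` to block on the store's `Ready` condition. This sketch assumes the `ClusterSecretStore` publishes a `Ready` condition in `status.conditions`, which is assumed to back the READY column. `wait_store_ready` and the `KUBECTL` override are hypothetical.

```bash
# Hypothetical helper: block until a ClusterSecretStore reports condition Ready=True.
# KUBECTL can be overridden (e.g. KUBECTL=echo) to print the command only.
wait_store_ready() {
  local store="${1:-vault}" timeout="${2:-60s}"
  "${KUBECTL:-kubectl}" wait --for=condition=Ready \
    "clustersecretstore/${store}" --timeout="${timeout}"
}
```

Run `wait_store_ready` right after the `apply` line. If it times out, `kubectl describe clustersecretstore vault` shows the condition message.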
Run these **from your workstation** (needs `kubectl`; no local `vault` binary required). Use a **short-lived admin token** or the root token **only in your shell** — do not paste tokens into logs or chat.
**1. Enable the auth method** (skip if already done):


@@ -0,0 +1,77 @@
#!/usr/bin/env bash
# Configure Vault Kubernetes auth + KV v2 + policy/role for External Secrets Operator.
# Requires: kubectl (cluster access) and jq (reads the OIDC issuer; the script fails without it); Vault reachable via sts/vault.
#
# Usage (from repo root):
# export KUBECONFIG=talos/kubeconfig # or your path
# export VAULT_TOKEN='…' # root or admin token — never commit
# ./clusters/noble/apps/vault/configure-kubernetes-auth.sh
#
# Then: kubectl apply -f clusters/noble/apps/external-secrets/examples/vault-cluster-secret-store.yaml
# Verify: kubectl describe clustersecretstore vault
set -euo pipefail
: "${VAULT_TOKEN:?Set VAULT_TOKEN to your Vault root or admin token}"
ISSUER=$(kubectl get --raw /.well-known/openid-configuration | jq -r .issuer)
REVIEWER=$(kubectl -n vault create token vault --duration=8760h)
CA_B64=$(kubectl config view --raw --minify -o jsonpath='{.clusters[0].cluster.certificate-authority-data}')
kubectl -n vault exec -i sts/vault -- env \
VAULT_ADDR=http://127.0.0.1:8200 \
VAULT_TOKEN="$VAULT_TOKEN" \
sh -ec '
set -e
vault auth list >/tmp/vauth.txt
grep -q "^kubernetes/" /tmp/vauth.txt || vault auth enable kubernetes
'
kubectl -n vault exec -i sts/vault -- env \
VAULT_ADDR=http://127.0.0.1:8200 \
VAULT_TOKEN="$VAULT_TOKEN" \
CA_B64="$CA_B64" \
REVIEWER="$REVIEWER" \
ISSUER="$ISSUER" \
sh -ec '
echo "$CA_B64" | base64 -d > /tmp/k8s-ca.crt
vault write auth/kubernetes/config \
kubernetes_host="https://kubernetes.default.svc:443" \
kubernetes_ca_cert=@/tmp/k8s-ca.crt \
token_reviewer_jwt="$REVIEWER" \
issuer="$ISSUER"
'
kubectl -n vault exec -i sts/vault -- env \
VAULT_ADDR=http://127.0.0.1:8200 \
VAULT_TOKEN="$VAULT_TOKEN" \
sh -ec '
set -e
vault secrets list >/tmp/vsec.txt
grep -q "^secret/" /tmp/vsec.txt || vault secrets enable -path=secret kv-v2
'
kubectl -n vault exec -i sts/vault -- env \
VAULT_ADDR=http://127.0.0.1:8200 \
VAULT_TOKEN="$VAULT_TOKEN" \
sh -ec '
vault policy write external-secrets - <<EOF
path "secret/data/*" {
capabilities = ["read", "list"]
}
path "secret/metadata/*" {
capabilities = ["read", "list"]
}
EOF
vault write auth/kubernetes/role/external-secrets \
bound_service_account_names=external-secrets \
bound_service_account_namespaces=external-secrets \
policies=external-secrets \
ttl=24h
'
echo "Done. Issuer used: $ISSUER"
echo ""
echo "Next (each command on its own line — do not paste # comments after kubectl):"
echo " kubectl apply -f clusters/noble/apps/external-secrets/examples/vault-cluster-secret-store.yaml"
echo " kubectl get clustersecretstore vault"

docs/architecture.md

@@ -0,0 +1,241 @@
# Noble platform architecture
This document describes the **noble** Talos lab cluster: node topology, networking, platform stack, observability, secrets/policy, and storage. Facts align with [`talos/CLUSTER-BUILD.md`](../talos/CLUSTER-BUILD.md), [`talos/talconfig.yaml`](../talos/talconfig.yaml), and manifests under [`clusters/noble/`](../clusters/noble/).
## Legend
| Shape / style | Meaning |
|---------------|---------|
| **Subgraph “Cluster”** | Kubernetes cluster boundary (`noble`) |
| **External / DNS / cloud** | Services outside the data plane (internet, registrar, Pangolin) |
| **Data store** | Durable data (etcd, Longhorn, Loki, Vault storage) |
| **Secrets / policy** | Secret material, Vault, admission policy |
| **LB / VIP** | Load balancer, MetalLB assignment, or API VIP |
---
## Physical / node topology
Four Talos nodes on **LAN `192.168.50.0/24`**: three control planes (**neon**, **argon**, **krypton**) and one worker (**helium**). `allowSchedulingOnControlPlanes: true` in `talconfig.yaml`. The Kubernetes API is fronted by **kube-vip** on **`192.168.50.230`** (not a separate hardware load balancer).
```mermaid
flowchart TB
subgraph LAN["LAN 192.168.50.0/24"]
subgraph CP["Control planes (kube-vip VIP 192.168.50.230:6443)"]
neon["neon<br/>192.168.50.20<br/>control-plane + schedulable"]
argon["argon<br/>192.168.50.30<br/>control-plane + schedulable"]
krypton["krypton<br/>192.168.50.40<br/>control-plane + schedulable"]
end
subgraph W["Worker"]
helium["helium<br/>192.168.50.10<br/>worker only"]
end
VIP["API VIP 192.168.50.230<br/>kube-vip on ens18<br/>→ apiserver :6443"]
end
neon --- VIP
argon --- VIP
krypton --- VIP
kubectl["kubectl / talosctl clients<br/>(workstation on LAN/VPN)"] -->|"HTTPS :6443"| VIP
```
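A workstation kubeconfig that points at a single node IP keeps working until that node goes down. It is therefore worth confirming that `server:` targets the VIP. This sketch assumes the current context is the noble cluster. `check_api_vip` and the `KUBECTL` override are hypothetical; the jsonpath query and the `/healthz` endpoint are standard `kubectl` usage.

```bash
# Hypothetical helper: confirm the current kubeconfig targets the kube-vip VIP
# and that the API server answers /healthz through it. KUBECTL is overridable.
check_api_vip() {
  local want="${1:-https://192.168.50.230:6443}"
  local kubectl="${KUBECTL:-kubectl}"
  local server
  server="$("$kubectl" config view --minify -o jsonpath='{.clusters[0].cluster.server}')" || return 1
  if [ "$server" != "$want" ]; then
    echo "kubeconfig server is ${server}, expected ${want}" >&2
    return 1
  fi
  # A healthy API server returns the literal body "ok".
  [ "$("$kubectl" get --raw /healthz)" = "ok" ]
}
```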
---
## Network and ingress
**North-south (apps on LAN):** DNS for **`*.apps.noble.lab.pcenicni.dev`** → **Traefik** **`LoadBalancer` `192.168.50.211`**. **MetalLB** L2 pool **`192.168.50.210`–`192.168.50.229`**; **Argo CD** uses **`192.168.50.210`**. **Public** access is not in-cluster ExternalDNS: **Newt** (Pangolin tunnel) plus **CNAME** and **Integration API** per [`clusters/noble/apps/newt/README.md`](../clusters/noble/apps/newt/README.md).
```mermaid
flowchart TB
user["User"]
subgraph DNS["DNS"]
pub["Public: CNAME → Pangolin<br/>(per Newt README; not ExternalDNS)"]
split["LAN / split horizon:<br/>*.apps.noble.lab.pcenicni.dev<br/>→ 192.168.50.211"]
end
subgraph LAN["LAN"]
ML["MetalLB L2<br/>pool 192.168.50.210–229<br/>IPAddressPool noble-l2"]
T["Traefik Service LoadBalancer<br/>192.168.50.211<br/>IngressClass: traefik"]
Argo["Argo CD server LoadBalancer<br/>192.168.50.210"]
Newt["Newt (Pangolin tunnel)<br/>outbound to Pangolin"]
end
subgraph Cluster["Cluster workloads"]
Ing["Ingress resources<br/>cert-manager HTTP-01"]
App["Apps / Grafana Ingress<br/>e.g. grafana.apps.noble.lab.pcenicni.dev"]
end
user --> pub
user --> split
split --> T
pub -.->|"tunnel path"| Newt
T --> Ing --> App
ML --- T
ML --- Argo
user -->|"optional direct to LB IP"| Argo
```
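Every pinned `LoadBalancer` address (Argo CD `.210`, Traefik `.211`) has to sit inside the MetalLB pool. The API VIP `.230` must stay outside it. This is a minimal range check that assumes the pool remains one contiguous range within a single /24. `in_metallb_pool` is a hypothetical helper, not part of MetalLB.

```bash
# Hypothetical helper: succeed if an IPv4 address lies in a contiguous pool
# within one /24 (defaults: the noble-l2 range 192.168.50.210-229).
in_metallb_pool() {
  local ip="$1" first="${2:-192.168.50.210}" last="${3:-192.168.50.229}" host
  # The address must share the pool's /24 prefix (first three octets).
  [ "${ip%.*}" = "${first%.*}" ] || return 1
  host="${ip##*.}"
  # Reject an empty or non-numeric last octet before comparing numerically.
  case "$host" in
    '' | *[!0-9]*) return 1 ;;
  esac
  [ "$host" -ge "${first##*.}" ] && [ "$host" -le "${last##*.}" ]
}
```

`in_metallb_pool 192.168.50.211` succeeds. `in_metallb_pool 192.168.50.230` fails, which matches the intent that MetalLB never hands out kube-vip's address.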
---
## Platform stack (bootstrap → workloads)
Order: **Talos** → **Cilium** (cluster uses `cni: none` until CNI is installed) → **metrics-server**, **Longhorn**, **MetalLB** + pool manifests, **kube-vip** → **Traefik**, **cert-manager** → **Argo CD** (Helm + app-of-apps under `clusters/noble/bootstrap/argocd/`). Platform namespaces include `cert-manager`, `traefik`, `metallb-system`, `longhorn-system`, `monitoring`, `loki`, `logging`, `argocd`, `vault`, `external-secrets`, `sealed-secrets`, `kyverno`, `newt`, and others as deployed.
```mermaid
flowchart TB
subgraph L0["OS / bootstrap"]
Talos["Talos v1.12.6<br/>Image Factory schematic"]
end
subgraph L1["CNI"]
Cilium["Cilium<br/>(cni: none until installed)"]
end
subgraph L2["Core add-ons"]
MS["metrics-server"]
LH["Longhorn + default StorageClass"]
MB["MetalLB + pool manifests"]
KV["kube-vip (API VIP)"]
end
subgraph L3["Ingress and TLS"]
Traefik["Traefik"]
CM["cert-manager + ClusterIssuers"]
end
subgraph L4["GitOps"]
Argo["Argo CD<br/>app-of-apps under bootstrap/argocd/"]
end
subgraph L5["Platform namespaces (examples)"]
NS["cert-manager, traefik, metallb-system,<br/>longhorn-system, monitoring, loki, logging,<br/>argocd, vault, external-secrets, sealed-secrets,<br/>kyverno, newt, …"]
end
Talos --> Cilium --> MS
Cilium --> LH
Cilium --> MB
Cilium --> KV
MB --> Traefik
Traefik --> CM
CM --> Argo
Argo --> NS
```
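After bootstrap, you can run a quick existence check for the platform namespaces named above. The list is taken from the paragraph, which also says "and others as deployed", so treat it as a starting point. `missing_namespaces` and the `KUBECTL` override are hypothetical.

```bash
# Hypothetical helper: print each expected platform namespace that does not exist.
# Returns 1 if any are missing. KUBECTL can be overridden for testing.
missing_namespaces() {
  local kubectl="${KUBECTL:-kubectl}" ns status=0
  for ns in cert-manager traefik metallb-system longhorn-system monitoring loki \
    logging argocd vault external-secrets sealed-secrets kyverno newt; do
    if ! "$kubectl" get namespace "$ns" >/dev/null 2>&1; then
      echo "$ns"
      status=1
    fi
  done
  return "$status"
}
```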
---
## Observability path
**kube-prometheus-stack** in **`monitoring`**: Prometheus, Grafana, Alertmanager, node-exporter, etc. **Loki** (SingleBinary) in **`loki`** with **Fluent Bit** in **`logging`** shipping to **`loki-gateway`**. Grafana Loki datasource is applied via **ConfigMap** [`clusters/noble/apps/grafana-loki-datasource/loki-datasource.yaml`](../clusters/noble/apps/grafana-loki-datasource/loki-datasource.yaml). Prometheus, Grafana, Alertmanager, and Loki use **Longhorn** PVCs where configured.
```mermaid
flowchart LR
subgraph Nodes["All nodes"]
NE["node-exporter DaemonSet"]
FB["Fluent Bit DaemonSet<br/>namespace: logging"]
end
subgraph mon["monitoring"]
PROM["Prometheus"]
AM["Alertmanager"]
GF["Grafana"]
SC["ServiceMonitors / kube-state-metrics / operator"]
end
subgraph lok["loki"]
LG["loki-gateway Service"]
LO["Loki SingleBinary"]
end
NE --> PROM
PROM --> GF
AM --> GF
FB -->|"to loki-gateway:80"| LG --> LO
GF -->|"Explore / datasource ConfigMap<br/>grafana-loki-datasource"| LO
subgraph PVC["Longhorn PVCs"]
P1["Prometheus / Grafana /<br/>Alertmanager PVCs"]
P2["Loki PVC"]
end
PROM --- P1
LO --- P2
```
---
## Secrets and policy
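**Sealed Secrets** decrypts `SealedSecret` objects in-cluster. **External Secrets Operator** syncs from **Vault** using **`ClusterSecretStore`** (see [`examples/vault-cluster-secret-store.yaml`](../clusters/noble/apps/external-secrets/examples/vault-cluster-secret-store.yaml)). Trust flows **cluster → Vault**: ESO logs in with its ServiceAccount JWT, and Vault validates it against the Kubernetes **TokenReview** API using the reviewer token set by [`configure-kubernetes-auth.sh`](../clusters/noble/apps/vault/configure-kubernetes-auth.sh). **Kyverno** with **kyverno-policies** applies **PSS baseline** rules in **Audit** mode, so violations are reported in policy reports rather than denied.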
```mermaid
flowchart LR
subgraph Git["Git repo"]
SSman["SealedSecret manifests<br/>(optional)"]
end
subgraph cluster["Cluster"]
SSC["Sealed Secrets controller<br/>sealed-secrets"]
ESO["External Secrets Operator<br/>external-secrets"]
V["Vault<br/>vault namespace<br/>HTTP listener"]
K["Kyverno + kyverno-policies<br/>PSS baseline Audit"]
end
SSman -->|"encrypted"| SSC -->|"decrypt to Secret"| workloads["Workload Secrets"]
ESO -->|"ClusterSecretStore →"| V
ESO -->|"sync ExternalSecret"| workloads
K -.->|"admission / audit<br/>(PSS baseline)"| workloads
```
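The following sketch shows what an ESO consumer of that store might look like. It prints an `ExternalSecret` that pulls one property from Vault KV v2 through the `vault` `ClusterSecretStore`. It makes three assumptions. First, the store's `path` is the `secret` KV v2 mount that `configure-kubernetes-auth.sh` enables, so keys are relative to it. Second, the installed ESO serves the `external-secrets.io/v1` API (older releases used `v1beta1`). Third, the function and all example names are hypothetical.

```bash
# Hypothetical helper: print an ExternalSecret that reads one property from Vault KV v2
# through the "vault" ClusterSecretStore. Assumes the store's path is the "secret" mount,
# so a key like demo/app maps to secret/data/demo/app (covered by the external-secrets policy).
external_secret_manifest() {
  local name="${1:?name}" ns="${2:?namespace}" vault_path="${3:?vault path}" property="${4:?property}"
  cat <<EOF
apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
  name: ${name}
  namespace: ${ns}
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: vault
  target:
    name: ${name}
  data:
    - secretKey: ${property}
      remoteRef:
        key: ${vault_path}
        property: ${property}
EOF
}
```

For example, first run `vault kv put secret/demo/app password=example` inside the Vault pod. Then `external_secret_manifest demo-credentials demo demo/app password | kubectl apply -f -` should produce a Secret `demo-credentials` in namespace `demo`, provided that namespace exists.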
---
## Data and storage
**StorageClass:** **`longhorn`** (default). Talos mounts **user volume** data at **`/var/mnt/longhorn`** (bind paths for Longhorn). Stateful consumers include **Vault**, **kube-prometheus-stack** PVCs, and **Loki**.
```mermaid
flowchart TB
subgraph disks["Per-node Longhorn data path"]
UD["Talos user volume →<br/>/var/mnt/longhorn (bind to Longhorn paths)"]
end
subgraph LH["Longhorn"]
SC["StorageClass: longhorn (default)"]
end
subgraph consumers["Stateful / durable consumers"]
V["Vault PVC data-vault-0"]
PGL["kube-prometheus-stack PVCs"]
L["Loki PVC"]
end
UD --> SC
SC --> V
SC --> PGL
SC --> L
```
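PVCs that omit `storageClassName` land on whichever class carries the default annotation. Confirming that `longhorn` is the default therefore protects the consumers listed above. In this sketch, `is_default_storageclass` and the `KUBECTL` override are hypothetical. The `storageclass.kubernetes.io/is-default-class` annotation is the standard Kubernetes marker.

```bash
# Hypothetical helper: succeed if the given StorageClass is annotated as the cluster default.
# KUBECTL can be overridden for testing.
is_default_storageclass() {
  local sc="${1:-longhorn}" flag
  flag="$("${KUBECTL:-kubectl}" get storageclass "$sc" \
    -o jsonpath='{.metadata.annotations.storageclass\.kubernetes\.io/is-default-class}')" || return 1
  [ "$flag" = "true" ]
}
```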
---
## Component versions
See [`talos/CLUSTER-BUILD.md`](../talos/CLUSTER-BUILD.md) for the authoritative checklist. Summary:
| Component | Chart / app (from CLUSTER-BUILD.md) |
|-----------|-------------------------------------|
| Talos / Kubernetes | v1.12.6 / 1.35.2 bundled |
| Cilium | Helm 1.16.6 |
| MetalLB | 0.15.3 |
| Longhorn | 1.11.1 |
| Traefik | 39.0.6 / app v3.6.11 |
| cert-manager | v1.20.0 |
| Argo CD | 9.4.17 / app v3.3.6 |
| kube-prometheus-stack | 82.15.1 |
| Loki / Fluent Bit | 6.55.0 / 0.56.0 |
| Sealed Secrets / ESO / Vault | 2.18.4 / 2.2.0 / 0.32.0 |
| Kyverno | 3.7.1 / policies 3.7.1 |
| Newt | 1.2.0 / app 1.10.1 |
---
## Narrative
The **noble** environment is a **Talos** lab cluster on **`192.168.50.0/24`** with **three control plane nodes and one worker**. Workloads may schedule on the control planes, and the Kubernetes API is exposed through **kube-vip** at **`192.168.50.230`**. **Cilium** provides the CNI after Talos bootstraps with **`cni: none`**. **MetalLB** advertises **`192.168.50.210`–`192.168.50.229`**, pinning **Argo CD** to **`192.168.50.210`** and **Traefik** to **`192.168.50.211`** for **`*.apps.noble.lab.pcenicni.dev`**. **cert-manager** issues certificates for Traefik Ingresses. **GitOps** is **Helm plus Argo CD**, with manifests under **`clusters/noble/`** and bootstrap under **`clusters/noble/bootstrap/argocd/`**. **Observability** uses **kube-prometheus-stack** in **`monitoring`** plus **Loki** and **Fluent Bit**, with Grafana wired to Loki via a **ConfigMap** datasource. **Longhorn** PVCs back Prometheus, Grafana, Alertmanager, Loki, and **Vault**. **Secrets** combine **Sealed Secrets** for git-encrypted material with **Vault** and **External Secrets** for dynamic sync. **Kyverno** applies **Pod Security Standards baseline** rules in **Audit** mode. **Public** access uses **Newt** to **Pangolin** with **CNAME** and Integration API steps as documented, not generic in-cluster public DNS.
---
## Assumptions and open questions
**Assumptions**
- **Hypervisor vs bare metal:** Not fixed in inventory tables; `talconfig.yaml` comments mention Proxmox virtio disk paths as examples—treat actual host platform as **TBD** unless confirmed.
- **Workstation path:** Operators reach the VIP and node IPs from the **LAN or VPN** per [`talos/README.md`](../talos/README.md).
- **Optional components** (Headlamp, Renovate, Velero, Phase G hardening) are described in CLUSTER-BUILD.md; they are not required for the diagrams above until deployed.
**Open questions**
- **Split horizon:** Confirm whether only LAN DNS resolves `*.apps.noble.lab.pcenicni.dev` to **`192.168.50.211`** or whether public resolvers also point at that address.
- **Velero / S3:** **TBD** until an S3-compatible backend is configured.
- **Argo CD:** Confirm **`repoURL`** in `root-application.yaml` and what is actually applied on-cluster.
---
*Keep in sync with [`talos/CLUSTER-BUILD.md`](../talos/CLUSTER-BUILD.md) and manifests under [`clusters/noble/`](../clusters/noble/).*


@@ -4,7 +4,7 @@ This document is the **exported TODO** for the **noble** Talos cluster (4 nodes)
## Current state (2026-03-28)
-Lab stack is **up** on-cluster for bootstrap through **Phase D** (observability) and **Phase E** (Sealed Secrets, External Secrets, **Vault** Helm install), with manifests matching this repo. **Next focus:** **Vault** `operator init` / unseal, optional **`unseal-cronjob.yaml`**, Kubernetes auth + **`ClusterSecretStore`**, optional Pangolin/sample Ingress validation, Velero when S3 exists.
+Lab stack is **up** on-cluster through **Phase D** (observability), **Phase E** (Sealed Secrets, External Secrets, **Vault** + **`ClusterSecretStore`**), and **Phase F** (**Kyverno** **baseline** PSS **Audit**), with manifests matching this repo. **Next focus:** optional **Headlamp** (Ingress + TLS), **Renovate** (dependency PRs for Helm/manifests), Pangolin/sample Ingress validation, **Phase G**, **Velero** when S3 exists.
- **Talos** v1.12.6 (target) / **Kubernetes** as bundled — four nodes **Ready** unless upgrading; **`talosctl health`**; **`talos/kubeconfig`** is **local only** (gitignored — never commit; regenerate with `talosctl kubeconfig` per `talos/README.md`). **Image Factory (nocloud installer):** `factory.talos.dev/nocloud-installer/249d9135de54962744e917cfe654117000cba369f9152fbab9d055a00aa3664f:v1.12.6`
- **Cilium** Helm **1.16.6** / app **1.16.6** (`clusters/noble/apps/cilium/`, phase 1 values).
@@ -21,7 +21,7 @@ Lab stack is **up** on-cluster for bootstrap through **Phase D** (observability)
- **Sealed Secrets** Helm **2.18.4** / app **0.36.1** in `clusters/noble/apps/sealed-secrets/` (namespace **`sealed-secrets`**); **`kubeseal`** on client should match controller minor (**README**); back up **`sealed-secrets-key`** (see README).
- **External Secrets Operator** Helm **2.2.0** / app **v2.2.0** in `clusters/noble/apps/external-secrets/`; Vault **`ClusterSecretStore`** in **`examples/vault-cluster-secret-store.yaml`** (**`http://`** to match Vault listener — apply after Vault **Kubernetes auth**).
- **Vault** Helm **0.32.0** / app **1.21.2** in `clusters/noble/apps/vault/` — standalone **file** storage, **Longhorn** PVC; **HTTP** listener (`global.tlsDisable`); optional **CronJob** lab unseal **`unseal-cronjob.yaml`**; **not** initialized in git — run **`vault operator init`** per **`README.md`**.
-- **Still open:** Vault **Kubernetes auth** + **`ClusterSecretStore`** apply + KV for ESO; **Phase F–G**; optional **sample Ingress + cert + Pangolin** end-to-end; **Velero** when S3 is ready; **Argo CD SSO**.
+- **Still open:** **Headlamp** (Helm + Traefik Ingress + **`letsencrypt-prod`**); **Renovate** ([Renovate](https://docs.renovatebot.com/) — dependency bot; hosted app **or** self-hosted on-cluster); **Phase G**; optional **sample Ingress + cert + Pangolin** end-to-end; **Velero** when S3 is ready; **Argo CD SSO**.
## Inventory
@@ -42,6 +42,7 @@ Lab stack is **up** on-cluster for bootstrap through **Phase D** (observability)
| Traefik (apps ingress) | `192.168.50.211` via **`metallb.io/loadBalancerIPs`** in `clusters/noble/apps/traefik/values.yaml` |
| Apps ingress (LAN / split horizon) | `*.apps.noble.lab.pcenicni.dev` → Traefik LB |
| Grafana (Ingress + TLS) | **`grafana.apps.noble.lab.pcenicni.dev`** — `grafana.ingress` in `clusters/noble/apps/kube-prometheus-stack/values.yaml` (**`letsencrypt-prod`**) |
+| Headlamp (Ingress + TLS) | **`headlamp.apps.noble.lab.pcenicni.dev`** — chart `ingress` in `clusters/noble/apps/headlamp/` (**`letsencrypt-prod`**, **`ingressClassName: traefik`**) |
| Public DNS (Pangolin) | **Newt** tunnel + **CNAME** at registrar + **Integration API** (`clusters/noble/apps/newt/`) |
| Velero | S3-compatible URL — configure later |
@@ -64,6 +65,9 @@ Lab stack is **up** on-cluster for bootstrap through **Phase D** (observability)
- Sealed Secrets: **2.18.4** (Helm chart `sealed-secrets/sealed-secrets`; app **0.36.1**)
- External Secrets Operator: **2.2.0** (Helm chart `external-secrets/external-secrets`; app **v2.2.0**)
- Vault: **0.32.0** (Helm chart `hashicorp/vault`; app **1.21.2**)
+- Kyverno: **3.7.1** (Helm chart `kyverno/kyverno`; app **v1.17.1**); **kyverno-policies** **3.7.1**: **baseline** PSS, **Audit** (`clusters/noble/apps/kyverno/`)
+- Headlamp: **0.40.1** (Helm chart `headlamp/headlamp`; app matches chart — see [Artifact Hub](https://artifacthub.io/packages/helm/headlamp/headlamp))
+- Renovate: **hosted** (Mend **Renovate** GitHub/GitLab app — no cluster chart) **or** **self-hosted** — pin chart when added ([Helm charts](https://docs.renovatebot.com/helm-charts/), OCI `ghcr.io/renovatebot/charts/renovate`); pair **`renovate.json`** with this repo's Helm paths under **`clusters/noble/`**
## Repo paths (this workspace)
@@ -89,7 +93,10 @@ Lab stack is **up** on-cluster for bootstrap through **Phase D** (observability)
| Fluent Bit → Loki (Helm values) | `clusters/noble/apps/fluent-bit/`: `values.yaml`, `namespace.yaml` |
| Sealed Secrets (Helm) | `clusters/noble/apps/sealed-secrets/`: `values.yaml`, `namespace.yaml`, `README.md` |
| External Secrets Operator (Helm + Vault store example) | `clusters/noble/apps/external-secrets/`: `values.yaml`, `namespace.yaml`, `README.md`, `examples/vault-cluster-secret-store.yaml` |
-| Vault (Helm + optional unseal CronJob) | `clusters/noble/apps/vault/`: `values.yaml`, `namespace.yaml`, `unseal-cronjob.yaml`, `README.md` |
+| Vault (Helm + optional unseal CronJob) | `clusters/noble/apps/vault/`: `values.yaml`, `namespace.yaml`, `unseal-cronjob.yaml`, `configure-kubernetes-auth.sh`, `README.md` |
+| Kyverno + PSS baseline policies | `clusters/noble/apps/kyverno/`: `values.yaml`, `policies-values.yaml`, `namespace.yaml`, `README.md` |
+| Headlamp (Helm + Ingress) | `clusters/noble/apps/headlamp/`: `values.yaml`, `namespace.yaml` (planned — `helm repo add headlamp https://kubernetes-sigs.github.io/headlamp/`) |
+| Renovate (repo config + optional self-hosted Helm) | `renovate.json` or `renovate.json5` at repo root (see [Renovate docs](https://docs.renovatebot.com/)); optional `clusters/noble/apps/renovate/` for self-hosted chart + token Secret (**Sealed Secrets** / **ESO** after **Phase E**) |
**Git vs cluster:** manifests and `talconfig` live in git; **`talhelper genconfig -o out`**, bootstrap, Helm, and `kubectl` run on your LAN. See **`talos/README.md`** for workstation reachability (lab LAN/VPN), **`talosctl kubeconfig`** vs Kubernetes `server:` (VIP vs node IP), and **`--insecure`** only in maintenance.
@@ -101,6 +108,8 @@ Lab stack is **up** on-cluster for bootstrap through **Phase D** (observability)
4. **Longhorn:** Talos user volume + extensions in `talconfig.with-longhorn.yaml` (when restored); Helm **`defaultDataPath`** in `clusters/noble/apps/longhorn/values.yaml`.
5. **Loki → Fluent Bit → Grafana datasource:** deploy **Loki** (`loki-gateway` Service) before **Fluent Bit**; apply **`clusters/noble/apps/grafana-loki-datasource/loki-datasource.yaml`** after **Loki** (sidecar picks up the ConfigMap — no kube-prometheus values change for Loki).
6. **Vault:** **Longhorn** default **StorageClass** before **`clusters/noble/apps/vault/`** Helm (PVC **`data-vault-0`**); **External Secrets** **`ClusterSecretStore`** after Vault is initialized, unsealed, and **Kubernetes auth** is configured.
+7. **Headlamp:** **Traefik** + **cert-manager** (**`letsencrypt-prod`**) before exposing **`headlamp.apps.noble.lab.pcenicni.dev`**; treat as **cluster-admin** UI — protect with network policy / SSO when hardening (**Phase G**).
+8. **Renovate:** **Git remote** + platform access (**hosted app** needs org/repo install; **self-hosted** needs **`RENOVATE_TOKEN`** and chart **`renovate.config`**). If the bot runs **in-cluster**, add the token **after** **Sealed Secrets** / **Vault** (**Phase E**) — no ingress required for the bot itself.
## Prerequisites (before phases)
@@ -141,22 +150,24 @@ Lab stack is **up** on-cluster for bootstrap through **Phase D** (observability)
- [x] **Argo CD** bootstrap — `clusters/noble/bootstrap/argocd/` (`helm upgrade --install argocd …`)
- [x] Argo CD server **LoadBalancer****`192.168.50.210`** (see `values.yaml`)
- [X] **App-of-apps** — set **`repoURL`** in **`root-application.yaml`**, add **`Application`** manifests under **`bootstrap/argocd/apps/`**, apply **`root-application.yaml`**
+- [ ] **Renovate** — [Renovate](https://docs.renovatebot.com/) opens PRs for Helm charts, Docker tags, and related bumps. **Option A:** install the **Mend Renovate** app on **GitHub** / **GitLab** for this repo (no cluster). **Option B:** self-hosted — **`helm repo add renovate https://docs.renovatebot.com/helm-charts`** or OCI per [Helm charts](https://docs.renovatebot.com/helm-charts/); **`renovate.config`** with token from **Sealed Secrets** / **ESO** (**`clusters/noble/apps/renovate/`** when added). Add **`renovate.json`** (or **`renovate.json5`**) at repo root with **`packageRules`**, **`kubernetes`** / **`helm-values`** file patterns covering **`clusters/noble/`** (Helm **`values.yaml`**, manifests). Verify a dry run or first dependency PR.
- [ ] SSO — later
## Phase D — Observability
- [x] **kube-prometheus-stack**: `kubectl apply -f clusters/noble/apps/kube-prometheus-stack/namespace.yaml` then **`helm upgrade --install`** as in `clusters/noble/apps/kube-prometheus-stack/values.yaml` (chart **82.15.1**); PVCs **`longhorn`**; **`--wait --timeout 30m`** recommended; verify **`kubectl -n monitoring get pods,pvc`**
- [x] **Loki** + **Fluent Bit** + **Grafana Loki datasource**; **order:** **`kubectl apply -f clusters/noble/apps/loki/namespace.yaml`** → **`helm upgrade --install loki`** `grafana/loki` **6.55.0** `-f clusters/noble/apps/loki/values.yaml` → **`kubectl apply -f clusters/noble/apps/fluent-bit/namespace.yaml`** → **`helm upgrade --install fluent-bit`** `fluent/fluent-bit` **0.56.0** `-f clusters/noble/apps/fluent-bit/values.yaml` → **`kubectl apply -f clusters/noble/apps/grafana-loki-datasource/loki-datasource.yaml`**. Verify **Explore → Loki** in Grafana; **`kubectl -n loki get pods,pvc`**, **`kubectl -n logging get pods`**
+- [ ] **Headlamp** — Kubernetes web UI ([Headlamp](https://headlamp.dev/)); **`helm repo add headlamp https://kubernetes-sigs.github.io/headlamp/`**; **`kubectl apply -f clusters/noble/apps/headlamp/namespace.yaml`** → **`helm upgrade --install headlamp headlamp/headlamp --version 0.40.1 -n headlamp -f clusters/noble/apps/headlamp/values.yaml`**; **Ingress** **`https://headlamp.apps.noble.lab.pcenicni.dev`** (**`ingressClassName: traefik`**, **`cert-manager.io/cluster-issuer: letsencrypt-prod`**). **RBAC:** chart defaults are permissive — tighten before LAN-wide exposure; align with **Phase G** hardening.
## Phase E — Secrets
- [x] **Sealed Secrets** (optional Git workflow) — `clusters/noble/apps/sealed-secrets/` (Helm **2.18.4**); **`kubeseal`** + key backup per **`README.md`**
-- [x] **Vault** in-cluster on Longhorn + **auto-unseal**: `clusters/noble/apps/vault/` (Helm **0.32.0**); **Longhorn** PVC; **OSS** “auto-unseal” = optional **`unseal-cronjob.yaml`** + Secret (**README**); init/unseal/Kubernetes auth for ESO still **to do** on cluster
+- [x] **Vault** in-cluster on Longhorn + **auto-unseal**: `clusters/noble/apps/vault/` (Helm **0.32.0**); **Longhorn** PVC; **OSS** “auto-unseal” = optional **`unseal-cronjob.yaml`** + Secret (**README**); **`configure-kubernetes-auth.sh`** for ESO (**Kubernetes auth** + KV + role)
- [x] **External Secrets Operator** + Vault `ClusterSecretStore` — operator **`clusters/noble/apps/external-secrets/`** (Helm **2.2.0**); apply **`examples/vault-cluster-secret-store.yaml`** after Vault (**`README.md`**)
## Phase F — Policy + backups
-- [ ] **Kyverno** baseline policies
+- [x] **Kyverno** baseline policies: `clusters/noble/apps/kyverno/` (Helm **kyverno** **3.7.1** + **kyverno-policies** **3.7.1**, **baseline** / **Audit** — see **`README.md`**)
- [ ] **Velero** when S3 is ready; backup/restore drill
## Phase G — Hardening
@@ -170,15 +181,18 @@ Lab stack is **up** on-cluster for bootstrap through **Phase D** (observability)
- [x] API via VIP `:6443`: **`kubectl get --raw /healthz`** → **`ok`** with kubeconfig **`server:`** `https://192.168.50.230:6443`
- [x] Ingress **`LoadBalancer`** in pool `210`–`229` (**Traefik** → **`192.168.50.211`**)
- [x] **Argo CD** UI — **`argocd-server`** **`LoadBalancer`** **`192.168.50.210`** (initial **`admin`** password from **`argocd-initial-admin-secret`**)
+- [ ] **Renovate** — hosted app enabled for this repo **or** self-hosted workload **Running** + PRs updating **`clusters/noble/`** manifests as configured
- [ ] Sample Ingress + cert (cert-manager ready) + Pangolin resource + CNAME
- [x] PVC **`Bound`** on **Longhorn** (`storageClassName: longhorn`); Prometheus/Loki durable when configured
- [x] **`monitoring`** — **kube-prometheus-stack** core workloads **Running** (Prometheus, Grafana, Alertmanager, operator, kube-state-metrics, node-exporter); PVCs **Bound** on **longhorn**
- [x] **`loki`** — **Loki** SingleBinary + **gateway** **Running**; **`loki`** PVC **Bound** on **longhorn** (no chunks-cache by design)
- [x] **`logging`** — **Fluent Bit** DaemonSet **Running** on all nodes (logs → **Loki**)
- [x] **Grafana** → **Loki** datasource from **`grafana-loki-datasource`** ConfigMap (**Explore** works after apply + sidecar sync)
+- [ ] **Headlamp** — Deployment **Running** in **`headlamp`**; UI at **`https://headlamp.apps.noble.lab.pcenicni.dev`** (TLS via **`letsencrypt-prod`**)
- [x] **`sealed-secrets`** — controller **Deployment** **Running** in **`sealed-secrets`** (install + **`kubeseal`** per **`apps/sealed-secrets/README.md`**)
- [x] **`external-secrets`** — controller + webhook + cert-controller **Running** in **`external-secrets`**; apply **`ClusterSecretStore`** after Vault **Kubernetes auth**
- [x] **`vault`** — **StatefulSet** **Running**, **`data-vault-0`** PVC **Bound** on **longhorn**; **`vault operator init`** + unseal per **`apps/vault/README.md`**
+- [x] **`kyverno`** — admission / background / cleanup / reports controllers **Running** in **`kyverno`**; **ClusterPolicies** for **PSS baseline** **Ready** (**Audit**)
---
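The Headlamp and Kyverno checklist items above can be spot-checked in one pass with `kubectl rollout status`. This sketch relies on assumed Deployment names: Headlamp's Deployment is assumed to be named after its Helm release, and Kyverno 3.x is assumed to run separate admission, background, cleanup, and reports controllers. This repo does not confirm these names, so compare them with `kubectl get deploy -A`. `check_rollouts` and the `KUBECTL` override are hypothetical.

```bash
# Hypothetical helper: wait for each namespace/deployment pair to finish rolling out.
# Deployment names are assumptions; verify with: kubectl get deploy -A
check_rollouts() {
  local kubectl="${KUBECTL:-kubectl}" pair ns deploy failed=0
  for pair in \
    headlamp/headlamp \
    kyverno/kyverno-admission-controller \
    kyverno/kyverno-background-controller \
    kyverno/kyverno-cleanup-controller \
    kyverno/kyverno-reports-controller; do
    ns="${pair%%/*}"
    deploy="${pair#*/}"
    if ! "$kubectl" -n "$ns" rollout status "deployment/${deploy}" --timeout=120s >/dev/null; then
      echo "NOT READY: ${pair}" >&2
      failed=1
    fi
  done
  return "$failed"
}
```

A non-zero exit status means at least one pair did not become ready. Each such pair is reported as `NOT READY: namespace/deployment` on stderr.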