
Noble lab — Talos cluster build checklist

This document is the exported TODO for the noble Talos cluster (4 nodes). Commands and troubleshooting live in README.md.

Current state (2026-03-28)

Lab stack is up on-cluster through Phase F and partially through Phase G (Vault CiliumNetworkPolicy, talos/runbooks/). Next focus: optional Alertmanager receivers (Slack/PagerDuty); tighten RBAC (Headlamp / cluster-admin); Cilium policies for other namespaces as needed; enable Mend Renovate for PRs; Pangolin/sample Ingress; Velero backup/restore drill after S3 credentials are set (noble_velero_install).

  • Talos v1.12.6 (target) / Kubernetes as bundled — four nodes Ready unless upgrading; talosctl health; talos/kubeconfig is local only (gitignored — never commit; regenerate with talosctl kubeconfig per talos/README.md). Image Factory (nocloud installer): factory.talos.dev/nocloud-installer/249d9135de54962744e917cfe654117000cba369f9152fbab9d055a00aa3664f:v1.12.6
  • Cilium Helm 1.16.6 / app 1.16.6 (clusters/noble/bootstrap/cilium/, phase 1 values).
  • MetalLB Helm 0.15.3 / app v0.15.3; IPAddressPool noble-l2 + L2Advertisement — pool 192.168.50.210–192.168.50.229.
  • kube-vip DaemonSet 3/3 on control planes; VIP 192.168.50.230 on ens18 (vip_subnet /32 required — bare 32 breaks parsing). Verified from workstation: kubectl config set-cluster noble --server=https://192.168.50.230:6443 then kubectl get --raw /healthz → ok (talos/kubeconfig; see talos/README.md).
  • metrics-server Helm 3.13.0 / app v0.8.0 — clusters/noble/bootstrap/metrics-server/values.yaml (--kubelet-insecure-tls for Talos); kubectl top nodes works.
  • Longhorn Helm 1.11.1 / app v1.11.1 — clusters/noble/bootstrap/longhorn/ (PSA privileged namespace, defaultDataPath /var/mnt/longhorn, preUpgradeChecker enabled); StorageClass longhorn (default); nodes.longhorn.io all Ready; test PVC Bound on longhorn.
  • Traefik Helm 39.0.6 / app v3.6.11 — clusters/noble/bootstrap/traefik/; Service LoadBalancer EXTERNAL-IP 192.168.50.211; IngressClass traefik (default). Point *.apps.noble.lab.pcenicni.dev at 192.168.50.211. MetalLB pool verification was done before replacing the temporary nginx test with Traefik.
  • cert-manager Helm v1.20.0 / app v1.20.0 — clusters/noble/bootstrap/cert-manager/; ClusterIssuer letsencrypt-staging and letsencrypt-prod (DNS-01 via Cloudflare for pcenicni.dev, Secret cloudflare-dns-api-token in cert-manager); ACME email certificates@noble.lab.pcenicni.dev (edit in manifests if you want a different mailbox).
  • Newt Helm 1.2.0 / app 1.10.1 — clusters/noble/bootstrap/newt/ (fossorial/newt); Pangolin site tunnel — newt-pangolin-auth Secret (PANGOLIN_ENDPOINT, NEWT_ID, NEWT_SECRET). Prefer a SealedSecret in git (kubeseal — see clusters/noble/bootstrap/sealed-secrets/examples/) after rotating credentials if they were exposed. Public DNS is not automated with ExternalDNS: CNAME records at your DNS host per Pangolin's domain instructions, plus Integration API for HTTP resources/targets — see clusters/noble/bootstrap/newt/README.md. LAN access to Traefik can still use *.apps.noble.lab.pcenicni.dev → 192.168.50.211 (split horizon / local resolver).
  • Argo CD Helm 9.4.17 / app v3.3.6 — clusters/noble/bootstrap/argocd/; argocd-server LoadBalancer 192.168.50.210; app-of-apps root syncs clusters/noble/apps/ (edit root-application.yaml repoURL before applying).
  • kube-prometheus-stack — Helm chart 82.15.1 — clusters/noble/bootstrap/kube-prometheus-stack/ (namespace monitoring, PSA privileged — node-exporter needs host mounts); Longhorn PVCs for Prometheus, Grafana, Alertmanager; node-exporter DaemonSet 4/4. Grafana Ingress: https://grafana.apps.noble.lab.pcenicni.dev (Traefik ingressClassName: traefik, cert-manager.io/cluster-issuer: letsencrypt-prod). Loki datasource in Grafana: ConfigMap clusters/noble/bootstrap/grafana-loki-datasource/loki-datasource.yaml (sidecar label grafana_datasource: "1") — not via grafana.additionalDataSources in the chart. helm upgrade --install with --wait is silent until done — use --timeout 30m; Grafana admin: Secret kube-prometheus-grafana, keys admin-user / admin-password.
  • Loki + Fluent Bit — grafana/loki 6.55.0 SingleBinary + filesystem on Longhorn (clusters/noble/bootstrap/loki/); loki.auth_enabled: false; chunksCache.enabled: false (no memcached chunk cache). fluent/fluent-bit 0.56.0 → loki-gateway.loki.svc:80 (clusters/noble/bootstrap/fluent-bit/); logging PSA privileged. Grafana Explore: kubectl apply -f clusters/noble/bootstrap/grafana-loki-datasource/loki-datasource.yaml then Explore → Loki (e.g. {job="fluent-bit"}).
  • Sealed Secrets Helm 2.18.4 / app 0.36.1 — clusters/noble/bootstrap/sealed-secrets/ (namespace sealed-secrets); kubeseal on client should match controller minor (README); back up sealed-secrets-key (see README).
  • External Secrets Operator Helm 2.2.0 / app v2.2.0 — clusters/noble/bootstrap/external-secrets/; Vault ClusterSecretStore in examples/vault-cluster-secret-store.yaml (http:// to match Vault listener — apply after Vault Kubernetes auth).
  • Vault Helm 0.32.0 / app 1.21.2 — clusters/noble/bootstrap/vault/ — standalone file storage, Longhorn PVC; HTTP listener (global.tlsDisable); optional lab unseal CronJob (unseal-cronjob.yaml); not initialized in git — run vault operator init per README.md.
  • Velero Helm 12.0.0 / app v1.18.0 — clusters/noble/bootstrap/velero/ (Ansible noble_velero, not Argo); S3-compatible backup location + CSI snapshots (EnableCSI); enable with noble_velero_install per velero/README.md.
  • Still open: Renovate — install Mend Renovate (or self-host) so PRs run; optional Alertmanager notification channels; optional sample Ingress + cert + Pangolin end-to-end; Argo CD SSO.

Inventory

| Host | Role | IP |
| --- | --- | --- |
| helium | worker | 192.168.50.10 |
| neon | control-plane + worker | 192.168.50.20 |
| argon | control-plane + worker | 192.168.50.30 |
| krypton | control-plane + worker | 192.168.50.40 |

Network reservations

| Use | Value |
| --- | --- |
| Kubernetes API VIP (kube-vip) | 192.168.50.230 (see talos/README.md; align with talos/talconfig.yaml additionalApiServerCertSans) |
| MetalLB L2 pool | 192.168.50.210–192.168.50.229 |
| Argo CD LoadBalancer | Pick one IP in the MetalLB pool (e.g. 192.168.50.210) |
| Traefik (apps ingress) | 192.168.50.211 — metallb.io/loadBalancerIPs in clusters/noble/bootstrap/traefik/values.yaml |
| Apps ingress (LAN / split horizon) | *.apps.noble.lab.pcenicni.dev → Traefik LB |
| Grafana (Ingress + TLS) | grafana.apps.noble.lab.pcenicni.dev — grafana.ingress in clusters/noble/bootstrap/kube-prometheus-stack/values.yaml (letsencrypt-prod) |
| Headlamp (Ingress + TLS) | headlamp.apps.noble.lab.pcenicni.dev — chart ingress in clusters/noble/bootstrap/headlamp/ (letsencrypt-prod, ingressClassName: traefik) |
| Public DNS (Pangolin) | Newt tunnel + CNAME at registrar + Integration API — clusters/noble/bootstrap/newt/ |
| Velero | S3-compatible endpoint + bucket — clusters/noble/bootstrap/velero/, ansible/playbooks/noble.yml (noble_velero_install) |
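The checked-in manifests under clusters/noble/bootstrap/metallb/ are authoritative; as a sketch, the noble-l2 pool and its L2 advertisement from the table above look roughly like this (only the pool name and address range come from this document — everything else is the standard MetalLB shape):

```yaml
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: noble-l2
  namespace: metallb-system
spec:
  addresses:
    # reserved range from the table above; keep out of DHCP
    - 192.168.50.210-192.168.50.229
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: noble-l2
  namespace: metallb-system
spec:
  ipAddressPools:
    - noble-l2   # advertise only this pool over L2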

Versions

  • Talos: v1.12.6 — align talosctl client with node image
  • Talos Image Factory (iscsi-tools + util-linux-tools): factory.talos.dev/nocloud-installer/249d9135de54962744e917cfe654117000cba369f9152fbab9d055a00aa3664f:v1.12.6 — same schematic must appear in machine.install.image after talhelper genconfig (bare metal may use metal-installer/ instead of nocloud-installer/)
  • Kubernetes: 1.35.2 on current nodes (bundled with Talos; not pinned in repo)
  • Cilium: 1.16.6 (Helm chart; see clusters/noble/bootstrap/cilium/README.md)
  • MetalLB: 0.15.3 (Helm chart; app v0.15.3)
  • metrics-server: 3.13.0 (Helm chart; app v0.8.0)
  • Longhorn: 1.11.1 (Helm chart; app v1.11.1)
  • Traefik: 39.0.6 (Helm chart; app v3.6.11)
  • cert-manager: v1.20.0 (Helm chart; app v1.20.0)
  • Newt (Fossorial): 1.2.0 (Helm chart; app 1.10.1)
  • Argo CD: 9.4.17 (Helm chart argo/argo-cd; app v3.3.6)
  • kube-prometheus-stack: 82.15.1 (Helm chart prometheus-community/kube-prometheus-stack; app v0.89.x bundle)
  • Loki: 6.55.0 (Helm chart grafana/loki; app 3.6.7)
  • Fluent Bit: 0.56.0 (Helm chart fluent/fluent-bit; app 4.2.3)
  • Sealed Secrets: 2.18.4 (Helm chart sealed-secrets/sealed-secrets; app 0.36.1)
  • External Secrets Operator: 2.2.0 (Helm chart external-secrets/external-secrets; app v2.2.0)
  • Vault: 0.32.0 (Helm chart hashicorp/vault; app 1.21.2)
  • Kyverno: 3.7.1 (Helm chart kyverno/kyverno; app v1.17.1); kyverno-policies 3.7.1 — baseline PSS, Audit (clusters/noble/bootstrap/kyverno/)
  • Headlamp: 0.40.1 (Helm chart headlamp/headlamp; app matches chart — see Artifact Hub)
  • Velero: 12.0.0 (Helm chart vmware-tanzu/velero; app v1.18.0) — clusters/noble/bootstrap/velero/; AWS plugin v1.14.0; Ansible noble_velero
  • Renovate: hosted (Mend Renovate GitHub/GitLab app — no cluster chart) or self-hosted — pin chart when added (Helm charts, OCI ghcr.io/renovatebot/charts/renovate); pair renovate.json with this repo's Helm paths under clusters/noble/
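A minimal renovate.json matching the intent above might look like the following sketch (the fileMatch pattern and grouping rule are assumptions; the committed file is authoritative):

```json
{
  "$schema": "https://docs.renovatebot.com/renovate-schema.json",
  "extends": ["config:recommended"],
  "kubernetes": {
    "fileMatch": ["clusters/noble/.+\\.ya?ml$"]
  },
  "packageRules": [
    {
      "matchUpdateTypes": ["minor", "patch"],
      "groupName": "all minor and patch updates"
    }
  ]
}
```

Helm chart versions that only live in shell comments are invisible to these managers — that is what the extra regex customManagers in the Versions note would cover.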

Repo paths (this workspace)

| Artifact | Path |
| --- | --- |
| This checklist | talos/CLUSTER-BUILD.md |
| Operational runbooks (API VIP, etcd, Longhorn, Vault) | talos/runbooks/ |
| Talos quick start + networking + kubeconfig | talos/README.md |
| talhelper source (active) | talos/talconfig.yaml — may be wipe-phase (no Longhorn volume) during disk recovery |
| Longhorn volume restore | talos/talconfig.with-longhorn.yaml — copy to talconfig.yaml after GPT wipe (see talos/README.md §5) |
| Longhorn GPT wipe automation | talos/scripts/longhorn-gpt-recovery.sh |
| kube-vip (kustomize) | clusters/noble/bootstrap/kube-vip/ (vip_interface e.g. ens18) |
| Cilium (Helm values) | clusters/noble/bootstrap/cilium/values.yaml (phase 1), optional values-kpr.yaml, README.md |
| MetalLB | clusters/noble/bootstrap/metallb/namespace.yaml (PSA privileged), ip-address-pool.yaml, kustomization.yaml, README.md |
| Longhorn | clusters/noble/bootstrap/longhorn/values.yaml, namespace.yaml (PSA privileged), kustomization.yaml |
| metrics-server (Helm values) | clusters/noble/bootstrap/metrics-server/values.yaml |
| Traefik (Helm values) | clusters/noble/bootstrap/traefik/values.yaml, namespace.yaml, README.md |
| cert-manager (Helm + ClusterIssuers) | clusters/noble/bootstrap/cert-manager/values.yaml, namespace.yaml, kustomization.yaml, README.md |
| Newt / Pangolin tunnel (Helm) | clusters/noble/bootstrap/newt/values.yaml, namespace.yaml, README.md |
| Argo CD (Helm) + optional app-of-apps | clusters/noble/bootstrap/argocd/values.yaml, root-application.yaml, README.md; optional Application tree in clusters/noble/apps/ |
| kube-prometheus-stack (Helm values) | clusters/noble/bootstrap/kube-prometheus-stack/values.yaml, namespace.yaml |
| Grafana Loki datasource (ConfigMap; no chart change) | clusters/noble/bootstrap/grafana-loki-datasource/loki-datasource.yaml |
| Loki (Helm values) | clusters/noble/bootstrap/loki/values.yaml, namespace.yaml |
| Fluent Bit → Loki (Helm values) | clusters/noble/bootstrap/fluent-bit/values.yaml, namespace.yaml |
| Sealed Secrets (Helm) | clusters/noble/bootstrap/sealed-secrets/values.yaml, namespace.yaml, README.md |
| External Secrets Operator (Helm + Vault store example) | clusters/noble/bootstrap/external-secrets/values.yaml, namespace.yaml, README.md, examples/vault-cluster-secret-store.yaml |
| Vault (Helm + optional unseal CronJob) | clusters/noble/bootstrap/vault/values.yaml, namespace.yaml, unseal-cronjob.yaml, cilium-network-policy.yaml, configure-kubernetes-auth.sh, README.md |
| Kyverno + PSS baseline policies | clusters/noble/bootstrap/kyverno/values.yaml, policies-values.yaml, namespace.yaml, README.md |
| Headlamp (Helm + Ingress) | clusters/noble/bootstrap/headlamp/values.yaml, namespace.yaml, README.md |
| Velero (Helm + S3 BSL; CSI snapshots) | clusters/noble/bootstrap/velero/values.yaml, namespace.yaml, README.md; ansible/roles/noble_velero |
| Renovate (repo config + optional self-hosted Helm) | renovate.json at repo root; optional self-hosted chart under clusters/noble/apps/ (Argo) + token Secret (Sealed Secrets / ESO after Phase E) |

Git vs cluster: manifests and talconfig live in git; talhelper genconfig -o out, bootstrap, Helm, and kubectl run on your LAN. See talos/README.md for workstation reachability (lab LAN/VPN), talosctl kubeconfig vs Kubernetes server: (VIP vs node IP), and --insecure only in maintenance.

Ordering (do not skip)

  1. Talos installed; Cilium (or chosen CNI) before most workloads — with cni: none, nodes stay NotReady / network-unavailable taint until CNI is up.
  2. MetalLB Helm chart (CRDs + controller) before kubectl apply -k on the pool manifests.
  3. clusters/noble/bootstrap/metallb/namespace.yaml before or merged onto metallb-system so Pod Security does not block speaker (see bootstrap/metallb/README.md).
  4. Longhorn: Talos user volume + extensions in talconfig.with-longhorn.yaml (when restored); Helm defaultDataPath in clusters/noble/bootstrap/longhorn/values.yaml.
  5. Loki → Fluent Bit → Grafana datasource: deploy Loki (loki-gateway Service) before Fluent Bit; apply clusters/noble/bootstrap/grafana-loki-datasource/loki-datasource.yaml after Loki (sidecar picks up the ConfigMap — no kube-prometheus values change for Loki).
  6. Vault: Longhorn default StorageClass before clusters/noble/bootstrap/vault/ Helm (PVC data-vault-0); External Secrets ClusterSecretStore after Vault is initialized, unsealed, and Kubernetes auth is configured.
  7. Headlamp: Traefik + cert-manager (letsencrypt-prod) before exposing headlamp.apps.noble.lab.pcenicni.dev; treat as cluster-admin UI — protect with network policy / SSO when hardening (Phase G).
  8. Renovate: Git remote + platform access (hosted app needs org/repo install; self-hosted needs RENOVATE_TOKEN and chart renovate.config). If the bot runs in-cluster, add the token after Sealed Secrets / Vault (Phase E) — no ingress required for the bot itself.
  9. Velero: S3-compatible endpoint + bucket + velero/velero-cloud-credentials before ansible/playbooks/noble.yml with noble_velero_install: true; for CSI volume snapshots, label a VolumeSnapshotClass per clusters/noble/bootstrap/velero/README.md (e.g. Longhorn).
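For step 9, labeling a Longhorn VolumeSnapshotClass for Velero CSI snapshots could look like this sketch (the class name and parameters are assumptions; clusters/noble/bootstrap/velero/README.md is authoritative):

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: longhorn-snapshot-vsc   # hypothetical name
  labels:
    # Velero selects the snapshot class for a CSI driver by this label
    velero.io/csi-volumesnapshot-class: "true"
driver: driver.longhorn.io
deletionPolicy: Delete
parameters:
  type: snap   # Longhorn in-cluster snapshot; "bak" would target the Longhorn backupstore
```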

Prerequisites (before phases)

  • talos/talconfig.yaml checked in (VIP, API SANs, cni: none, iscsi-tools / util-linux-tools in schematic) — run talhelper validate talconfig talconfig.yaml after edits
  • Workstation on a routable path to node IPs or VIP (same LAN / VPN); talos/README.md §3 if kubectl hits wrong server: or network is unreachable
  • talosctl client matches node Talos version; talhelper for genconfig
  • Node static IPs (helium, neon, argon, krypton)
  • DHCP does not lease 192.168.50.210–229, .230, or node IPs
  • DNS for API and apps as in talos/README.md
  • Git remote ready for Argo CD (argo-cd)
  • talos/kubeconfig from talosctl kubeconfig — root repo kubeconfig is a stub until populated

Phase A — Talos bootstrap + API VIP

  • Optional: Ansible runs the same steps — ansible/playbooks/talos_phase_a.yml (genconfig → apply → bootstrap → kubeconfig) or ansible/playbooks/deploy.yml (Phase A + noble.yml); see ansible/README.md.
  • talhelper gensecret → talhelper genconfig -o out (re-run genconfig after every talconfig edit)
  • apply-config all nodes (talos/README.md §2 — no --insecure after nodes join; use TALOSCONFIG)
  • talosctl bootstrap once; other control planes and worker join
  • talosctl kubeconfig → working kubectl (talos/README.md §3 — override server: if VIP not reachable from workstation)
  • kube-vip manifests in clusters/noble/bootstrap/kube-vip
  • kube-vip healthy; vip_interface matches uplink (talosctl get links); VIP reachable where needed
  • talosctl health (e.g. talosctl health -n 192.168.50.20 with TALOSCONFIG set)

Phase B — Core platform

Install order: Cilium → metrics-server → Longhorn (Talos disk + Helm) → MetalLB (Helm → pool manifests) → ingress / certs / DNS as planned.

  • Cilium (Helm 1.16.6) — required before MetalLB if cni: none (clusters/noble/bootstrap/cilium/)
  • metrics-server — Helm 3.13.0; values in clusters/noble/bootstrap/metrics-server/values.yaml; verify kubectl top nodes
  • Longhorn — Talos: user volume + kubelet mounts + extensions (talos/README.md §5); Helm 1.11.1; kubectl apply -k clusters/noble/bootstrap/longhorn; verify nodes.longhorn.io and test PVC Bound
  • MetalLB — chart installed; pool + L2 from clusters/noble/bootstrap/metallb/ applied (192.168.50.210–229)
  • Service LoadBalancer / pool check — MetalLB assigns from 210–229 (validated before Traefik; temporary nginx test removed in favor of Traefik)
  • Traefik LoadBalancer for *.apps.noble.lab.pcenicni.dev — clusters/noble/bootstrap/traefik/; 192.168.50.211
  • cert-manager + ClusterIssuer (letsencrypt-staging / letsencrypt-prod) — clusters/noble/bootstrap/cert-manager/
  • Newt (Pangolin tunnel; replaces ExternalDNS for public DNS) — clusters/noble/bootstrap/newt/; newt-pangolin-auth Secret; CNAME + Integration API per newt/README.md
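The letsencrypt-prod issuer in this phase follows the standard cert-manager DNS-01 shape; a sketch (the Secret key name api-token is an assumption — the committed ClusterIssuer under clusters/noble/bootstrap/cert-manager/ is authoritative):

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: certificates@noble.lab.pcenicni.dev
    privateKeySecretRef:
      name: letsencrypt-prod-account-key   # hypothetical Secret name for the ACME account key
    solvers:
      - dns01:
          cloudflare:
            apiTokenSecretRef:
              name: cloudflare-dns-api-token   # Secret in the cert-manager namespace
              key: api-token                   # assumed key name
        selector:
          dnsZones:
            - pcenicni.dev
```

letsencrypt-staging is the same shape pointed at the ACME staging directory; issue against staging first to avoid production rate limits.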

Phase C — GitOps

  • Argo CD bootstrap — clusters/noble/bootstrap/argocd/ (helm upgrade --install argocd …) — also covered by ansible/playbooks/noble.yml (role noble_argocd)
  • Argo CD server LoadBalancer — 192.168.50.210 (see values.yaml)
  • App-of-apps — optional; clusters/noble/apps/kustomization.yaml is empty (core stack is Ansible-managed from clusters/noble/bootstrap/, not Argo). Set repoURL in root-application.yaml and add Application manifests only for optional GitOps workloads — see clusters/noble/apps/README.md
  • Renovate — renovate.json at repo root (Kubernetes manager for clusters/noble/**/*.yaml image pins; grouped minor/patch PRs). Activate PRs: install Mend Renovate on the Git repo (Option A), or self-host the Renovate Helm chart with a token from Sealed Secrets / ESO (Option B). Helm chart versions pinned only in comments still need manual bumps or extra regex customManagers — extend renovate.json as needed.
  • SSO — later
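The app-of-apps root in root-application.yaml follows the usual Argo CD Application shape; a sketch with a placeholder repoURL (the sync policy details are assumptions):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: root
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://example.com/your/repo.git  # placeholder — set the real remote before applying
    targetRevision: main
    path: clusters/noble/apps                   # Application manifests for optional GitOps workloads
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true      # assumed; drop if manual sync is preferred
      selfHeal: true
```

Since clusters/noble/apps/kustomization.yaml is empty by design (the core stack is Ansible-managed), this root stays a no-op until Application manifests are added there.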

Phase D — Observability

  • kube-prometheus-stack — kubectl apply -f clusters/noble/bootstrap/kube-prometheus-stack/namespace.yaml then helm upgrade --install as in clusters/noble/bootstrap/kube-prometheus-stack/values.yaml (chart 82.15.1); PVCs longhorn; --wait --timeout 30m recommended; verify kubectl -n monitoring get pods,pvc
  • Loki + Fluent Bit + Grafana Loki datasource — order: kubectl apply -f clusters/noble/bootstrap/loki/namespace.yaml → helm upgrade --install loki grafana/loki --version 6.55.0 -n loki -f clusters/noble/bootstrap/loki/values.yaml → kubectl apply -f clusters/noble/bootstrap/fluent-bit/namespace.yaml → helm upgrade --install fluent-bit fluent/fluent-bit --version 0.56.0 -n logging -f clusters/noble/bootstrap/fluent-bit/values.yaml → kubectl apply -f clusters/noble/bootstrap/grafana-loki-datasource/loki-datasource.yaml. Verify Explore → Loki in Grafana; kubectl -n loki get pods,pvc, kubectl -n logging get pods
  • Headlamp — Kubernetes web UI (Headlamp); helm repo add headlamp https://kubernetes-sigs.github.io/headlamp/; kubectl apply -f clusters/noble/bootstrap/headlamp/namespace.yaml → helm upgrade --install headlamp headlamp/headlamp --version 0.40.1 -n headlamp -f clusters/noble/bootstrap/headlamp/values.yaml; Ingress https://headlamp.apps.noble.lab.pcenicni.dev (ingressClassName: traefik, cert-manager.io/cluster-issuer: letsencrypt-prod). values.yaml: config.sessionTTL: null works around chart 0.40.1 / binary mismatch (headlamp#4883). RBAC: chart defaults are permissive — tighten before LAN-wide exposure; align with Phase G hardening.
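The sidecar-discovered Loki datasource from Phase D has roughly this shape (ConfigMap name and namespace are assumptions; the committed loki-datasource.yaml is authoritative):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: loki-datasource
  namespace: monitoring          # assumed: where the Grafana sidecar watches
  labels:
    grafana_datasource: "1"      # label the Grafana sidecar selects on
data:
  loki-datasource.yaml: |
    apiVersion: 1
    datasources:
      - name: Loki
        type: loki
        access: proxy
        url: http://loki-gateway.loki.svc:80   # gateway Service from the Loki chart
```

This is why no grafana.additionalDataSources change is needed in the kube-prometheus-stack values: the sidecar syncs the ConfigMap into Grafana at runtime.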

Phase E — Secrets

  • Sealed Secrets (optional Git workflow) — clusters/noble/bootstrap/sealed-secrets/ (Helm 2.18.4); kubeseal + key backup per README.md
  • Vault in-cluster on Longhorn + auto-unseal — clusters/noble/bootstrap/vault/ (Helm 0.32.0); Longhorn PVC; OSS “auto-unseal” = optional unseal-cronjob.yaml + Secret (README); configure-kubernetes-auth.sh for ESO (Kubernetes auth + KV + role)
  • External Secrets Operator + Vault ClusterSecretStore — operator clusters/noble/bootstrap/external-secrets/ (Helm 2.2.0); apply examples/vault-cluster-secret-store.yaml after Vault (README.md)
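The Vault ClusterSecretStore in examples/vault-cluster-secret-store.yaml follows the standard ESO shape; a sketch (the role, mount path, KV path, and service account names are assumptions, and the apiVersion may differ by ESO release):

```yaml
apiVersion: external-secrets.io/v1
kind: ClusterSecretStore
metadata:
  name: vault
spec:
  provider:
    vault:
      server: http://vault.vault.svc:8200   # http:// to match the tlsDisable listener
      path: secret                          # assumed KV mount
      version: v2
      auth:
        kubernetes:
          mountPath: kubernetes             # auth mount created by configure-kubernetes-auth.sh
          role: external-secrets            # assumed Vault role name
          serviceAccountRef:
            name: external-secrets
            namespace: external-secrets
```

Apply only after Vault is initialized, unsealed, and Kubernetes auth is configured, or the store will fail validation.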

Phase F — Policy + backups

  • Kyverno baseline policies — clusters/noble/bootstrap/kyverno/ (Helm kyverno 3.7.1 + kyverno-policies 3.7.1, baseline / Audit — see README.md)
  • Velero — manifests + Ansible noble_velero (clusters/noble/bootstrap/velero/); enable with noble_velero_install: true + S3 bucket/URL + velero/velero-cloud-credentials (see velero/README.md); optional backup/restore drill
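Once the BackupStorageLocation is Available, the backup/restore drill can start from a simple Schedule; a sketch (name, cron expression, and TTL are assumptions):

```yaml
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-cluster-backup   # hypothetical name
  namespace: velero
spec:
  schedule: "0 3 * * *"        # daily at 03:00
  template:
    includedNamespaces:
      - "*"                    # everything; narrow for the drill if preferred
    ttl: 720h                  # keep backups ~30 days
```

A one-off drill is the same template via `velero backup create`, followed by `velero restore create --from-backup …` into a scratch namespace per velero/README.md.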

Phase G — Hardening

  • Cilium — Vault CiliumNetworkPolicy (clusters/noble/bootstrap/vault/cilium-network-policy.yaml) — HTTP 8200 from external-secrets + vault; extend for other clients as needed
  • Runbooks — talos/runbooks/ (API VIP / kube-vip, Talos etcd, Longhorn, Vault)
  • RBAC — Headlamp ClusterRoleBinding uses built-in edit (not cluster-admin); Argo CD policy.default: role:readonly with g, admin, role:admin — see clusters/noble/bootstrap/headlamp/values.yaml, clusters/noble/bootstrap/argocd/values.yaml, talos/runbooks/rbac.md
  • Alertmanager — add slack_configs, pagerduty_configs, or other receivers under kube-prometheus-stack alertmanager.config (chart defaults use null receiver)
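A Slack receiver under the kube-prometheus-stack values would look roughly like this sketch (channel and webhook URL are placeholders; the Watchdog route mirrors the chart's default null receiver):

```yaml
alertmanager:
  config:
    route:
      receiver: "slack"
      routes:
        # keep the always-firing Watchdog heartbeat out of Slack
        - receiver: "null"
          matchers:
            - alertname = "Watchdog"
    receivers:
      - name: "null"
      - name: "slack"
        slack_configs:
          - api_url: "https://hooks.slack.com/services/REPLACE_ME"  # placeholder webhook
            channel: "#noble-alerts"                                # placeholder channel
            send_resolved: true
```

pagerduty_configs slots into the same receivers list; route the webhook URL through a Secret (Sealed Secrets / ESO from Phase E) rather than committing it in values.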

Quick validation

  • kubectl get nodes — all Ready
  • API via VIP :6443 — kubectl get --raw /healthz → ok with kubeconfig server: https://192.168.50.230:6443
  • Ingress LoadBalancer in pool 210–229 (Traefik: 192.168.50.211)
  • Argo CD UI — argocd-server LoadBalancer 192.168.50.210 (initial admin password from argocd-initial-admin-secret)
  • Renovate — renovate.json committed; enable Mend Renovate or self-hosted bot for PRs
  • Sample Ingress + cert (cert-manager ready) + Pangolin resource + CNAME
  • PVC Bound on Longhorn (storageClassName: longhorn); Prometheus/Loki durable when configured
  • monitoring — kube-prometheus-stack core workloads Running (Prometheus, Grafana, Alertmanager, operator, kube-state-metrics, node-exporter); PVCs Bound on longhorn
  • loki — Loki SingleBinary + gateway Running; loki PVC Bound on longhorn (no chunks-cache by design)
  • logging — Fluent Bit DaemonSet Running on all nodes (logs → Loki)
  • Grafana — Loki datasource from grafana-loki-datasource ConfigMap (Explore works after apply + sidecar sync)
  • Headlamp — Deployment Running in headlamp; UI at https://headlamp.apps.noble.lab.pcenicni.dev (TLS via letsencrypt-prod)
  • sealed-secrets — controller Deployment Running in sealed-secrets (install + kubeseal per apps/sealed-secrets/README.md)
  • external-secrets — controller + webhook + cert-controller Running in external-secrets; apply ClusterSecretStore after Vault Kubernetes auth
  • vault — StatefulSet Running, data-vault-0 PVC Bound on longhorn; vault operator init + unseal per apps/vault/README.md
  • kyverno — admission / background / cleanup / reports controllers Running in kyverno; ClusterPolicies for PSS baseline Ready (Audit)
  • velero — when enabled: Deployment Running in velero; BackupStorageLocation / VolumeSnapshotLocation Available; test backup per velero/README.md
  • Phase G (partial) — Vault CiliumNetworkPolicy; talos/runbooks/ (incl. RBAC); Headlamp/Argo CD RBAC tightened — Alertmanager receivers still optional

Keep in sync with talos/README.md and manifests under clusters/noble/.