Noble lab — Talos cluster build checklist
This document is the exported TODO for the noble Talos cluster (4 nodes). Commands and troubleshooting live in README.md.
Current state (2026-03-28)
Lab stack is up on-cluster through Phases D–F and Phase G (Vault CiliumNetworkPolicy, `talos/runbooks/`). Next focus: optional Alertmanager receivers (Slack/PagerDuty); tighten RBAC (Headlamp / cluster-admin); Cilium policies for other namespaces as needed; enable Mend Renovate for PRs; Pangolin/sample Ingress; Velero backup/restore drill after S3 credentials are set (`noble_velero_install`).
- Talos v1.12.6 (target) / Kubernetes as bundled — four nodes Ready unless upgrading; `talosctl health`; `talos/kubeconfig` is local only (gitignored — never commit; regenerate with `talosctl kubeconfig` per `talos/README.md`). Image Factory (nocloud installer): `factory.talos.dev/nocloud-installer/249d9135de54962744e917cfe654117000cba369f9152fbab9d055a00aa3664f:v1.12.6`
- Cilium Helm 1.16.6 / app 1.16.6 (`clusters/noble/bootstrap/cilium/`, phase 1 values).
- MetalLB Helm 0.15.3 / app v0.15.3; IPAddressPool `noble-l2` + L2Advertisement — pool `192.168.50.210–192.168.50.229`.
- kube-vip DaemonSet 3/3 on control planes; VIP `192.168.50.230` on `ens18` (`vip_subnet` `/32` required — bare `32` breaks parsing). Verified from workstation: `kubectl config set-cluster noble --server=https://192.168.50.230:6443` then `kubectl get --raw /healthz` → `ok` (`talos/kubeconfig`; see `talos/README.md`).
- metrics-server Helm 3.13.0 / app v0.8.0 — `clusters/noble/bootstrap/metrics-server/values.yaml` (`--kubelet-insecure-tls` for Talos); `kubectl top nodes` works.
- Longhorn Helm 1.11.1 / app v1.11.1 — `clusters/noble/bootstrap/longhorn/` (PSA privileged namespace, `defaultDataPath` `/var/mnt/longhorn`, `preUpgradeChecker` enabled); StorageClass `longhorn` (default); `nodes.longhorn.io` all Ready; test PVC `Bound` on `longhorn`.
- Traefik Helm 39.0.6 / app v3.6.11 — `clusters/noble/bootstrap/traefik/`; Service `LoadBalancer` EXTERNAL-IP `192.168.50.211`; IngressClass `traefik` (default). Point `*.apps.noble.lab.pcenicni.dev` at `192.168.50.211`. MetalLB pool verification was done before replacing the temporary nginx test with Traefik.
- cert-manager Helm v1.20.0 / app v1.20.0 — `clusters/noble/bootstrap/cert-manager/`; ClusterIssuers `letsencrypt-staging` and `letsencrypt-prod` (DNS-01 via Cloudflare for `pcenicni.dev`, Secret `cloudflare-dns-api-token` in `cert-manager`); ACME email `certificates@noble.lab.pcenicni.dev` (edit in manifests if you want a different mailbox).
- Newt Helm 1.2.0 / app 1.10.1 — `clusters/noble/bootstrap/newt/` (fossorial/newt); Pangolin site tunnel — `newt-pangolin-auth` Secret (`PANGOLIN_ENDPOINT`, `NEWT_ID`, `NEWT_SECRET`). Prefer a SealedSecret in git (`kubeseal` — see `clusters/noble/bootstrap/sealed-secrets/examples/`) after rotating credentials if they were exposed. Public DNS is not automated with ExternalDNS: CNAME records at your DNS host per Pangolin’s domain instructions, plus Integration API for HTTP resources/targets — see `clusters/noble/bootstrap/newt/README.md`. LAN access to Traefik can still use `*.apps.noble.lab.pcenicni.dev` → `192.168.50.211` (split horizon / local resolver).
- Argo CD Helm 9.4.17 / app v3.3.6 — `clusters/noble/bootstrap/argocd/`; `argocd-server` LoadBalancer `192.168.50.210`; app-of-apps root syncs `clusters/noble/apps/` (edit `root-application.yaml` `repoURL` before applying).
- kube-prometheus-stack — Helm chart 82.15.1 — `clusters/noble/bootstrap/kube-prometheus-stack/` (namespace `monitoring`, PSA privileged — node-exporter needs host mounts); Longhorn PVCs for Prometheus, Grafana, Alertmanager; node-exporter DaemonSet 4/4. Grafana Ingress: `https://grafana.apps.noble.lab.pcenicni.dev` (Traefik `ingressClassName: traefik`, `cert-manager.io/cluster-issuer: letsencrypt-prod`). Loki datasource in Grafana: ConfigMap `clusters/noble/bootstrap/grafana-loki-datasource/loki-datasource.yaml` (sidecar label `grafana_datasource: "1"`) — not via `grafana.additionalDataSources` in the chart. `helm upgrade --install` with `--wait` is silent until done — use `--timeout 30m`. Grafana admin: Secret `kube-prometheus-grafana`, keys `admin-user` / `admin-password`.
- Loki + Fluent Bit — `grafana/loki` 6.55.0 SingleBinary + filesystem on Longhorn (`clusters/noble/bootstrap/loki/`); `loki.auth_enabled: false`; `chunksCache.enabled: false` (no memcached chunk cache). `fluent/fluent-bit` 0.56.0 → `loki-gateway.loki.svc:80` (`clusters/noble/bootstrap/fluent-bit/`); `logging` PSA privileged. Grafana Explore: `kubectl apply -f clusters/noble/bootstrap/grafana-loki-datasource/loki-datasource.yaml` then Explore → Loki (e.g. `{job="fluent-bit"}`).
- Sealed Secrets Helm 2.18.4 / app 0.36.1 — `clusters/noble/bootstrap/sealed-secrets/` (namespace `sealed-secrets`); `kubeseal` on client should match controller minor (README); back up `sealed-secrets-key` (see README).
- External Secrets Operator Helm 2.2.0 / app v2.2.0 — `clusters/noble/bootstrap/external-secrets/`; Vault ClusterSecretStore in `examples/vault-cluster-secret-store.yaml` (`http://` to match Vault listener — apply after Vault Kubernetes auth).
- Vault Helm 0.32.0 / app 1.21.2 — `clusters/noble/bootstrap/vault/` — standalone file storage, Longhorn PVC; HTTP listener (`global.tlsDisable`); optional CronJob lab unseal `unseal-cronjob.yaml`; not initialized in git — run `vault operator init` per `README.md`.
- Velero Helm 12.0.0 / app v1.18.0 — `clusters/noble/bootstrap/velero/` (Ansible `noble_velero`, not Argo); S3-compatible backup location + CSI snapshots (`EnableCSI`); enable with `noble_velero_install` per `velero/README.md`.
- Still open: Renovate — install Mend Renovate (or self-host) so PRs run; optional Alertmanager notification channels; optional sample Ingress + cert + Pangolin end-to-end; Argo CD SSO.
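The `vip_subnet` note above is easy to trip over; here is a minimal sketch of the relevant kube-vip DaemonSet env entries (the `address` variable name is taken from kube-vip's documented env vars and is an assumption here; `clusters/noble/bootstrap/kube-vip/` is authoritative):

```yaml
# Sketch only, not the checked-in manifest.
env:
  - name: vip_interface
    value: ens18            # must match the node uplink (talosctl get links)
  - name: vip_subnet
    value: "/32"            # leading slash required; a bare "32" breaks parsing
  - name: address
    value: 192.168.50.230   # API VIP (env name assumed from kube-vip docs)
```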
Inventory
| Host | Role | IP |
|---|---|---|
| helium | worker | 192.168.50.10 |
| neon | control-plane + worker | 192.168.50.20 |
| argon | control-plane + worker | 192.168.50.30 |
| krypton | control-plane + worker | 192.168.50.40 |
Network reservations
| Use | Value |
|---|---|
| Kubernetes API VIP (kube-vip) | 192.168.50.230 (see talos/README.md; align with talos/talconfig.yaml additionalApiServerCertSans) |
| MetalLB L2 pool | 192.168.50.210–192.168.50.229 |
| Argo CD LoadBalancer | Pick one IP in the MetalLB pool (e.g. 192.168.50.210) |
| Traefik (apps ingress) | 192.168.50.211 — metallb.io/loadBalancerIPs in clusters/noble/bootstrap/traefik/values.yaml |
| Apps ingress (LAN / split horizon) | *.apps.noble.lab.pcenicni.dev → Traefik LB |
| Grafana (Ingress + TLS) | grafana.apps.noble.lab.pcenicni.dev — grafana.ingress in clusters/noble/bootstrap/kube-prometheus-stack/values.yaml (letsencrypt-prod) |
| Headlamp (Ingress + TLS) | headlamp.apps.noble.lab.pcenicni.dev — chart ingress in clusters/noble/bootstrap/headlamp/ (letsencrypt-prod, ingressClassName: traefik) |
| Public DNS (Pangolin) | Newt tunnel + CNAME at registrar + Integration API — clusters/noble/bootstrap/newt/ |
| Velero | S3-compatible endpoint + bucket — clusters/noble/bootstrap/velero/, ansible/playbooks/noble.yml (noble_velero_install) |
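The MetalLB rows above correspond to two small manifests; a sketch consistent with the reserved pool (the files under `clusters/noble/bootstrap/metallb/` are authoritative, including the L2Advertisement name, which is assumed here):

```yaml
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: noble-l2
  namespace: metallb-system
spec:
  addresses:
    - 192.168.50.210-192.168.50.229   # keep clear of DHCP leases
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: noble-l2                      # assumed name
  namespace: metallb-system
spec:
  ipAddressPools:
    - noble-l2
```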
Versions
- Talos: v1.12.6 — align `talosctl` client with node image
- Talos Image Factory (iscsi-tools + util-linux-tools): `factory.talos.dev/nocloud-installer/249d9135de54962744e917cfe654117000cba369f9152fbab9d055a00aa3664f:v1.12.6` — same schematic must appear in `machine.install.image` after `talhelper genconfig` (bare metal may use `metal-installer/` instead of `nocloud-installer/`)
- Kubernetes: 1.35.2 on current nodes (bundled with Talos; not pinned in repo)
- Cilium: 1.16.6 (Helm chart; see `clusters/noble/bootstrap/cilium/README.md`)
- MetalLB: 0.15.3 (Helm chart; app v0.15.3)
- metrics-server: 3.13.0 (Helm chart; app v0.8.0)
- Longhorn: 1.11.1 (Helm chart; app v1.11.1)
- Traefik: 39.0.6 (Helm chart; app v3.6.11)
- cert-manager: v1.20.0 (Helm chart; app v1.20.0)
- Newt (Fossorial): 1.2.0 (Helm chart; app 1.10.1)
- Argo CD: 9.4.17 (Helm chart `argo/argo-cd`; app v3.3.6)
- kube-prometheus-stack: 82.15.1 (Helm chart `prometheus-community/kube-prometheus-stack`; app v0.89.x bundle)
- Loki: 6.55.0 (Helm chart `grafana/loki`; app 3.6.7)
- Fluent Bit: 0.56.0 (Helm chart `fluent/fluent-bit`; app 4.2.3)
- Sealed Secrets: 2.18.4 (Helm chart `sealed-secrets/sealed-secrets`; app 0.36.1)
- External Secrets Operator: 2.2.0 (Helm chart `external-secrets/external-secrets`; app v2.2.0)
- Vault: 0.32.0 (Helm chart `hashicorp/vault`; app 1.21.2)
- Kyverno: 3.7.1 (Helm chart `kyverno/kyverno`; app v1.17.1); kyverno-policies 3.7.1 — baseline PSS, Audit (`clusters/noble/bootstrap/kyverno/`)
- Headlamp: 0.40.1 (Helm chart `headlamp/headlamp`; app matches chart — see Artifact Hub)
- Velero: 12.0.0 (Helm chart `vmware-tanzu/velero`; app v1.18.0) — `clusters/noble/bootstrap/velero/`; AWS plugin v1.14.0; Ansible `noble_velero`
- Renovate: hosted (Mend Renovate GitHub/GitLab app — no cluster chart) or self-hosted — pin chart when added (Helm charts, OCI `ghcr.io/renovatebot/charts/renovate`); pair `renovate.json` with this repo’s Helm paths under `clusters/noble/`
Repo paths (this workspace)
| Artifact | Path |
|---|---|
| This checklist | talos/CLUSTER-BUILD.md |
| Operational runbooks (API VIP, etcd, Longhorn, Vault) | talos/runbooks/ |
| Talos quick start + networking + kubeconfig | talos/README.md |
| talhelper source (active) | talos/talconfig.yaml — may be wipe-phase (no Longhorn volume) during disk recovery |
| Longhorn volume restore | talos/talconfig.with-longhorn.yaml — copy to talconfig.yaml after GPT wipe (see talos/README.md §5) |
| Longhorn GPT wipe automation | talos/scripts/longhorn-gpt-recovery.sh |
| kube-vip (kustomize) | clusters/noble/bootstrap/kube-vip/ (vip_interface e.g. ens18) |
| Cilium (Helm values) | clusters/noble/bootstrap/cilium/ — values.yaml (phase 1), optional values-kpr.yaml, README.md |
| MetalLB | clusters/noble/bootstrap/metallb/ — namespace.yaml (PSA privileged), ip-address-pool.yaml, kustomization.yaml, README.md |
| Longhorn | clusters/noble/bootstrap/longhorn/ — values.yaml, namespace.yaml (PSA privileged), kustomization.yaml |
| metrics-server (Helm values) | clusters/noble/bootstrap/metrics-server/values.yaml |
| Traefik (Helm values) | clusters/noble/bootstrap/traefik/ — values.yaml, namespace.yaml, README.md |
| cert-manager (Helm + ClusterIssuers) | clusters/noble/bootstrap/cert-manager/ — values.yaml, namespace.yaml, kustomization.yaml, README.md |
| Newt / Pangolin tunnel (Helm) | clusters/noble/bootstrap/newt/ — values.yaml, namespace.yaml, README.md |
| Argo CD (Helm) + optional app-of-apps | clusters/noble/bootstrap/argocd/ — values.yaml, root-application.yaml, README.md; optional Application tree in clusters/noble/apps/ |
| kube-prometheus-stack (Helm values) | clusters/noble/bootstrap/kube-prometheus-stack/ — values.yaml, namespace.yaml |
| Grafana Loki datasource (ConfigMap; no chart change) | clusters/noble/bootstrap/grafana-loki-datasource/loki-datasource.yaml |
| Loki (Helm values) | clusters/noble/bootstrap/loki/ — values.yaml, namespace.yaml |
| Fluent Bit → Loki (Helm values) | clusters/noble/bootstrap/fluent-bit/ — values.yaml, namespace.yaml |
| Sealed Secrets (Helm) | clusters/noble/bootstrap/sealed-secrets/ — values.yaml, namespace.yaml, README.md |
| External Secrets Operator (Helm + Vault store example) | clusters/noble/bootstrap/external-secrets/ — values.yaml, namespace.yaml, README.md, examples/vault-cluster-secret-store.yaml |
| Vault (Helm + optional unseal CronJob) | clusters/noble/bootstrap/vault/ — values.yaml, namespace.yaml, unseal-cronjob.yaml, cilium-network-policy.yaml, configure-kubernetes-auth.sh, README.md |
| Kyverno + PSS baseline policies | clusters/noble/bootstrap/kyverno/ — values.yaml, policies-values.yaml, namespace.yaml, README.md |
| Headlamp (Helm + Ingress) | clusters/noble/bootstrap/headlamp/ — values.yaml, namespace.yaml, README.md |
| Velero (Helm + S3 BSL; CSI snapshots) | clusters/noble/bootstrap/velero/ — values.yaml, namespace.yaml, README.md; ansible/roles/noble_velero |
| Renovate (repo config + optional self-hosted Helm) | renovate.json at repo root; optional self-hosted chart under clusters/noble/apps/ (Argo) + token Secret (Sealed Secrets / ESO after Phase E) |
Git vs cluster: manifests and talconfig live in git; `talhelper genconfig -o out`, bootstrap, Helm, and kubectl run on your LAN. See `talos/README.md` for workstation reachability (lab LAN/VPN), `talosctl kubeconfig` vs Kubernetes `server:` (VIP vs node IP), and `--insecure` only in maintenance.
Ordering (do not skip)
- Talos installed; Cilium (or chosen CNI) before most workloads — with `cni: none`, nodes stay NotReady / network-unavailable taint until CNI is up.
- MetalLB Helm chart (CRDs + controller) before `kubectl apply -k` on the pool manifests. `clusters/noble/bootstrap/metallb/namespace.yaml` before or merged onto `metallb-system` so Pod Security does not block speaker (see `bootstrap/metallb/README.md`).
- Longhorn: Talos user volume + extensions in `talconfig.with-longhorn.yaml` (when restored); Helm `defaultDataPath` in `clusters/noble/bootstrap/longhorn/values.yaml`.
- Loki → Fluent Bit → Grafana datasource: deploy Loki (`loki-gateway` Service) before Fluent Bit; apply `clusters/noble/bootstrap/grafana-loki-datasource/loki-datasource.yaml` after Loki (sidecar picks up the ConfigMap — no kube-prometheus values change for Loki).
- Vault: Longhorn default StorageClass before `clusters/noble/bootstrap/vault/` Helm (PVC `data-vault-0`); External Secrets `ClusterSecretStore` after Vault is initialized, unsealed, and Kubernetes auth is configured.
- Headlamp: Traefik + cert-manager (`letsencrypt-prod`) before exposing `headlamp.apps.noble.lab.pcenicni.dev`; treat as cluster-admin UI — protect with network policy / SSO when hardening (Phase G).
- Renovate: Git remote + platform access (hosted app needs org/repo install; self-hosted needs `RENOVATE_TOKEN` and chart `renovate.config`). If the bot runs in-cluster, add the token after Sealed Secrets / Vault (Phase E) — no ingress required for the bot itself.
- Velero: S3-compatible endpoint + bucket + `velero/velero-cloud-credentials` before `ansible/playbooks/noble.yml` with `noble_velero_install: true`; for CSI volume snapshots, label a VolumeSnapshotClass per `clusters/noble/bootstrap/velero/README.md` (e.g. Longhorn).
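For the Velero point above: Velero selects a CSI VolumeSnapshotClass by label; a sketch for Longhorn (the class name is an assumption; confirm the details per `clusters/noble/bootstrap/velero/README.md`):

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: longhorn-snapshot-vsc                    # assumed name
  labels:
    velero.io/csi-volumesnapshot-class: "true"   # Velero uses the labelled class
driver: driver.longhorn.io                       # Longhorn CSI driver
deletionPolicy: Delete
```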
Prerequisites (before phases)
- `talos/talconfig.yaml` checked in (VIP, API SANs, `cni: none`, `iscsi-tools` / `util-linux-tools` in schematic) — run `talhelper validate talconfig talconfig.yaml` after edits
- Workstation on a routable path to node IPs or VIP (same LAN / VPN); `talos/README.md` §3 if `kubectl` hits wrong `server:` or `network is unreachable`
- `talosctl` client matches node Talos version; `talhelper` for `genconfig`
- Node static IPs (helium, neon, argon, krypton)
- DHCP does not lease `192.168.50.210–229`, `230`, or node IPs
- DNS for API and apps as in `talos/README.md`
- Git remote ready for Argo CD (argo-cd)
- `talos/kubeconfig` from `talosctl kubeconfig` — root repo `kubeconfig` is a stub until populated
Phase A — Talos bootstrap + API VIP
- Optional: Ansible runs the same steps — `ansible/playbooks/talos_phase_a.yml` (genconfig → apply → bootstrap → kubeconfig) or `ansible/playbooks/deploy.yml` (Phase A + `noble.yml`); see `ansible/README.md`.
- `talhelper gensecret` → `talhelper genconfig -o out` (re-run `genconfig` after every `talconfig` edit)
- `apply-config` all nodes (`talos/README.md` §2 — no `--insecure` after nodes join; use `TALOSCONFIG`)
- `talosctl bootstrap` once; other control planes and worker join
- `talosctl kubeconfig` → working `kubectl` (`talos/README.md` §3 — override `server:` if VIP not reachable from workstation)
- kube-vip manifests in `clusters/noble/bootstrap/kube-vip`
- kube-vip healthy; `vip_interface` matches uplink (`talosctl get links`); VIP reachable where needed
- `talosctl health` (e.g. `talosctl health -n 192.168.50.20` with `TALOSCONFIG` set)
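The VIP/SAN alignment mentioned in the checklist looks roughly like this in `talos/talconfig.yaml` (a sketch of the relevant talhelper keys only; the checked-in file is authoritative):

```yaml
# Sketch of the talhelper keys tied to the API VIP.
clusterName: noble
endpoint: https://192.168.50.230:6443   # kube-vip API VIP
additionalApiServerCertSans:
  - 192.168.50.230                      # keep in sync with the VIP reservation
```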
Phase B — Core platform
Install order: Cilium → metrics-server → Longhorn (Talos disk + Helm) → MetalLB (Helm → pool manifests) → ingress / certs / DNS as planned.
- Cilium (Helm 1.16.6) — required before MetalLB if `cni: none` (`clusters/noble/bootstrap/cilium/`)
- metrics-server — Helm 3.13.0; values in `clusters/noble/bootstrap/metrics-server/values.yaml`; verify `kubectl top nodes`
- Longhorn — Talos: user volume + kubelet mounts + extensions (`talos/README.md` §5); Helm 1.11.1; `kubectl apply -k clusters/noble/bootstrap/longhorn`; verify `nodes.longhorn.io` and test PVC `Bound`
- MetalLB — chart installed; pool + L2 from `clusters/noble/bootstrap/metallb/` applied (192.168.50.210–229)
- Service `LoadBalancer` / pool check — MetalLB assigns from `210–229` (validated before Traefik; temporary nginx test removed in favor of Traefik)
- Traefik `LoadBalancer` for `*.apps.noble.lab.pcenicni.dev` — `clusters/noble/bootstrap/traefik/`; `192.168.50.211`
- cert-manager + ClusterIssuer (`letsencrypt-staging` / `letsencrypt-prod`) — `clusters/noble/bootstrap/cert-manager/`
- Newt (Pangolin tunnel; replaces ExternalDNS for public DNS) — `clusters/noble/bootstrap/newt/` — `newt-pangolin-auth`; CNAME + Integration API per `newt/README.md`
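The ClusterIssuers above follow the standard cert-manager DNS-01 shape; a sketch of `letsencrypt-prod` (the Secret key name `api-token` is an assumption; the manifest in `clusters/noble/bootstrap/cert-manager/` is authoritative):

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    email: certificates@noble.lab.pcenicni.dev
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: letsencrypt-prod
    solvers:
      - dns01:
          cloudflare:
            apiTokenSecretRef:
              name: cloudflare-dns-api-token
              key: api-token            # assumed key name
```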
Phase C — GitOps
- Argo CD bootstrap — `clusters/noble/bootstrap/argocd/` (`helm upgrade --install argocd …`) — also covered by `ansible/playbooks/noble.yml` (role `noble_argocd`)
- Argo CD server LoadBalancer — `192.168.50.210` (see `values.yaml`)
- App-of-apps — optional; `clusters/noble/apps/kustomization.yaml` is empty (core stack is Ansible-managed from `clusters/noble/bootstrap/`, not Argo). Set `repoURL` in `root-application.yaml` and add `Application` manifests only for optional GitOps workloads — see `clusters/noble/apps/README.md`
- Renovate — `renovate.json` at repo root (Kubernetes manager for `clusters/noble/**/*.yaml` image pins; grouped minor/patch PRs). To activate PRs: install Mend Renovate on the Git repo (Option A), or self-host the chart with a token from Sealed Secrets / ESO (Option B). Helm chart versions pinned only in comments still need manual bumps or extra regex `customManagers` — extend `renovate.json` as needed.
- SSO — later
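The app-of-apps root is a single Argo CD `Application` pointing at the apps tree; a sketch of `root-application.yaml` (the `repoURL` is a placeholder to edit, and the `syncPolicy` shown is illustrative):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: root
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://example.com/your/repo.git   # placeholder; edit before applying
    targetRevision: main
    path: clusters/noble/apps
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:                                    # illustrative; tune to taste
    automated:
      prune: true
      selfHeal: true
```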
Phase D — Observability
- kube-prometheus-stack — `kubectl apply -f clusters/noble/bootstrap/kube-prometheus-stack/namespace.yaml` then `helm upgrade --install` as in `clusters/noble/bootstrap/kube-prometheus-stack/values.yaml` (chart 82.15.1); PVCs `longhorn`; `--wait --timeout 30m` recommended; verify `kubectl -n monitoring get pods,pvc`
- Loki + Fluent Bit + Grafana Loki datasource — order: `kubectl apply -f clusters/noble/bootstrap/loki/namespace.yaml` → `helm upgrade --install loki grafana/loki --version 6.55.0 -f clusters/noble/bootstrap/loki/values.yaml` → `kubectl apply -f clusters/noble/bootstrap/fluent-bit/namespace.yaml` → `helm upgrade --install fluent-bit fluent/fluent-bit --version 0.56.0 -f clusters/noble/bootstrap/fluent-bit/values.yaml` → `kubectl apply -f clusters/noble/bootstrap/grafana-loki-datasource/loki-datasource.yaml`. Verify Explore → Loki in Grafana; `kubectl -n loki get pods,pvc`, `kubectl -n logging get pods`
- Headlamp — Kubernetes web UI; `helm repo add headlamp https://kubernetes-sigs.github.io/headlamp/`; `kubectl apply -f clusters/noble/bootstrap/headlamp/namespace.yaml` → `helm upgrade --install headlamp headlamp/headlamp --version 0.40.1 -n headlamp -f clusters/noble/bootstrap/headlamp/values.yaml`; Ingress `https://headlamp.apps.noble.lab.pcenicni.dev` (`ingressClassName: traefik`, `cert-manager.io/cluster-issuer: letsencrypt-prod`). `values.yaml`: `config.sessionTTL: null` works around chart 0.40.1 / binary mismatch (headlamp#4883). RBAC: chart defaults are permissive — tighten before LAN-wide exposure; align with Phase G hardening.
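For reference, the sidecar-provisioned Loki datasource is just a labelled ConfigMap; a sketch consistent with `loki-gateway.loki.svc:80` (the checked-in `loki-datasource.yaml` is authoritative, including its exact name and namespace, which are assumed here):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: loki-datasource            # assumed name
  namespace: monitoring
  labels:
    grafana_datasource: "1"        # picked up by the Grafana sidecar
data:
  loki-datasource.yaml: |
    apiVersion: 1
    datasources:
      - name: Loki
        type: loki
        access: proxy
        url: http://loki-gateway.loki.svc:80
```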
Phase E — Secrets
- Sealed Secrets (optional Git workflow) — `clusters/noble/bootstrap/sealed-secrets/` (Helm 2.18.4); `kubeseal` + key backup per `README.md`
- Vault in-cluster on Longhorn + auto-unseal — `clusters/noble/bootstrap/vault/` (Helm 0.32.0); Longhorn PVC; OSS “auto-unseal” = optional `unseal-cronjob.yaml` + Secret (README); `configure-kubernetes-auth.sh` for ESO (Kubernetes auth + KV + role)
- External Secrets Operator + Vault `ClusterSecretStore` — operator `clusters/noble/bootstrap/external-secrets/` (Helm 2.2.0); apply `examples/vault-cluster-secret-store.yaml` after Vault (`README.md`)
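The store shape, for reference; a sketch only (the apiVersion varies by operator version, and the KV mount, auth mount, role, and ServiceAccount names are assumptions; `examples/vault-cluster-secret-store.yaml` and `configure-kubernetes-auth.sh` are authoritative):

```yaml
apiVersion: external-secrets.io/v1beta1   # may be v1 on newer operators
kind: ClusterSecretStore
metadata:
  name: vault
spec:
  provider:
    vault:
      server: http://vault.vault.svc:8200   # HTTP to match the Vault listener
      path: secret                          # assumed KV mount
      version: v2
      auth:
        kubernetes:
          mountPath: kubernetes             # assumed auth mount
          role: external-secrets            # assumed role name
          serviceAccountRef:
            name: external-secrets          # assumed ServiceAccount
            namespace: external-secrets
```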
Phase F — Policy + backups
- Kyverno baseline policies — `clusters/noble/bootstrap/kyverno/` (Helm kyverno 3.7.1 + kyverno-policies 3.7.1, baseline / Audit — see `README.md`)
- Velero — manifests + Ansible `noble_velero` (`clusters/noble/bootstrap/velero/`); enable with `noble_velero_install: true` + S3 bucket/URL + `velero/velero-cloud-credentials` (see `velero/README.md`); optional backup/restore drill
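The baseline/Audit posture maps to two kyverno-policies chart values; a sketch of what `policies-values.yaml` likely contains (confirm against the checked-in file):

```yaml
# kyverno-policies chart values, sketch only.
podSecurityStandard: baseline     # install the PSS baseline policy set
validationFailureAction: Audit    # report violations, do not block admission
```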
Phase G — Hardening
- Cilium — Vault `CiliumNetworkPolicy` (`clusters/noble/bootstrap/vault/cilium-network-policy.yaml`) — HTTP 8200 from `external-secrets` + `vault`; extend for other clients as needed
- Runbooks — `talos/runbooks/` (API VIP / kube-vip, etcd–Talos, Longhorn, Vault)
- RBAC — Headlamp `ClusterRoleBinding` uses built-in `edit` (not `cluster-admin`); Argo CD `policy.default: role:readonly` with `g, admin, role:admin` — see `clusters/noble/bootstrap/headlamp/values.yaml`, `clusters/noble/bootstrap/argocd/values.yaml`, `talos/runbooks/rbac.md`
- Alertmanager — add `slack_configs`, `pagerduty_configs`, or other receivers under kube-prometheus-stack `alertmanager.config` (chart defaults use `null` receiver)
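When receivers are added, kube-prometheus-stack takes standard Alertmanager config under `alertmanager.config`; a sketch with a Slack receiver (webhook URL and channel are placeholders, and the route shown replaces the default `null` receiver):

```yaml
# kube-prometheus-stack values fragment, sketch only.
alertmanager:
  config:
    route:
      receiver: slack
    receivers:
      - name: "null"                 # keep the chart default around
      - name: slack
        slack_configs:
          - api_url: https://hooks.slack.com/services/XXX/YYY/ZZZ   # placeholder
            channel: "#alerts"       # placeholder
            send_resolved: true
```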
Quick validation
- `kubectl get nodes` — all Ready
- API via VIP `:6443` — `kubectl get --raw /healthz` → `ok` with kubeconfig `server:` `https://192.168.50.230:6443`
- Ingress `LoadBalancer` in pool `210–229` (Traefik → `192.168.50.211`)
- Argo CD UI — `argocd-server` LoadBalancer `192.168.50.210` (initial `admin` password from `argocd-initial-admin-secret`)
- Renovate — `renovate.json` committed; enable Mend Renovate or self-hosted bot for PRs
- Sample Ingress + cert (cert-manager ready) + Pangolin resource + CNAME
- PVC `Bound` on Longhorn (`storageClassName: longhorn`); Prometheus/Loki durable when configured
- `monitoring` — kube-prometheus-stack core workloads Running (Prometheus, Grafana, Alertmanager, operator, kube-state-metrics, node-exporter); PVCs Bound on `longhorn`
- `loki` — Loki SingleBinary + gateway Running; `loki` PVC Bound on `longhorn` (no chunks-cache by design)
- `logging` — Fluent Bit DaemonSet Running on all nodes (logs → Loki)
- Grafana — Loki datasource from `grafana-loki-datasource` ConfigMap (Explore works after apply + sidecar sync)
- Headlamp — Deployment Running in `headlamp`; UI at `https://headlamp.apps.noble.lab.pcenicni.dev` (TLS via `letsencrypt-prod`)
- `sealed-secrets` — controller Deployment Running in `sealed-secrets` (install + `kubeseal` per `apps/sealed-secrets/README.md`)
- `external-secrets` — controller + webhook + cert-controller Running in `external-secrets`; apply `ClusterSecretStore` after Vault Kubernetes auth
- `vault` — StatefulSet Running, `data-vault-0` PVC Bound on `longhorn`; `vault operator init` + unseal per `apps/vault/README.md`
- `kyverno` — admission / background / cleanup / reports controllers Running in `kyverno`; ClusterPolicies for PSS baseline Ready (Audit)
- `velero` — when enabled: Deployment Running in `velero`; `BackupStorageLocation` / `VolumeSnapshotLocation` Available; test backup per `velero/README.md`
- Phase G (partial) — Vault `CiliumNetworkPolicy`; `talos/runbooks/` (incl. RBAC); Headlamp/Argo CD RBAC tightened — Alertmanager receivers still optional
Keep in sync with talos/README.md and manifests under clusters/noble/.