# Noble platform architecture

This document describes the **noble** Talos lab cluster: node topology, networking, platform stack, observability, secrets/policy, and storage. Facts align with [`talos/CLUSTER-BUILD.md`](../talos/CLUSTER-BUILD.md), [`talos/talconfig.yaml`](../talos/talconfig.yaml), and manifests under [`clusters/noble/`](../clusters/noble/).

## Legend

| Shape / style | Meaning |
|---------------|---------|
| **Subgraph “Cluster”** | Kubernetes cluster boundary (`noble`) |
| **External / DNS / cloud** | Services outside the data plane (internet, registrar, Pangolin) |
| **Data store** | Durable data (etcd, Longhorn, Loki, Vault storage) |
| **Secrets / policy** | Secret material, Vault, admission policy |
| **LB / VIP** | Load balancer, MetalLB assignment, or API VIP |

---

## Physical / node topology

Four Talos nodes on **LAN `192.168.50.0/24`**: three control planes (**neon**, **argon**, **krypton**) and one worker (**helium**). `allowSchedulingOnControlPlanes: true` is set in `talconfig.yaml`, so workloads may run on the control planes. The Kubernetes API is fronted by **kube-vip** on **`192.168.50.230`** (not a separate hardware load balancer).

```mermaid
flowchart TB
    subgraph LAN["LAN 192.168.50.0/24"]
        subgraph CP["Control planes (kube-vip VIP 192.168.50.230:6443)"]
            neon["neon
192.168.50.20
control-plane + schedulable"]
            argon["argon
192.168.50.30
control-plane + schedulable"]
            krypton["krypton
192.168.50.40
control-plane + schedulable"]
        end
        subgraph W["Worker"]
            helium["helium
192.168.50.10
worker only"]
        end
        VIP["API VIP 192.168.50.230
kube-vip on ens18
→ apiserver :6443"]
    end
    neon --- VIP
    argon --- VIP
    krypton --- VIP
    kubectl["kubectl / talosctl clients
(workstation on LAN/VPN)"] -->|"HTTPS :6443"| VIP
```

---

## Network and ingress

**North–south (apps on LAN):** DNS resolves **`*.apps.noble.lab.pcenicni.dev`** to the **Traefik** **`LoadBalancer`** at **`192.168.50.211`**. **MetalLB** advertises the L2 pool **`192.168.50.210`–`192.168.50.229`**; **Argo CD** uses **`192.168.50.210`**. **Public** access is not handled by in-cluster ExternalDNS: it uses **Newt** (a Pangolin tunnel) plus a **CNAME** and the **Integration API**, per [`clusters/noble/bootstrap/newt/README.md`](../clusters/noble/bootstrap/newt/README.md).

```mermaid
flowchart TB
    user["User"]
    subgraph DNS["DNS"]
        pub["Public: CNAME → Pangolin
(per Newt README; not ExternalDNS)"]
        split["LAN / split horizon:
*.apps.noble.lab.pcenicni.dev
→ 192.168.50.211"]
    end
    subgraph LAN["LAN"]
        ML["MetalLB L2
pool 192.168.50.210–229
IPAddressPool noble-l2"]
        T["Traefik Service LoadBalancer
192.168.50.211
IngressClass: traefik"]
        Argo["Argo CD server LoadBalancer
192.168.50.210"]
        Newt["Newt (Pangolin tunnel)
outbound to Pangolin"]
    end
    subgraph Cluster["Cluster workloads"]
        Ing["Ingress resources
cert-manager HTTP-01"]
        App["Apps / Grafana Ingress
e.g. grafana.apps.noble.lab.pcenicni.dev"]
    end
    user --> pub
    user --> split
    split --> T
    pub -.->|"tunnel path"| Newt
    T --> Ing --> App
    ML --- T
    ML --- Argo
    user -->|"optional direct to LB IP"| Argo
```

---

## Platform stack (bootstrap → workloads)

Order: **Talos** → **Cilium** (the cluster uses `cni: none` until the CNI is installed) → **metrics-server**, **Longhorn**, **MetalLB** + pool manifests, **kube-vip** → **Traefik**, **cert-manager** → **Argo CD** (Helm only; optional empty app-of-apps). **Automated install:** `ansible/playbooks/noble.yml` (see `ansible/README.md`). Platform namespaces include `cert-manager`, `traefik`, `metallb-system`, `longhorn-system`, `monitoring`, `loki`, `logging`, `argocd`, `vault`, `external-secrets`, `sealed-secrets`, `kyverno`, `newt`, and others as deployed.

```mermaid
flowchart TB
    subgraph L0["OS / bootstrap"]
        Talos["Talos v1.12.6
Image Factory schematic"]
    end
    subgraph L1["CNI"]
        Cilium["Cilium
(cni: none until installed)"]
    end
    subgraph L2["Core add-ons"]
        MS["metrics-server"]
        LH["Longhorn + default StorageClass"]
        MB["MetalLB + pool manifests"]
        KV["kube-vip (API VIP)"]
    end
    subgraph L3["Ingress and TLS"]
        Traefik["Traefik"]
        CM["cert-manager + ClusterIssuers"]
    end
    subgraph L4["GitOps"]
        Argo["Argo CD
(optional app-of-apps; platform via Ansible)"]
    end
    subgraph L5["Platform namespaces (examples)"]
        NS["cert-manager, traefik, metallb-system,
longhorn-system, monitoring, loki, logging,
argocd, vault, external-secrets, sealed-secrets,
kyverno, newt, …"]
    end
    Talos --> Cilium --> MS
    Cilium --> LH
    Cilium --> MB
    Cilium --> KV
    MB --> Traefik
    Traefik --> CM
    CM --> Argo
    Argo --> NS
```

---

## Observability path

**kube-prometheus-stack** runs in **`monitoring`**: Prometheus, Grafana, Alertmanager, node-exporter, etc. **Loki** (SingleBinary) runs in **`loki`**, with **Fluent Bit** in **`logging`** shipping logs to **`loki-gateway`**. The Grafana Loki datasource is applied via the **ConfigMap** [`clusters/noble/bootstrap/grafana-loki-datasource/loki-datasource.yaml`](../clusters/noble/bootstrap/grafana-loki-datasource/loki-datasource.yaml). Prometheus, Grafana, Alertmanager, and Loki use **Longhorn** PVCs where configured.

```mermaid
flowchart LR
    subgraph Nodes["All nodes"]
        NE["node-exporter DaemonSet"]
        FB["Fluent Bit DaemonSet
namespace: logging"]
    end
    subgraph mon["monitoring"]
        PROM["Prometheus"]
        AM["Alertmanager"]
        GF["Grafana"]
        SC["ServiceMonitors / kube-state-metrics / operator"]
    end
    subgraph lok["loki"]
        LG["loki-gateway Service"]
        LO["Loki SingleBinary"]
    end
    NE --> PROM
    PROM --> GF
    AM --> GF
    FB -->|"to loki-gateway:80"| LG --> LO
    GF -->|"Explore / datasource ConfigMap
grafana-loki-datasource"| LO
    subgraph PVC["Longhorn PVCs"]
        P1["Prometheus / Grafana /
Alertmanager PVCs"]
        P2["Loki PVC"]
    end
    PROM --- P1
    LO --- P2
```

---

## Secrets and policy

**Sealed Secrets** decrypts `SealedSecret` objects in-cluster. **External Secrets Operator** syncs from **Vault** using a **`ClusterSecretStore`** (see [`examples/vault-cluster-secret-store.yaml`](../clusters/noble/bootstrap/external-secrets/examples/vault-cluster-secret-store.yaml)). Trust runs **cluster → Vault** (ESO calls Vault; Vault does not initiate trust toward the cluster). **Kyverno** with **kyverno-policies** enforces the **PSS baseline** profile in **Audit** mode.

```mermaid
flowchart LR
    subgraph Git["Git repo"]
        SSman["SealedSecret manifests
(optional)"]
    end
    subgraph cluster["Cluster"]
        SSC["Sealed Secrets controller
sealed-secrets"]
        ESO["External Secrets Operator
external-secrets"]
        V["Vault
vault namespace
HTTP listener"]
        K["Kyverno + kyverno-policies
PSS baseline Audit"]
    end
    SSman -->|"encrypted"| SSC -->|"decrypt to Secret"| workloads["Workload Secrets"]
    ESO -->|"ClusterSecretStore →"| V
    ESO -->|"sync ExternalSecret"| workloads
    K -.->|"admission / audit
(PSS baseline)"| workloads
```

---

## Data and storage

**StorageClass:** **`longhorn`** (default). Talos mounts **user volume** data at **`/var/mnt/longhorn`** (bind paths for Longhorn). Stateful consumers include **Vault**, **kube-prometheus-stack** PVCs, and **Loki**.

```mermaid
flowchart TB
    subgraph disks["Per-node Longhorn data path"]
        UD["Talos user volume →
/var/mnt/longhorn (bind to Longhorn paths)"]
    end
    subgraph LH["Longhorn"]
        SC["StorageClass: longhorn (default)"]
    end
    subgraph consumers["Stateful / durable consumers"]
        V["Vault PVC data-vault-0"]
        PGL["kube-prometheus-stack PVCs"]
        L["Loki PVC"]
    end
    UD --> SC
    SC --> V
    SC --> PGL
    SC --> L
```

---

## Component versions

See [`talos/CLUSTER-BUILD.md`](../talos/CLUSTER-BUILD.md) for the authoritative checklist. Summary:

| Component | Chart / app (from CLUSTER-BUILD.md) |
|-----------|-------------------------------------|
| Talos / Kubernetes | v1.12.6 / 1.35.2 bundled |
| Cilium | Helm 1.16.6 |
| MetalLB | 0.15.3 |
| Longhorn | 1.11.1 |
| Traefik | 39.0.6 / app v3.6.11 |
| cert-manager | v1.20.0 |
| Argo CD | 9.4.17 / app v3.3.6 |
| kube-prometheus-stack | 82.15.1 |
| Loki / Fluent Bit | 6.55.0 / 0.56.0 |
| Sealed Secrets / ESO / Vault | 2.18.4 / 2.2.0 / 0.32.0 |
| Kyverno | 3.7.1 / policies 3.7.1 |
| Newt | 1.2.0 / app 1.10.1 |

---

## Narrative

The **noble** environment is a **Talos** lab cluster on **`192.168.50.0/24`** with **three control-plane nodes and one worker**, workloads schedulable on the control planes, and the Kubernetes API exposed through **kube-vip** at **`192.168.50.230`**. **Cilium** provides the CNI after Talos bootstrap with **`cni: none`**; **MetalLB** advertises **`192.168.50.210`–`192.168.50.229`**, pinning **Argo CD** to **`192.168.50.210`** and **Traefik** to **`192.168.50.211`** for **`*.apps.noble.lab.pcenicni.dev`**. **cert-manager** issues certificates for Traefik Ingresses; **GitOps** is **Ansible-driven Helm** for the platform (**`clusters/noble/bootstrap/`**) plus an optional **Argo CD** app-of-apps (**`clusters/noble/apps/`**, **`clusters/noble/bootstrap/argocd/`**). **Observability** uses **kube-prometheus-stack** in **`monitoring`** together with **Loki** and **Fluent Bit**, Grafana wired via a **ConfigMap** datasource, and **Longhorn** PVCs backing Prometheus, Grafana, Alertmanager, Loki, and **Vault**.
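The addressing facts in this narrative can be cross-checked mechanically. A minimal Python sketch, using only values stated in this document (the constants and the `in_pool` helper are illustrative names, not anything defined in the repo):

```python
import ipaddress

# Addressing facts restated from this document (noble lab).
LAN = ipaddress.ip_network("192.168.50.0/24")
POOL_START = ipaddress.ip_address("192.168.50.210")  # MetalLB noble-l2 pool start
POOL_END = ipaddress.ip_address("192.168.50.229")    # MetalLB noble-l2 pool end
ARGOCD_LB = ipaddress.ip_address("192.168.50.210")   # pinned Argo CD LoadBalancer IP
TRAEFIK_LB = ipaddress.ip_address("192.168.50.211")  # pinned Traefik LoadBalancer IP
API_VIP = ipaddress.ip_address("192.168.50.230")     # kube-vip API VIP

def in_pool(ip: ipaddress.IPv4Address) -> bool:
    """Illustrative helper: is `ip` inside the MetalLB L2 pool?"""
    return POOL_START <= ip <= POOL_END

# Both pinned LoadBalancer IPs must come from the MetalLB pool...
assert in_pool(ARGOCD_LB) and in_pool(TRAEFIK_LB)
# ...while the API VIP is handled by kube-vip, deliberately outside the pool.
assert not in_pool(API_VIP)
# Everything sits on the single lab LAN.
assert all(ip in LAN for ip in (ARGOCD_LB, TRAEFIK_LB, API_VIP))
print("address plan consistent")
```

This is purely a documentation-consistency check; it does not query the cluster.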
**Secrets** combine **Sealed Secrets** for git-encrypted material with **Vault** plus **External Secrets** for dynamic sync, while **Kyverno** enforces the **Pod Security Standards baseline** in **Audit** mode. **Public** access uses **Newt** to **Pangolin** with the CNAME and Integration API steps as documented, not generic in-cluster public DNS.

---

## Assumptions and open questions

**Assumptions**

- **Hypervisor vs bare metal:** Not fixed in inventory tables; `talconfig.yaml` comments mention Proxmox virtio disk paths as examples; treat the actual host platform as **TBD** unless confirmed.
- **Workstation path:** Operators reach the VIP and node IPs from the **LAN or VPN** per [`talos/README.md`](../talos/README.md).
- **Optional components** (Headlamp, Renovate, Velero, Phase G hardening) are described in CLUSTER-BUILD.md; they are not required for the diagrams above until deployed.

**Open questions**

- **Split horizon:** Confirm whether only LAN DNS resolves `*.apps.noble.lab.pcenicni.dev` to **`192.168.50.211`**, or whether public resolvers also point at that address.
- **Velero / S3:** **TBD** until an S3-compatible backend is configured.
- **Argo CD:** Confirm the **`repoURL`** in `root-application.yaml` and what is actually applied on-cluster.

---

*Keep in sync with [`talos/CLUSTER-BUILD.md`](../talos/CLUSTER-BUILD.md) and manifests under [`clusters/noble/`](../clusters/noble/).*