# Noble platform architecture

This document describes the noble Talos lab cluster: node topology, networking, platform stack, observability, secrets/policy, and storage. Facts align with talos/CLUSTER-BUILD.md, talos/talconfig.yaml, and manifests under clusters/noble/.

## Legend

| Shape / style | Meaning |
| --- | --- |
| Subgraph “Cluster” | Kubernetes cluster boundary (noble) |
| External / DNS / cloud | Services outside the data plane (internet, registrar, Pangolin) |
| Data store | Durable data (etcd, Longhorn, Loki, Vault storage) |
| Secrets / policy | Secret material, Vault, admission policy |
| LB / VIP | Load balancer, MetalLB assignment, or API VIP |

## Physical / node topology

Four Talos nodes on LAN 192.168.50.0/24: three control planes (neon, argon, krypton) and one worker (helium). allowSchedulingOnControlPlanes: true is set in talconfig.yaml, so workloads can also run on the control-plane nodes. The Kubernetes API is fronted by kube-vip on 192.168.50.230 (not a separate hardware load balancer).

```mermaid
flowchart TB
  subgraph LAN["LAN 192.168.50.0/24"]
    subgraph CP["Control planes (kube-vip VIP 192.168.50.230:6443)"]
      neon["neon<br/>192.168.50.20<br/>control-plane + schedulable"]
      argon["argon<br/>192.168.50.30<br/>control-plane + schedulable"]
      krypton["krypton<br/>192.168.50.40<br/>control-plane + schedulable"]
    end
    subgraph W["Worker"]
      helium["helium<br/>192.168.50.10<br/>worker only"]
    end
    VIP["API VIP 192.168.50.230<br/>kube-vip on ens18<br/>→ apiserver :6443"]
  end
  neon --- VIP
  argon --- VIP
  krypton --- VIP
  kubectl["kubectl / talosctl clients<br/>(workstation on LAN/VPN)"] -->|"HTTPS :6443"| VIP
```
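
A minimal talconfig.yaml sketch of this topology is shown below, assuming the talhelper schema. It is abbreviated and illustrative (disk selectors, patches, and the VIP interface stanza are omitted); the real values live in talos/talconfig.yaml.

```yaml
# Illustrative sketch only (talhelper schema assumed); see talos/talconfig.yaml for the real file.
clusterName: noble
endpoint: https://192.168.50.230:6443   # kube-vip API VIP
allowSchedulingOnControlPlanes: true
cniConfig:
  name: none                            # Cilium is installed after bootstrap
nodes:
  - hostname: neon
    ipAddress: 192.168.50.20
    controlPlane: true
  - hostname: argon
    ipAddress: 192.168.50.30
    controlPlane: true
  - hostname: krypton
    ipAddress: 192.168.50.40
    controlPlane: true
  - hostname: helium
    ipAddress: 192.168.50.10
    controlPlane: false
```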

## Network and ingress

North-south (apps on LAN): DNS resolves *.apps.noble.lab.pcenicni.dev to the Traefik LoadBalancer at 192.168.50.211. MetalLB advertises the L2 pool 192.168.50.210–192.168.50.229; Argo CD uses 192.168.50.210. Public access does not use in-cluster ExternalDNS: it goes through Newt (the Pangolin tunnel) plus the CNAME and Integration API steps in clusters/noble/apps/newt/README.md.

```mermaid
flowchart TB
  user["User"]
  subgraph DNS["DNS"]
    pub["Public: CNAME → Pangolin<br/>(per Newt README; not ExternalDNS)"]
    split["LAN / split horizon:<br/>*.apps.noble.lab.pcenicni.dev<br/>→ 192.168.50.211"]
  end
  subgraph LAN["LAN"]
    ML["MetalLB L2<br/>pool 192.168.50.210–229<br/>IPAddressPool noble-l2"]
    T["Traefik Service LoadBalancer<br/>192.168.50.211<br/>IngressClass: traefik"]
    Argo["Argo CD server LoadBalancer<br/>192.168.50.210"]
    Newt["Newt (Pangolin tunnel)<br/>outbound to Pangolin"]
  end
  subgraph Cluster["Cluster workloads"]
    Ing["Ingress resources<br/>cert-manager HTTP-01"]
    App["Apps / Grafana Ingress<br/>e.g. grafana.apps.noble.lab.pcenicni.dev"]
  end
  user --> pub
  user --> split
  split --> T
  pub -.->|"tunnel path"| Newt
  T --> Ing --> App
  ML --- T
  ML --- Argo
  user -->|"optional direct to LB IP"| Argo
```
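
The MetalLB side of this picture reduces to an IPAddressPool plus an L2Advertisement roughly like the sketch below. The pool name noble-l2 and the address range come from the diagram above; the L2Advertisement name and file layout are assumptions. Traefik and Argo CD then pin .211 and .210 from this pool on their Services (for example via spec.loadBalancerIP or MetalLB's loadBalancerIPs annotation).

```yaml
# Hedged sketch of the MetalLB L2 pool; compare with the pool manifests in the repo.
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: noble-l2
  namespace: metallb-system
spec:
  addresses:
    - 192.168.50.210-192.168.50.229   # Argo CD uses .210, Traefik uses .211
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: noble-l2                      # name assumed
  namespace: metallb-system
spec:
  ipAddressPools:
    - noble-l2
```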

## Platform stack (bootstrap → workloads)

Order: Talos → Cilium (the cluster is brought up with cni: none until the CNI is installed) → metrics-server, Longhorn, MetalLB + pool manifests, and kube-vip → Traefik → cert-manager → Argo CD (Helm plus app-of-apps under clusters/noble/bootstrap/argocd/). Platform namespaces include cert-manager, traefik, metallb-system, longhorn-system, monitoring, loki, logging, argocd, vault, external-secrets, sealed-secrets, kyverno, newt, and others as deployed.

```mermaid
flowchart TB
  subgraph L0["OS / bootstrap"]
    Talos["Talos v1.12.6<br/>Image Factory schematic"]
  end
  subgraph L1["CNI"]
    Cilium["Cilium<br/>(cni: none until installed)"]
  end
  subgraph L2["Core add-ons"]
    MS["metrics-server"]
    LH["Longhorn + default StorageClass"]
    MB["MetalLB + pool manifests"]
    KV["kube-vip (API VIP)"]
  end
  subgraph L3["Ingress and TLS"]
    Traefik["Traefik"]
    CM["cert-manager + ClusterIssuers"]
  end
  subgraph L4["GitOps"]
    Argo["Argo CD<br/>app-of-apps under bootstrap/argocd/"]
  end
  subgraph L5["Platform namespaces (examples)"]
    NS["cert-manager, traefik, metallb-system,<br/>longhorn-system, monitoring, loki, logging,<br/>argocd, vault, external-secrets, sealed-secrets,<br/>kyverno, newt, …"]
  end
  Talos --> Cilium --> MS
  Cilium --> LH
  Cilium --> MB
  Cilium --> KV
  MB --> Traefik
  Traefik --> CM
  CM --> Argo
  Argo --> NS
```
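
The app-of-apps entry point looks roughly like the Application below. The repoURL is deliberately a placeholder (confirming it is listed as an open question at the end of this document), and the branch, path, and sync policy are assumptions; see clusters/noble/bootstrap/argocd/root-application.yaml for the real manifest.

```yaml
# Hypothetical root Application sketch; not a copy of root-application.yaml.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: root
  namespace: argocd
spec:
  project: default
  source:
    repoURL: <repo-url>              # confirm against root-application.yaml
    targetRevision: main             # assumed branch
    path: clusters/noble             # assumed path to the app-of-apps tree
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
```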

## Observability path

kube-prometheus-stack runs in monitoring: Prometheus, Grafana, Alertmanager, node-exporter, and related components. Loki (SingleBinary) runs in loki, with Fluent Bit in logging shipping logs to loki-gateway. The Grafana Loki datasource is applied via the ConfigMap clusters/noble/apps/grafana-loki-datasource/loki-datasource.yaml. Prometheus, Grafana, Alertmanager, and Loki use Longhorn PVCs where configured.

```mermaid
flowchart LR
  subgraph Nodes["All nodes"]
    NE["node-exporter DaemonSet"]
    FB["Fluent Bit DaemonSet<br/>namespace: logging"]
  end
  subgraph mon["monitoring"]
    PROM["Prometheus"]
    AM["Alertmanager"]
    GF["Grafana"]
    SC["ServiceMonitors / kube-state-metrics / operator"]
  end
  subgraph lok["loki"]
    LG["loki-gateway Service"]
    LO["Loki SingleBinary"]
  end
  NE --> PROM
  PROM --> GF
  AM --> GF
  FB -->|"to loki-gateway:80"| LG --> LO
  GF -->|"Explore / datasource ConfigMap<br/>grafana-loki-datasource"| LO
  subgraph PVC["Longhorn PVCs"]
    P1["Prometheus / Grafana /<br/>Alertmanager PVCs"]
    P2["Loki PVC"]
  end
  PROM --- P1
  LO --- P2
```
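
The Grafana-to-Loki wiring is a sidecar-loaded datasource ConfigMap along these lines. The ConfigMap name, namespace, and service URL here are assumptions; the authoritative manifest is clusters/noble/apps/grafana-loki-datasource/loki-datasource.yaml.

```yaml
# Hedged sketch of the Loki datasource ConfigMap picked up by the Grafana sidecar.
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-loki-datasource
  namespace: monitoring              # assumed: the kube-prometheus-stack Grafana sidecar watches here
  labels:
    grafana_datasource: "1"          # default label the datasources sidecar matches
data:
  loki-datasource.yaml: |
    apiVersion: 1
    datasources:
      - name: Loki
        type: loki
        access: proxy
        url: http://loki-gateway.loki.svc.cluster.local:80   # assumed in-cluster gateway address
        isDefault: false
```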

## Secrets and policy

Sealed Secrets decrypts SealedSecret objects in-cluster. External Secrets Operator (ESO) syncs from Vault using a ClusterSecretStore (see examples/vault-cluster-secret-store.yaml). Trust flows from the cluster to Vault (ESO calls Vault; Vault does not initiate connections to the cluster). Kyverno with kyverno-policies applies the Pod Security Standards (PSS) baseline profile in Audit mode.

```mermaid
flowchart LR
  subgraph Git["Git repo"]
    SSman["SealedSecret manifests<br/>(optional)"]
  end
  subgraph cluster["Cluster"]
    SSC["Sealed Secrets controller<br/>sealed-secrets"]
    ESO["External Secrets Operator<br/>external-secrets"]
    V["Vault<br/>vault namespace<br/>HTTP listener"]
    K["Kyverno + kyverno-policies<br/>PSS baseline Audit"]
  end
  SSman -->|"encrypted"| SSC -->|"decrypt to Secret"| workloads["Workload Secrets"]
  ESO -->|"ClusterSecretStore →"| V
  ESO -->|"sync ExternalSecret"| workloads
  K -.->|"admission / audit<br/>(PSS baseline)"| workloads
```
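
A ClusterSecretStore in the spirit of examples/vault-cluster-secret-store.yaml is sketched below. The Vault service URL, KV mount path, role, and ServiceAccount names are assumptions, and Kubernetes auth is shown only as one plausible method; defer to the repo example for the real configuration.

```yaml
# Illustrative ClusterSecretStore for ESO → Vault; not a copy of the repo manifest.
apiVersion: external-secrets.io/v1beta1
kind: ClusterSecretStore
metadata:
  name: vault
spec:
  provider:
    vault:
      server: http://vault.vault.svc.cluster.local:8200   # in-cluster HTTP listener per this doc
      path: secret                                        # assumed KV mount
      version: v2
      auth:
        kubernetes:
          mountPath: kubernetes
          role: external-secrets                          # assumed Vault role
          serviceAccountRef:
            name: external-secrets
            namespace: external-secrets
```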

## Data and storage

The default StorageClass is longhorn. Talos mounts a user volume at /var/mnt/longhorn, which is bind-mounted into Longhorn's data paths. Stateful consumers include Vault, the kube-prometheus-stack PVCs, and Loki.

```mermaid
flowchart TB
  subgraph disks["Per-node Longhorn data path"]
    UD["Talos user volume →<br/>/var/mnt/longhorn (bind to Longhorn paths)"]
  end
  subgraph LH["Longhorn"]
    SC["StorageClass: longhorn (default)"]
  end
  subgraph consumers["Stateful / durable consumers"]
    V["Vault PVC data-vault-0"]
    PGL["kube-prometheus-stack PVCs"]
    L["Loki PVC"]
  end
  UD --> SC
  SC --> V
  SC --> PGL
  SC --> L
```
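
On the Talos side, the /var/mnt/longhorn wiring typically amounts to a user volume plus a kubelet bind mount, roughly as sketched below. The volume name, disk selector, size, and mount options are assumptions; defer to talconfig.yaml and its patches for what is actually applied.

```yaml
# Hedged sketch of the per-node Longhorn data path on Talos.
# 1) User volume document (Talos mounts it at /var/mnt/longhorn)
apiVersion: v1alpha1
kind: UserVolumeConfig
name: longhorn
provisioning:
  diskSelector:
    match: disk.transport == "virtio"   # example selector; actual disks are host-specific
  minSize: 100GiB                       # assumed size
---
# 2) Machine config patch exposing the path to the kubelet for Longhorn
machine:
  kubelet:
    extraMounts:
      - destination: /var/lib/longhorn
        type: bind
        source: /var/mnt/longhorn
        options: [bind, rshared, rw]
```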

## Component versions

See talos/CLUSTER-BUILD.md for the authoritative checklist. Summary:

| Component | Chart / app (from CLUSTER-BUILD.md) |
| --- | --- |
| Talos / Kubernetes | v1.12.6 / 1.35.2 (bundled) |
| Cilium | Helm 1.16.6 |
| MetalLB | 0.15.3 |
| Longhorn | 1.11.1 |
| Traefik | 39.0.6 / app v3.6.11 |
| cert-manager | v1.20.0 |
| Argo CD | 9.4.17 / app v3.3.6 |
| kube-prometheus-stack | 82.15.1 |
| Loki / Fluent Bit | 6.55.0 / 0.56.0 |
| Sealed Secrets / ESO / Vault | 2.18.4 / 2.2.0 / 0.32.0 |
| Kyverno | 3.7.1 / policies 3.7.1 |
| Newt | 1.2.0 / app 1.10.1 |

## Narrative

The noble environment is a Talos lab cluster on 192.168.50.0/24 with three control-plane nodes and one worker, scheduling enabled on the control planes, and the Kubernetes API exposed through kube-vip at 192.168.50.230. Cilium provides the CNI after a Talos bootstrap with cni: none; MetalLB advertises 192.168.50.210–192.168.50.229, pinning Argo CD to 192.168.50.210 and Traefik to 192.168.50.211 for *.apps.noble.lab.pcenicni.dev. cert-manager issues certificates for Traefik Ingresses; GitOps is Helm plus Argo CD, with manifests under clusters/noble/ and bootstrap under clusters/noble/bootstrap/argocd/. Observability uses kube-prometheus-stack in monitoring plus Loki and Fluent Bit, with Grafana wired to Loki via a ConfigMap datasource and Longhorn PVCs backing Prometheus, Grafana, Alertmanager, Loki, and Vault. Secrets combine Sealed Secrets for git-encrypted material with Vault and External Secrets for dynamic sync, while Kyverno applies the Pod Security Standards baseline in Audit mode. Public access goes through Newt to Pangolin with the CNAME and Integration API steps as documented, not generic in-cluster public DNS.


## Assumptions and open questions

### Assumptions

- Hypervisor vs bare metal: Not fixed in inventory tables; talconfig.yaml comments mention Proxmox virtio disk paths as examples, so treat the actual host platform as TBD unless confirmed.
- Workstation path: Operators reach the VIP and node IPs from the LAN or VPN per talos/README.md.
- Optional components (Headlamp, Renovate, Velero, Phase G hardening) are described in CLUSTER-BUILD.md; they are not required for the diagrams above until deployed.

### Open questions

- Split horizon: Confirm whether only LAN DNS resolves *.apps.noble.lab.pcenicni.dev to 192.168.50.211 or whether public resolvers also point at that address.
- Velero / S3: TBD until an S3-compatible backend is configured.
- Argo CD: Confirm repoURL in root-application.yaml and what is actually applied on-cluster.

Keep in sync with talos/CLUSTER-BUILD.md and manifests under clusters/noble/.