Refactor the noble cluster configuration to move from the deprecated apps structure to a streamlined bootstrap layout. Update paths in the affected YAML files and README documentation to reflect the new organization under clusters/noble/bootstrap, so every component is referenced and documented consistently across the deployment process.
- **kube-vip** DaemonSet **3/3** on control planes; VIP **`192.168.50.230`** on **`ens18`** (`vip_subnet` **`/32`** required — bare **`32`** breaks parsing). **Verified from workstation:** `kubectl config set-cluster noble --server=https://192.168.50.230:6443` then **`kubectl get --raw /healthz`** → **`ok`** (`talos/kubeconfig`; see `talos/README.md`).
- **metrics-server** Helm **3.13.0** / app **v0.8.0** — `clusters/noble/bootstrap/metrics-server/values.yaml` (`--kubelet-insecure-tls` for Talos); **`kubectl top nodes`** works.
- **Longhorn** Helm **1.11.1** / app **v1.11.1** — `clusters/noble/bootstrap/longhorn/` (PSA **privileged** namespace, `defaultDataPath` `/var/mnt/longhorn`, `preUpgradeChecker` enabled); **StorageClass** `longhorn` (default); **`nodes.longhorn.io`** all **Ready**; test **PVC** `Bound` on `longhorn`.
- **Traefik** Helm **39.0.6** / app **v3.6.11** — `clusters/noble/bootstrap/traefik/`; **`Service`** **`LoadBalancer`** with **`EXTERNAL-IP`** **`192.168.50.211`**; **`IngressClass`** **`traefik`** (default). Point **`*.apps.noble.lab.pcenicni.dev`** at **`192.168.50.211`**. MetalLB pool verification was done before replacing the temporary nginx test with Traefik.
- **cert-manager** Helm **v1.20.0** / app **v1.20.0** — `clusters/noble/bootstrap/cert-manager/`; **`ClusterIssuer`** **`letsencrypt-staging`** and **`letsencrypt-prod`** (**DNS-01** via **Cloudflare** for **`pcenicni.dev`**, Secret **`cloudflare-dns-api-token`** in **`cert-manager`**); ACME email **`certificates@noble.lab.pcenicni.dev`** (edit in manifests if you want a different mailbox).
- **Newt** Helm **1.2.0** / app **1.10.1** — `clusters/noble/bootstrap/newt/` (**fossorial/newt**); Pangolin site tunnel — **`newt-pangolin-auth`** Secret (**`PANGOLIN_ENDPOINT`**, **`NEWT_ID`**, **`NEWT_SECRET`**). Prefer a **SealedSecret** in git (`kubeseal` — see `clusters/noble/bootstrap/sealed-secrets/examples/`) after rotating credentials if they were exposed. **Public DNS** is **not** automated with ExternalDNS: **CNAME** records at your DNS host per Pangolin’s domain instructions, plus **Integration API** for HTTP resources/targets — see **`clusters/noble/bootstrap/newt/README.md`**. LAN access to Traefik can still use **`*.apps.noble.lab.pcenicni.dev`** → **`192.168.50.211`** (split horizon / local resolver).
- **Sealed Secrets** Helm **2.18.4** / app **0.36.1** — `clusters/noble/bootstrap/sealed-secrets/` (namespace **`sealed-secrets`**); **`kubeseal`** on client should match controller minor (**README**); back up **`sealed-secrets-key`** (see README).
- **External Secrets Operator** Helm **2.2.0** / app **v2.2.0** — `clusters/noble/bootstrap/external-secrets/`; Vault **`ClusterSecretStore`** in **`examples/vault-cluster-secret-store.yaml`** (**`http://`** to match Vault listener — apply after Vault **Kubernetes auth**).
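All of these charts follow the same pattern: add the upstream repo, then `helm upgrade --install` against the values file under `clusters/noble/bootstrap/<component>/`. A minimal sketch for metrics-server (the kubernetes-sigs repo URL, release name, and `kube-system` namespace are assumptions here; the chart version and values path come from the list above):

```bash
# Sketch only: repo URL, release name, and namespace are assumptions, not pinned by this repo.
helm repo add metrics-server https://kubernetes-sigs.github.io/metrics-server/
helm repo update
helm upgrade --install metrics-server metrics-server/metrics-server \
  --version 3.13.0 \
  --namespace kube-system \
  -f clusters/noble/bootstrap/metrics-server/values.yaml   # adds --kubelet-insecure-tls for Talos kubelets

kubectl top nodes   # should return node CPU/memory once the APIService is Available
```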
| Public DNS (Pangolin) | **Newt** tunnel + **CNAME** at registrar + **Integration API** — `clusters/noble/bootstrap/newt/` |
| Velero | S3-compatible URL — configure later |
## Versions
- Talos: **v1.12.6** — align `talosctl` client with node image
- Talos **Image Factory** (iscsi-tools + util-linux-tools): **`factory.talos.dev/nocloud-installer/249d9135de54962744e917cfe654117000cba369f9152fbab9d055a00aa3664f:v1.12.6`** — same schematic must appear in **`machine.install.image`** after `talhelper genconfig` (bare metal may use `metal-installer/` instead of `nocloud-installer/`)
- Kubernetes: **1.35.2** on current nodes (bundled with Talos; not pinned in repo)
- Cilium: **1.16.6** (Helm chart; see `clusters/noble/bootstrap/cilium/README.md`)
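When bumping the Talos version or schematic, a quick sanity check that the Image Factory installer actually landed in `machine.install.image` after regeneration (the `grep` pattern is only illustrative):

```bash
# talconfig's talosImageURL takes the base URL only, no `:tag`; talhelper validates it and appends talosVersion.
talhelper genconfig -o out
grep -R "factory.talos.dev" out/   # each node's machine.install.image should carry the expected schematic ID
```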
| Renovate (repo config + optional self-hosted Helm) | **`renovate.json`** at repo root; optional self-hosted chart under **`clusters/noble/apps/`** (Argo) + token Secret (**Sealed Secrets** / **ESO** after **Phase E**) |
**Git vs cluster:** manifests and `talconfig` live in git; **`talhelper genconfig -o out`**, bootstrap, Helm, and `kubectl` run on your LAN. See **`talos/README.md`** for workstation reachability (lab LAN/VPN), **`talosctl kubeconfig`** vs Kubernetes `server:` (VIP vs node IP), and **`--insecure`** only in maintenance.
1. **Talos** installed; **Cilium** (or chosen CNI) **before** most workloads — with `cni: none`, nodes stay **NotReady** with the **network-unavailable** taint until the CNI is up.
2. **MetalLB Helm chart** (CRDs + controller) **before** `kubectl apply -k` on the pool manifests (see the sketch after this list).
3. **`clusters/noble/bootstrap/metallb/namespace.yaml`** before or merged onto `metallb-system` so Pod Security does not block the speaker (see `bootstrap/metallb/README.md`).
4. **Longhorn:** Talos user volume + extensions in `talconfig.with-longhorn.yaml` (when restored); Helm **`defaultDataPath`** in `clusters/noble/bootstrap/longhorn/values.yaml`.
5. **Loki → Fluent Bit → Grafana datasource:** deploy **Loki** (`loki-gateway` Service) before **Fluent Bit**; apply **`clusters/noble/bootstrap/grafana-loki-datasource/loki-datasource.yaml`** after **Loki** (sidecar picks up the ConfigMap — no kube-prometheus values change for Loki).
6. **Vault:** **Longhorn** default **StorageClass** before **`clusters/noble/bootstrap/vault/`** Helm (PVC **`data-vault-0`**); **External Secrets** **`ClusterSecretStore`** after Vault is initialized, unsealed, and **Kubernetes auth** is configured.
7. **Headlamp:** **Traefik** + **cert-manager** (**`letsencrypt-prod`**) before exposing **`headlamp.apps.noble.lab.pcenicni.dev`**; treat as a **cluster-admin** UI — protect with network policy / SSO when hardening (**Phase G**).
8. **Renovate:** **Git remote** + platform access (**hosted app** needs org/repo install; **self-hosted** needs **`RENOVATE_TOKEN`** and chart **`renovate.config`**). If the bot runs **in-cluster**, add the token **after** **Sealed Secrets** / **Vault** (**Phase E**) — no ingress required for the bot itself.
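A rough sequence for items 2–3 (the metallb.github.io repo and the release/namespace names are assumptions; the manifest paths come from this list):

```bash
# Sketch only: repo URL and release/namespace names are assumptions.
kubectl apply -f clusters/noble/bootstrap/metallb/namespace.yaml   # PSA labels in place before the speaker starts
helm repo add metallb https://metallb.github.io/metallb
helm upgrade --install metallb metallb/metallb --namespace metallb-system
kubectl apply -k clusters/noble/bootstrap/metallb                  # pool manifests only after the CRDs exist
```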
- [x] `apply-config` all nodes (`talos/README.md` §2 — **no** `--insecure` after nodes join; use `TALOSCONFIG`)
- [x] `talosctl bootstrap` once; other control planes and the worker join
- [x] `talosctl kubeconfig` → working `kubectl` (`talos/README.md` §3 — override `server:` if VIP not reachable from workstation)
- [x] **kube-vip manifests** in `clusters/noble/bootstrap/kube-vip`
- [x] kube-vip healthy; `vip_interface` matches uplink (`talosctl get links`); VIP reachable where needed
- [x] `talosctl health` (e.g. `talosctl health -n 192.168.50.20` with `TALOSCONFIG` set; see the sketch below)
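A condensed sketch of the happy path above (node IP and file paths are examples, not canonical values from this repo):

```bash
# Sketch only: the talosconfig/kubeconfig paths and node IP are assumptions.
export TALOSCONFIG=./out/talosconfig      # from `talhelper genconfig -o out`
talosctl bootstrap -n 192.168.50.20       # run exactly once, against a single control plane
talosctl kubeconfig -n 192.168.50.20 .    # writes ./kubeconfig; override server: if the VIP is not reachable
talosctl health -n 192.168.50.20
```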
**Install order:** **Cilium** → **metrics-server** → **Longhorn** (Talos disk + Helm) → **MetalLB** (Helm → pool manifests) → ingress / certs / DNS as planned. A verification sketch follows this checklist.
- [x] **Cilium** (Helm **1.16.6**) — **required** before MetalLB if `cni: none` (`clusters/noble/bootstrap/cilium/`)
- [x] **metrics-server** — Helm **3.13.0**; values in `clusters/noble/bootstrap/metrics-server/values.yaml`; verify `kubectl top nodes`
- [x] **Longhorn** — Talos: user volume + kubelet mounts + extensions (`talos/README.md` §5); Helm **1.11.1**; `kubectl apply -k clusters/noble/bootstrap/longhorn`; verify **`nodes.longhorn.io`** and test PVC **`Bound`**
- [x] **`Service`** **`LoadBalancer`** / pool check — MetalLB assigns from the `.210`–`.229` pool (validated before Traefik; temporary nginx test removed in favor of Traefik)
- [x] **Traefik** `LoadBalancer` for `*.apps.noble.lab.pcenicni.dev` — `clusters/noble/bootstrap/traefik/`; **`192.168.50.211`**
- [x] **Newt** (Pangolin tunnel; replaces ExternalDNS for public DNS) — `clusters/noble/bootstrap/newt/` — **`newt-pangolin-auth`**; CNAME + **Integration API** per **`newt/README.md`**
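Spot checks once the Phase B charts are in (the `traefik` namespace is an assumption; `longhorn-system` matches the Longhorn notes in this README):

```bash
kubectl top nodes                                   # metrics-server serving
kubectl -n longhorn-system get nodes.longhorn.io    # all nodes Ready
kubectl get storageclass                            # longhorn marked (default)
kubectl -n traefik get svc                          # EXTERNAL-IP 192.168.50.211 from the MetalLB pool
```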
## Phase C — GitOps
- [x] **Argo CD** bootstrap — `clusters/noble/bootstrap/argocd/` (`helm upgrade --install argocd …`; expanded in the sketch after this list) — also covered by **`ansible/playbooks/noble.yml`** (role **`noble_argocd`**)
- [x] Argo CD server **LoadBalancer** — **`192.168.50.210`** (see `values.yaml`)
- [x] **App-of-apps** — optional; **`clusters/noble/apps/kustomization.yaml`** is **empty** (core stack is **Ansible**-managed from **`clusters/noble/bootstrap/`**, not Argo). Set **`repoURL`** in **`root-application.yaml`** and add **`Application`** manifests only for optional GitOps workloads — see **`clusters/noble/apps/README.md`**
- [x] **Renovate** — **`renovate.json`** at repo root ([Renovate](https://docs.renovatebot.com/) — **Kubernetes** manager for **`clusters/noble/**/*.yaml`** image pins; grouped minor/patch PRs). **Activate PRs:** install **[Mend Renovate](https://github.com/apps/renovate)** on the Git repo (**Option A**), or **Option B:** self-hosted chart per [Helm charts](https://docs.renovatebot.com/helm-charts/) + token from **Sealed Secrets** / **ESO**. Helm **chart** versions pinned only in comments still need manual bumps or extra **regex** `customManagers` — extend **`renovate.json`** as needed.
- [ ] SSO — later
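The elided Argo CD install expands to roughly the following; the argo-helm repo URL and chart name are the upstream defaults, while the namespace and any chart version pin live in `clusters/noble/bootstrap/argocd/` and are assumptions here:

```bash
# Sketch only: namespace and chart version are assumptions; the values path comes from the checklist above.
helm repo add argo https://argoproj.github.io/argo-helm
helm upgrade --install argocd argo/argo-cd \
  --namespace argocd --create-namespace \
  -f clusters/noble/bootstrap/argocd/values.yaml
kubectl -n argocd get svc    # the server Service should show LoadBalancer 192.168.50.210
```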
## Phase D — Observability
- [x] **kube-prometheus-stack** — `kubectl apply -f clusters/noble/apps/kube-prometheus-stack/namespace.yaml` then **`helm upgrade --install`** as in `clusters/noble/apps/kube-prometheus-stack/values.yaml` (chart **82.15.1**); PVCs **`longhorn`**; **`--wait --timeout 30m`** recommended; verify **`kubectl -n monitoring get pods,pvc`** (see the sketch after this list)
- [ ] **Velero** when S3 is ready; backup/restore drill
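A sketch of the kube-prometheus-stack sequence (the prometheus-community repo URL and release name are assumptions; chart version, namespace, and flags are taken from the item above):

```bash
# Sketch only: repo URL and release name are assumptions.
kubectl apply -f clusters/noble/apps/kube-prometheus-stack/namespace.yaml
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm upgrade --install kube-prometheus-stack prometheus-community/kube-prometheus-stack \
  --version 82.15.1 --namespace monitoring \
  -f clusters/noble/apps/kube-prometheus-stack/values.yaml \
  --wait --timeout 30m
kubectl -n monitoring get pods,pvc   # PVCs should be Bound on the longhorn StorageClass
```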
## Phase G — Hardening
- [x] **Cilium** — Vault **`CiliumNetworkPolicy`** (`clusters/noble/bootstrap/vault/cilium-network-policy.yaml`) — HTTP **8200** from **`external-secrets`** + **`vault`**; extend for other clients as needed
- [ ] **Alertmanager** — add **`slack_configs`**, **`pagerduty_configs`**, or other receivers under **`kube-prometheus-stack`** `alertmanager.config` (chart defaults use the **`null`** receiver); a hedged values sketch follows
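A sketch of what a Slack receiver could look like as a separate values overlay (the keys mirror upstream Alertmanager configuration; the file name, channel, and webhook URL are placeholders):

```bash
# Placeholder overlay: never commit a real webhook URL in plain text (use Sealed Secrets / ESO per Phase E).
cat > alertmanager-receivers.yaml <<'EOF'
alertmanager:
  config:
    route:
      receiver: slack
    receivers:
      - name: slack
        slack_configs:
          - api_url: https://hooks.slack.com/services/REPLACE_ME
            channel: "#alerts"
EOF
# Pass it with an extra -f on the next kube-prometheus-stack `helm upgrade`.
```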
To point the kubeconfig at a node IP instead of the VIP: `sed -i '' 's|https://192.168.50.230:6443|https://192.168.50.20:6443|g' kubeconfig`
Quick check from your Mac: `nc -vz 192.168.50.20 50000` (Talos) and `nc -vz 192.168.50.20 6443` (Kubernetes).
**`dial tcp 192.168.50.230:6443` on nodes:** Host-network components (including **Cilium**) cannot use the in-cluster `kubernetes` Service; they otherwise follow **`cluster.controlPlane.endpoint`** (the VIP). Talos **KubePrism** on **`127.0.0.1:7445`** (default) load-balances to healthy apiservers. Ensure the CNI Helm values set **`k8sServiceHost: "127.0.0.1"`** and **`k8sServicePort: "7445"`** — see [`clusters/noble/bootstrap/cilium/values.yaml`](../clusters/noble/bootstrap/cilium/values.yaml). Also confirm **kube-vip**’s **`vip_interface`** matches the uplink (`talosctl -n <ip> get links` — e.g. **`ens18`** on these nodes). A bare **`curl -k https://192.168.50.230:6443/healthz`** often returns **`401 Unauthorized`** because no client cert was sent — that still means TLS to the VIP worked.
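Two quick checks that this wiring is in place (the values keys are the Cilium chart options quoted above; the node IP is a placeholder):

```bash
grep -E "k8sServiceHost|k8sServicePort" clusters/noble/bootstrap/cilium/values.yaml
# expect: k8sServiceHost: "127.0.0.1" and k8sServicePort: "7445"
talosctl -n <node-ip> get links   # confirm kube-vip's vip_interface (e.g. ens18) matches the uplink
```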
**Verify the VIP with `kubectl` (copy as-is):** use a real kubeconfig path (not ` /path/to/…`). From the **repository root**:
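A sketch of that check from the repository root, assuming the kubeconfig was written to `talos/kubeconfig` as noted earlier:

```bash
export KUBECONFIG=talos/kubeconfig   # adjust if your kubeconfig lives elsewhere
kubectl config set-cluster noble --server=https://192.168.50.230:6443   # point the cluster entry at the VIP
kubectl get --raw /healthz
```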
Expect a single line: **`ok`**. If you see **`The connection to the server localhost:8080 was refused`**, `kubectl` did not load the kubeconfig; use a real path as noted above.
| MetalLB pool | After MetalLB controller install: `kubectl apply -k ../clusters/noble/bootstrap/metallb` |
| Longhorn PSA + Helm | `kubectl apply -k ../clusters/noble/bootstrap/longhorn` then Helm from §5 below |
Set `vip_interface` in `clusters/noble/bootstrap/kube-vip/vip-daemonset.yaml` if it does not match the control-plane uplink (`talosctl -n <cp-ip> get links`).
## 5. Longhorn (Talos)
1. **Machine image:** `talconfig.yaml` includes the `iscsi-tools` and `util-linux-tools` extensions. After `talhelper genconfig`, **upgrade each node** so the running installer image matches (extensions are in the image, not applied live by config alone). If `longhorn-manager` logs **`iscsiadm`** / **`open-iscsi`** errors, the node image does not include the extension yet.
2. **Pod Security + path:** Run `kubectl apply -k ../clusters/noble/bootstrap/longhorn` (privileged `longhorn-system`). The Helm chart host-mounts **`/var/lib/longhorn`**; `talconfig` adds a kubelet **bind** from `/var/mnt/longhorn` → `/var/lib/longhorn` so that path matches the dedicated XFS volume.
3. **Data path:** From the **repository root** (not `talos/`), run Helm with a real release and chart name — not literal `...`:
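For example (a sketch; the charts.longhorn.io repo is the upstream default and the release name is an assumption, while version, namespace, and values path come from this README):

```bash
# Sketch only: release name and repo URL are assumptions.
helm repo add longhorn https://charts.longhorn.io
helm upgrade --install longhorn longhorn/longhorn \
  --version 1.11.1 \
  --namespace longhorn-system \
  -f clusters/noble/bootstrap/longhorn/values.yaml   # defaultDataPath /var/mnt/longhorn
```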
1. VIP and interface align with [`talos/talconfig.yaml`](../talconfig.yaml) (`cluster.network`, `additionalApiServerCertSans`) and [`clusters/noble/bootstrap/kube-vip/`](../../clusters/noble/bootstrap/kube-vip/).
2. `kubectl -n kube-system get pods -l app.kubernetes.io/name=kube-vip -o wide` — DaemonSet should be **Running** on control-plane nodes.
3. From a workstation: `ping 192.168.50.230` (if ICMP allowed) and `curl -k https://192.168.50.230:6443/healthz` or `kubectl get --raw /healthz` with kubeconfig `server:` set to the VIP.
4. `talosctl health` with `TALOSCONFIG` (see [`talos/README.md`](../README.md) §3).
**Headlamp** (`clusters/noble/bootstrap/headlamp/values.yaml`): the chart’s **ClusterRoleBinding** uses the built-in **`edit`** ClusterRole — not **`cluster-admin`**. Break-glass changes use **`kubectl`** with an admin kubeconfig.
**Argo CD** (`clusters/noble/bootstrap/argocd/values.yaml`): **`policy.default: role:readonly`** — new OIDC/Git users get read-only unless you add **`g, <user-or-group>, role:admin`** (or another role) in **`configs.rbac.policy.csv`**. Local user **`admin`** stays **`role:admin`** via **`g, admin, role:admin`**.
3. Kubernetes auth for ESO: [`clusters/noble/bootstrap/vault/configure-kubernetes-auth.sh`](../../clusters/noble/bootstrap/vault/configure-kubernetes-auth.sh) and `kubectl describe clustersecretstore vault`.
4. **Cilium** policy: if Vault is unreachable from `external-secrets`, check [`clusters/noble/bootstrap/vault/cilium-network-policy.yaml`](../../clusters/noble/bootstrap/vault/cilium-network-policy.yaml) and extend `ingress` for new client namespaces.
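End-to-end check after the auth script has run (resource names as used above; the `externalsecrets` listing is only a convenience):

```bash
bash clusters/noble/bootstrap/vault/configure-kubernetes-auth.sh
kubectl describe clustersecretstore vault   # look for a Ready condition
kubectl get externalsecrets -A              # consumers should report Ready once the store is valid
```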