Refactor noble cluster configurations: remove deprecated Argo CD application-management files and move to a streamlined Ansible-driven installation. Update kustomization.yaml files to reflect the new structure and clarify resource management. Introduce new namespaces and configurations for cert-manager, external-secrets, and logging components, and add detailed README.md documentation for each component to guide setup and management of the noble lab environment.

This commit is contained in:
Nikholas Pcenicni
2026-03-28 17:02:50 -04:00
parent 41841abc84
commit 90fd8fb8a6
59 changed files with 28 additions and 38 deletions


@@ -0,0 +1,34 @@
# Cilium — noble (Talos)
Talos uses **`cluster.network.cni.name: none`**; you must install Cilium (or another CNI) before nodes become **Ready** and before **MetalLB** and most other workloads can schedule. See `talos/CLUSTER-BUILD.md` for ordering.
## 1. Install (phase 1 — required)
Uses **`values.yaml`**: IPAM **kubernetes**, **`k8sServiceHost` / `k8sServicePort`** pointing at **KubePrism** (`127.0.0.1:7445`, Talos default), Talos cgroup paths, **`SYS_MODULE` dropped** from agent capabilities, and **`bpf.masquerade: false`** ([Talos Cilium](https://www.talos.dev/latest/kubernetes-guides/network/deploying-cilium/), [KubePrism](https://www.talos.dev/latest/kubernetes-guides/configuration/kubeprism/)). Without KubePrism, host-network clients dial the API VIP directly (`dial tcp <VIP>:6443`) and fail whenever the VIP path is unhealthy.
From **repository root**:
```bash
helm repo add cilium https://helm.cilium.io/
helm repo update
helm upgrade --install cilium cilium/cilium \
--namespace kube-system \
--version 1.16.6 \
-f clusters/noble/apps/cilium/values.yaml \
--wait
```
Verify:
```bash
kubectl -n kube-system rollout status ds/cilium
kubectl get nodes
```
When nodes are **Ready**, continue with **MetalLB** (`clusters/noble/apps/metallb/README.md`) and other Phase B items. **kube-vip** for the Kubernetes API VIP is separate (L2 ARP); it can run after the API is reachable.
## 2. Optional: kube-proxy replacement (phase 2)
To replace **`kube-proxy`** with Cilium entirely, use **`values-kpr.yaml`** and **`cluster.proxy.disabled: true`** in Talos on every node (see comments inside `values-kpr.yaml`). Follow the upstream [Deploy Cilium CNI](https://www.talos.dev/latest/kubernetes-guides/network/deploying-cilium/) section *without kube-proxy*.
Do **not** skip phase 1 unless you already know your cluster matches the “bootstrap window” flow from the Talos docs.
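
The Talos side of phase 2 (setting `cluster.proxy.disabled: true` on every node) can be applied as a machine-config patch. A sketch, where the node IP and the patch filename are placeholders:

```yaml
# kube-proxy-disable.patch.yaml — hypothetical filename; apply to EVERY node, e.g.:
#   talosctl --nodes <node-ip> patch machineconfig --patch @kube-proxy-disable.patch.yaml
cluster:
  proxy:
    disabled: true          # stop Talos from deploying kube-proxy
  network:
    cni:
      name: none            # keep CNI disabled; Cilium stays Helm-managed
```

Reboot or re-apply per node as the Talos docs describe, then delete any leftover kube-proxy objects before switching to `values-kpr.yaml`.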


@@ -0,0 +1,49 @@
# Optional phase 2: kube-proxy replacement via Cilium + KubePrism (Talos node-local load balancer on :7445 → healthy apiservers on :6443).
# Prerequisites:
# 1. Phase 1 Cilium installed and healthy; nodes Ready.
# 2. Add to Talos machine config on ALL nodes:
# cluster:
# proxy:
# disabled: true
# (keep cluster.network.cni.name: none). Regenerate, apply-config, reboot as needed.
# 3. Remove legacy kube-proxy objects if still present:
# kubectl delete ds -n kube-system kube-proxy --ignore-not-found
# kubectl delete cm -n kube-system kube-proxy --ignore-not-found
# 4. helm upgrade cilium ... -f values-kpr.yaml
#
# Ref: https://www.talos.dev/latest/kubernetes-guides/network/deploying-cilium/
ipam:
mode: kubernetes
kubeProxyReplacement: "true"
k8sServiceHost: localhost
k8sServicePort: "7445"
securityContext:
capabilities:
ciliumAgent:
- CHOWN
- KILL
- NET_ADMIN
- NET_RAW
- IPC_LOCK
- SYS_ADMIN
- SYS_RESOURCE
- DAC_OVERRIDE
- FOWNER
- SETGID
- SETUID
cleanCiliumState:
- NET_ADMIN
- SYS_ADMIN
- SYS_RESOURCE
cgroup:
autoMount:
enabled: false
hostRoot: /sys/fs/cgroup
bpf:
masquerade: false
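
After the phase-2 `helm upgrade`, the switch can be verified from a workstation. A sketch; it assumes the `cilium` CLI is installed locally and that the in-pod debug binary is named `cilium-dbg`, as in recent Cilium releases:

```shell
# kube-proxy DaemonSet should be gone after step 3 above
kubectl -n kube-system get ds kube-proxy --ignore-not-found

# Overall agent/operator health via cilium-cli
cilium status --wait

# The agent should report kube-proxy replacement as enabled
kubectl -n kube-system exec ds/cilium -- cilium-dbg status | grep KubeProxyReplacement
```

If the last command reports replacement disabled, re-check that `kubeProxyReplacement: "true"` actually landed in the release values.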


@@ -0,0 +1,44 @@
# Cilium on Talos — phase 1: bring up CNI while kube-proxy still runs.
# See README.md for install order (before MetalLB scheduling) and optional kube-proxy replacement.
#
# Chart: cilium/cilium — pin version in helm command (e.g. 1.16.6).
# Ref: https://www.talos.dev/latest/kubernetes-guides/network/deploying-cilium/
ipam:
mode: kubernetes
kubeProxyReplacement: "false"
# Host-network components cannot use kubernetes.default ClusterIP; Talos KubePrism (enabled by default)
# on 127.0.0.1:7445 proxies to healthy apiservers and avoids flaky dials to cluster.controlPlane.endpoint (VIP).
# Ref: https://www.talos.dev/latest/kubernetes-guides/configuration/kubeprism/
k8sServiceHost: "127.0.0.1"
k8sServicePort: "7445"
securityContext:
capabilities:
ciliumAgent:
- CHOWN
- KILL
- NET_ADMIN
- NET_RAW
- IPC_LOCK
- SYS_ADMIN
- SYS_RESOURCE
- DAC_OVERRIDE
- FOWNER
- SETGID
- SETUID
cleanCiliumState:
- NET_ADMIN
- SYS_ADMIN
- SYS_RESOURCE
cgroup:
autoMount:
enabled: false
hostRoot: /sys/fs/cgroup
# Workaround: Talos host DNS forwarding + bpf masquerade can break CoreDNS; see Talos Cilium guide "Known issues".
bpf:
masquerade: false
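
Before installing, the rendered manifests can be sanity-checked offline with `helm template`. A sketch; the grep pattern is illustrative, since the chart's rendered key spellings can vary by version:

```shell
helm repo add cilium https://helm.cilium.io/ && helm repo update
helm template cilium cilium/cilium \
  --namespace kube-system \
  --version 1.16.6 \
  -f clusters/noble/apps/cilium/values.yaml \
  | grep -iE 'service[-_]?host|7445|masquerade'
```

The output should show the KubePrism endpoint (`127.0.0.1` / `7445`) and masquerade settings wired through; if not, the values file path or chart version is wrong.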