Talos deployment (4 nodes)
This directory contains a talhelper cluster definition for a 4-node Talos
cluster:
- 3 hybrid control-plane/worker nodes: noble-cp-1..3
- 1 worker-only node: noble-worker-1
- allowSchedulingOnControlPlanes: true
- CNI: none (Cilium is installed via GitOps)
1) Update values for your environment
Edit talconfig.yaml:
- endpoint (Kubernetes API VIP or LB IP)
- ipAddress for each node
- installDisk for each node (for example /dev/sda, /dev/nvme0n1)
- talosVersion / kubernetesVersion if desired
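For orientation, a minimal talconfig.yaml covering those fields might look like this. This is a sketch only: the Kubernetes version is a placeholder and the IPs are taken from the examples later in this README, not from the committed file.

```yaml
clusterName: noble
talosVersion: v1.12.5
kubernetesVersion: v1.31.0        # placeholder: pin the version you actually run
endpoint: https://192.168.50.230:6443
allowSchedulingOnControlPlanes: true
cniConfig:
  name: none                      # Cilium is installed via GitOps instead
nodes:
  - hostname: noble-cp-1
    ipAddress: 192.168.50.20
    controlPlane: true
    installDisk: /dev/sda
  # noble-cp-2, noble-cp-3, and noble-worker-1 follow the same shape
```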
2) Generate cluster secrets and machine configs
From this directory:
talhelper gensecret > talsecret.sops.yaml
talhelper genconfig
Generated machine configs are written to clusterconfig/.
3) Apply Talos configs
Apply each node file to the matching node IP from talconfig.yaml:
talosctl apply-config --insecure -n 192.168.50.20 -f clusterconfig/noble-noble-cp-1.yaml
talosctl apply-config --insecure -n 192.168.50.30 -f clusterconfig/noble-noble-cp-2.yaml
talosctl apply-config --insecure -n 192.168.50.40 -f clusterconfig/noble-noble-cp-3.yaml
talosctl apply-config --insecure -n 192.168.50.10 -f clusterconfig/noble-noble-worker-1.yaml
4) Bootstrap the cluster
After all nodes are up (bootstrap once, from any control-plane node):
talosctl bootstrap -n 192.168.50.20 -e 192.168.50.230
talosctl kubeconfig -n 192.168.50.20 -e 192.168.50.230 .
5) Validate
talosctl -n 192.168.50.20 -e 192.168.50.230 health
kubectl get nodes -o wide
6) GitOps-pinned Cilium values
The Cilium settings that worked for this Talos cluster are now persisted in:
clusters/noble/apps/cilium/application.yaml
That Argo CD Application pins chart 1.16.6 and includes the required Helm
values for this environment (API host/port, cgroup settings, IPAM CIDR, and
security capabilities), so future reconciles do not drift back to defaults.
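The shape of those pinned Helm values is roughly as follows. This is a sketch of the Talos-specific settings Cilium documents for this platform; the pod CIDR is a placeholder and the excerpt is illustrative, not copied from the committed Application.

```yaml
# excerpt of the Helm values inside the Cilium Application (illustrative)
k8sServiceHost: 192.168.50.230       # the kube-vip API VIP
k8sServicePort: 6443
ipam:
  mode: cluster-pool
  operator:
    clusterPoolIPv4PodCIDRList:
      - 10.244.0.0/16                # placeholder CIDR
cgroup:
  autoMount:
    enabled: false                   # Talos manages the cgroup filesystem itself
  hostRoot: /sys/fs/cgroup
securityContext:
  capabilities:
    ciliumAgent:
      [CHOWN, KILL, NET_ADMIN, NET_RAW, IPC_LOCK, SYS_ADMIN, SYS_RESOURCE, DAC_OVERRIDE, FOWNER, SETGID, SETUID]
    cleanCiliumState:
      [NET_ADMIN, SYS_ADMIN, SYS_RESOURCE]
```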
7) Argo CD app-of-apps bootstrap
This repo includes an app-of-apps structure for cluster apps:
- Root app: clusters/noble/root-application.yaml
- Child apps index: clusters/noble/apps/kustomization.yaml
- Argo CD app: clusters/noble/apps/argocd/application.yaml
- Cilium app: clusters/noble/apps/cilium/application.yaml
Bootstrap once from your workstation:
kubectl apply -k clusters/noble/bootstrap/argocd
kubectl apply -f clusters/noble/root-application.yaml
After this, Argo CD continuously reconciles all applications under
clusters/noble/apps/.
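For reference, an app-of-apps root Application typically has this shape. This is a sketch: the repoURL and sync policy here are assumptions, not the committed manifest.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: root
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/noble-cluster   # hypothetical repo URL
    targetRevision: main
    path: clusters/noble/apps       # the child apps index lives here
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
```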
8) kube-vip API VIP (192.168.50.230)
HAProxy has been removed in favor of kube-vip running on control-plane nodes.
Manifests are in:
- clusters/noble/apps/kube-vip/application.yaml
- clusters/noble/apps/kube-vip/vip-rbac.yaml
- clusters/noble/apps/kube-vip/vip-daemonset.yaml
The DaemonSet advertises 192.168.50.230 in ARP mode and fronts the Kubernetes
API on port 6443.
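The relevant part of that DaemonSet is the kube-vip container's environment. A sketch using kube-vip's documented env vars follows; the interface name is an assumption, and the excerpt is illustrative rather than a copy of vip-daemonset.yaml.

```yaml
# excerpt of the kube-vip container spec (illustrative)
env:
  - name: vip_arp
    value: "true"              # ARP mode
  - name: address
    value: "192.168.50.230"    # the advertised API VIP
  - name: port
    value: "6443"
  - name: cp_enable
    value: "true"              # front the Kubernetes API
  - name: svc_enable
    value: "true"              # also handle LoadBalancer Services
  - name: vip_interface
    value: "eth0"              # assumption: set to the node's actual NIC
```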
Apply manually (or let Argo CD sync from root app):
kubectl apply -k clusters/noble/apps/kube-vip
Validate:
kubectl -n kube-system get pods -l app.kubernetes.io/name=kube-vip-ds -o wide
nc -vz 192.168.50.230 6443
9) Argo CD via DNS host (no port)
Argo CD is exposed through a kube-vip managed LoadBalancer Service:
argo.noble.lab.pcenicni.dev
Manifests:
- clusters/noble/bootstrap/argocd/argocd-server-lb.yaml
- clusters/noble/apps/kube-vip/vip-daemonset.yaml (svc_enable: "true")
After syncing manifests, create a Pi-hole DNS A record:
argo.noble.lab.pcenicni.dev -> 192.168.50.231
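A LoadBalancer Service of that shape might look like the following sketch. The Service name and port mapping are assumptions; the address is the one the DNS record points at, which kube-vip advertises.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: argocd-server-lb             # assumed name
  namespace: argocd
spec:
  type: LoadBalancer
  loadBalancerIP: 192.168.50.231     # kube-vip advertises this address
  selector:
    app.kubernetes.io/name: argocd-server
  ports:
    - name: https
      port: 443                      # no port needed in the URL
      targetPort: 8080               # argocd-server's default listen port
```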
10) Longhorn storage and extra disks
Longhorn is deployed from:
clusters/noble/apps/longhorn/application.yaml
Monitoring apps are configured to use storageClassName: longhorn, so you can
persist Prometheus/Alertmanager/Loki data once Longhorn is healthy.
Extra drive layout (this cluster)
Each node uses:
- /dev/sda: Talos install disk (installDisk in talconfig.yaml)
- /dev/sdb: dedicated Longhorn data disk
talconfig.yaml includes a global patch that partitions /dev/sdb and mounts it
at /var/mnt/longhorn, matching the Longhorn defaultDataPath in the Argo CD
Helm values.
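In talhelper's global patches form, such a patch is roughly the following. This is a sketch of the Talos machine.disks schema, not a copy of the committed patch.

```yaml
# global patch in talconfig.yaml (illustrative)
patches:
  - |-
    machine:
      disks:
        - device: /dev/sdb
          partitions:
            - mountpoint: /var/mnt/longhorn
```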
After editing talconfig.yaml, regenerate and apply configs:
cd talos
talhelper genconfig
# apply each node’s YAML from clusterconfig/ with talosctl apply-config
Then reboot each node once so the new disk layout is applied.
talosctl TLS errors (unknown authority, Ed25519 verification failure)
talosctl does not automatically use talos/clusterconfig/talosconfig. If you
omit it, the client falls back to ~/.talos/config, which is usually a
different cluster CA — you then get TLS handshake failures against the noble
nodes.
Always set this in the shell where you run talosctl (use an absolute path
if you change directories):
cd talos
export TALOSCONFIG="$(pwd)/clusterconfig/talosconfig"
export ENDPOINT=192.168.50.230
Sanity check (should print Talos and Kubernetes versions, not TLS errors):
talosctl -e "${ENDPOINT}" -n 192.168.50.20 version
Then use the same shell for apply-config, reboot, and health.
If it still fails after TALOSCONFIG is set, the running cluster was likely
bootstrapped with different secrets than the ones in your current
talsecret.sops.yaml / regenerated clusterconfig/. In that case you need the
original talosconfig that matched the cluster when it was created, or you
must align secrets and cluster state (recovery / rebuild is a larger topic).
Keep talosctl roughly aligned with the node Talos version (for example
v1.12.x clients for v1.12.5 nodes).
Paste tip: run one command per line. Pasting a path ending in ...cp-3.yaml
and the next talosctl command onto the same line corrupts the filename and
can confuse the shell.
More than one extra disk per node
If you add a third disk later, extend machine.disks in talconfig.yaml (for
example /dev/sdc → /var/mnt/longhorn-disk2) and register that path in
Longhorn as an additional disk for that node.
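Registering that path can be done through the Longhorn UI or by editing the Longhorn Node custom resource. A sketch follows; the disk key name is arbitrary and the exact committed form may differ.

```yaml
apiVersion: longhorn.io/v1beta2
kind: Node
metadata:
  name: noble-worker-1               # the node gaining the extra disk
  namespace: longhorn-system
spec:
  disks:
    disk-2:                          # arbitrary key for the new disk
      path: /var/mnt/longhorn-disk2
      allowScheduling: true
```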
Recommended:
- use one dedicated filesystem per Longhorn disk path
- avoid using the Talos system disk for heavy Longhorn data
- spread replicas across nodes for resiliency
11) Upgrade Talos to v1.12.x
This repo now pins:
talosVersion: v1.12.5 in talconfig.yaml
Regenerate configs
From talos/:
talhelper genconfig
Rolling upgrade order
Upgrade one node at a time, waiting for it to return healthy before moving on.
- Control-plane nodes (noble-cp-1, then noble-cp-2, then noble-cp-3)
- Worker node (noble-worker-1)
Example commands (adjust node IP per step):
talosctl --talosconfig ./clusterconfig/talosconfig -n 192.168.50.20 upgrade --image ghcr.io/siderolabs/installer:v1.12.5
talosctl --talosconfig ./clusterconfig/talosconfig -n 192.168.50.20 reboot
talosctl --talosconfig ./clusterconfig/talosconfig -n 192.168.50.20 health
After all nodes are upgraded, verify:
talosctl --talosconfig ./clusterconfig/talosconfig version
kubectl get nodes -o wide