home-server/ansible/README.md

# Ansible — noble cluster

Automates talos/CLUSTER-BUILD.md: optional Talos Phase A (genconfig → apply → bootstrap → kubeconfig), then Phase B+ (CNI → add-ons → ingress → Argo CD → Kyverno → observability, etc.). Argo CD does not reconcile core charts — optional GitOps starts from an empty clusters/noble/bootstrap/argocd/apps/kustomization.yaml.

## Order of operations

  1. From talos/: talhelper gensecret / talsecret as in talos/README.md §1 (if not already done).
  2. Talos Phase A (automated): run playbooks/talos_phase_a.yml or the full pipeline playbooks/deploy.yml. This runs talhelper genconfig -o out, talosctl apply-config on each node, talosctl bootstrap, and talosctl kubeconfig talos/kubeconfig.
  3. Platform stack: playbooks/noble.yml (included at the end of deploy.yml).
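
Under the hood, Phase A corresponds roughly to the following manual commands (a sketch only — `<node-ip>` and `<node>.yaml` are placeholders; use the addresses and filenames from talos/talconfig.yaml):

```shell
cd talos

# Render per-node machine configs into out/
talhelper genconfig -o out

# Apply each node's config; fresh nodes are in maintenance mode (--insecure)
talosctl apply-config --insecure --nodes <node-ip> --file out/<node>.yaml

# Initialize etcd on exactly one control-plane node, then fetch a kubeconfig
talosctl --talosconfig out/talosconfig bootstrap --nodes <node-ip>
talosctl --talosconfig out/talosconfig kubeconfig talos/kubeconfig --nodes <node-ip>
```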

Your workstation must be able to reach node IPs on the lab LAN (Talos API :50000 for talosctl, Kubernetes :6443 for kubectl / Helm). If kubectl cannot reach the VIP (192.168.50.230), pass -e 'noble_k8s_api_server_override=https://<control-plane-ip>:6443' to noble.yml (see group_vars/all.yml).
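
A quick pre-flight check for both endpoints (a sketch assuming netcat is installed; 192.168.50.20 is the control-plane IP used elsewhere in this README — substitute your own node address):

```shell
nc -z -w 2 192.168.50.20 50000 && echo "Talos apid reachable"
nc -z -w 2 192.168.50.230 6443 && echo "Kubernetes API reachable via VIP"
```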

One-shot full deploy (after nodes are booted and reachable):

```shell
cd ansible
ansible-playbook playbooks/deploy.yml
```

## Deploy secrets (.env)

Copy .env.sample to .env at the repository root (.env is gitignored). At minimum set CLOUDFLARE_DNS_API_TOKEN for cert-manager DNS-01. The cert-manager role applies it automatically during noble.yml. See .env.sample for optional placeholders (e.g. Newt/Pangolin).
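
A minimal sketch of that setup (the token value is a placeholder, not a real credential):

```shell
cp .env.sample .env    # at the repository root; .env is gitignored
# then edit .env and set at least:
#   CLOUDFLARE_DNS_API_TOKEN=<your-cloudflare-api-token>
```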

## Prerequisites

  • talosctl (matches node Talos version), talhelper, helm, kubectl.
  • Phase A: same LAN/VPN as nodes so Talos :50000 and Kubernetes :6443 are reachable (see talos/README.md §3).
  • noble.yml: bootstrapped cluster and talos/kubeconfig (or KUBECONFIG).
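
To confirm the tools are installed, something like the following works (exact flag spellings can vary slightly between tool versions):

```shell
talosctl version --client
talhelper --version
helm version --short
kubectl version --client
```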

## Playbooks

| Playbook | Purpose |
| --- | --- |
| playbooks/deploy.yml | Talos Phase A then noble.yml (full automation). |
| playbooks/talos_phase_a.yml | genconfig → apply-config → bootstrap → kubeconfig only. |
| playbooks/noble.yml | Helm + kubectl platform (after Phase A). |
| playbooks/post_deploy.yml | Vault / ESO reminders (noble_apply_vault_cluster_secret_store). |
| playbooks/talos_bootstrap.yml | talhelper genconfig only (legacy shortcut; prefer talos_phase_a.yml). |
Running the platform playbooks on their own (after Phase A):

```shell
cd ansible
export KUBECONFIG=/absolute/path/to/home-server/talos/kubeconfig

# noble.yml only — if the VIP is unreachable from this host:
# ansible-playbook playbooks/noble.yml -e 'noble_k8s_api_server_override=https://192.168.50.20:6443'

ansible-playbook playbooks/noble.yml
ansible-playbook playbooks/post_deploy.yml
```

## Talos Phase A variables (role talos_phase_a defaults)

Override with -e when needed, e.g. -e noble_talos_skip_bootstrap=true if etcd is already initialized.

| Variable | Default | Meaning |
| --- | --- | --- |
| noble_talos_genconfig | true | Run talhelper genconfig -o out first. |
| noble_talos_apply_mode | auto | auto: talosctl apply-config --dry-run on the first node picks maintenance (--insecure) vs joined (TALOSCONFIG); insecure / secure force talos/README §2 A or B. |
| noble_talos_skip_bootstrap | false | Skip talosctl bootstrap. If etcd is already initialized, bootstrap is treated as a no-op (same as talosctl “etcd data directory is not empty”). |
| noble_talos_apid_wait_delay / noble_talos_apid_wait_timeout | 20 / 900 | Seconds to wait for apid :50000 on the bootstrap node after apply-config (nodes reboot). Increase if bootstrap hits connection refused on :50000. |
| noble_talos_nodes | neon/argon/krypton/helium | Per-node IP and out/*.yaml filename — align with talos/talconfig.yaml. |
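
For example, a hedged sketch of re-running Phase A against nodes that are already configured — skip the (no-op) bootstrap and give nodes extra time to come back after their reboot:

```shell
ansible-playbook playbooks/talos_phase_a.yml \
  -e noble_talos_skip_bootstrap=true \
  -e noble_talos_apid_wait_timeout=1800
```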

## Tags (partial runs)

```shell
ansible-playbook playbooks/noble.yml --tags cilium,metallb
ansible-playbook playbooks/noble.yml --skip-tags newt
```
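
To see which tags noble.yml exposes before attempting a partial run, ansible-playbook can list them without executing anything:

```shell
ansible-playbook playbooks/noble.yml --list-tags
```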

## Variables — group_vars/all.yml

  • noble_newt_install, noble_cert_manager_require_cloudflare_secret, noble_apply_vault_cluster_secret_store, noble_k8s_api_server_override, noble_k8s_api_server_auto_fallback, noble_k8s_api_server_fallback, noble_skip_k8s_health_check.

## Roles

| Role | Contents |
| --- | --- |
| talos_phase_a | Talos genconfig, apply-config, bootstrap, kubeconfig |
| helm_repos | helm repo add / update |
| noble_* | Cilium, metrics-server, Longhorn, MetalLB (20m Helm wait), kube-vip, Traefik, cert-manager, Newt, Argo CD, Kyverno, platform stack |
| noble_post_deploy | Post-install reminders |
| talos_bootstrap | Genconfig-only (used by older playbook) |

## Migrating from Argo-managed noble-platform

```shell
kubectl delete application -n argocd noble-platform noble-kyverno noble-kyverno-policies --ignore-not-found
kubectl apply -f clusters/noble/bootstrap/argocd/root-application.yaml
```

Then run playbooks/noble.yml so Helm state matches git values.
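
To confirm the migration took, list the remaining Argo CD Applications — the deleted noble-platform / noble-kyverno entries should be gone and only the root Application should remain:

```shell
kubectl get applications.argoproj.io -n argocd
```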