Ansible — noble cluster

Automates talos/CLUSTER-BUILD.md: optional Talos Phase A (genconfig → apply → bootstrap → kubeconfig), then Phase B+ (CNI → add-ons → ingress → Argo CD → Kyverno → observability, etc.). Argo CD does not reconcile core charts — optional GitOps starts from an empty clusters/noble/apps/kustomization.yaml.

Order of operations

  1. From talos/: talhelper gensecret / talsecret as in talos/README.md §1 (if not already done).
  2. Talos Phase A (automated): run playbooks/talos_phase_a.yml or the full pipeline playbooks/deploy.yml. This runs talhelper genconfig -o out, talosctl apply-config on each node, talosctl bootstrap, and talosctl kubeconfig to write talos/kubeconfig.
  3. Platform stack: playbooks/noble.yml (included at the end of deploy.yml).

Your workstation must be able to reach node IPs on the lab LAN (Talos API :50000 for talosctl, Kubernetes :6443 for kubectl / Helm). If kubectl cannot reach the VIP (192.168.50.230), use -e 'noble_k8s_api_server_override=https://<control-plane-ip>:6443' on noble.yml (see group_vars/all.yml).
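For example, the override flag can be assembled from a reachable control-plane IP (the IP below is illustrative; substitute one of your own nodes):

```shell
# Build the noble.yml invocation with an explicit API-server override.
# 192.168.50.20 is an example control-plane IP, not a required value.
cp_ip="192.168.50.20"
flag="noble_k8s_api_server_override=https://${cp_ip}:6443"
echo "ansible-playbook playbooks/noble.yml -e '${flag}'"
```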

One-shot full deploy (after nodes are booted and reachable):

cd ansible
ansible-playbook playbooks/deploy.yml

Deploy secrets (.env)

Copy .env.sample to .env at the repository root (.env is gitignored). At minimum set CLOUDFLARE_DNS_API_TOKEN for cert-manager DNS-01. The cert-manager role applies it automatically during noble.yml. See .env.sample for optional placeholders (e.g. Newt/Pangolin).
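A minimal sketch of seeding .env with just the required key (the token value is a placeholder — replace it with a real Cloudflare API token):

```shell
# Write the single required secret to .env at the repository root.
# 'replace-with-real-token' is a placeholder, not a working token.
env_file=".env"
printf 'CLOUDFLARE_DNS_API_TOKEN=%s\n' 'replace-with-real-token' > "$env_file"
```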

Prerequisites

  • talosctl (matches node Talos version), talhelper, helm, kubectl.
  • SOPS secrets: sops and age on the control host if you use clusters/noble/secrets/ with age-key.txt (see clusters/noble/secrets/README.md).
  • Phase A: same LAN/VPN as nodes so Talos :50000 and Kubernetes :6443 are reachable (see talos/README.md §3).
  • noble.yml: bootstrapped cluster and talos/kubeconfig (or KUBECONFIG).

Playbooks

Playbook Purpose
playbooks/deploy.yml Talos Phase A then noble.yml (full automation).
playbooks/talos_phase_a.yml genconfig → apply-config → bootstrap → kubeconfig only.
playbooks/noble.yml Helm + kubectl platform (after Phase A).
playbooks/post_deploy.yml SOPS reminders and optional Argo root Application note.
playbooks/talos_bootstrap.yml talhelper genconfig only (legacy shortcut; prefer talos_phase_a.yml).
playbooks/debian_harden.yml Baseline hardening for Debian servers (SSH/sysctl/fail2ban/unattended-upgrades).
playbooks/debian_maintenance.yml Debian maintenance run (apt upgrades, autoremove/autoclean, reboot when required).
playbooks/debian_rotate_ssh_keys.yml Rotate managed users' authorized_keys.
playbooks/debian_ops.yml Convenience pipeline: harden then maintenance for Debian servers.
playbooks/proxmox_prepare.yml Configure Proxmox community repos and disable no-subscription UI warning.
playbooks/proxmox_upgrade.yml Proxmox maintenance run (apt dist-upgrade, cleanup, reboot when required).
playbooks/proxmox_cluster.yml Create a Proxmox cluster on the master and join additional hosts.
playbooks/proxmox_ops.yml Convenience pipeline: prepare, upgrade, then cluster Proxmox hosts.

Run the platform playbooks manually (after Phase A):

cd ansible
export KUBECONFIG=/absolute/path/to/home-server/talos/kubeconfig

# noble.yml only — if VIP is unreachable from this host:
# ansible-playbook playbooks/noble.yml -e 'noble_k8s_api_server_override=https://192.168.50.20:6443'

ansible-playbook playbooks/noble.yml
ansible-playbook playbooks/post_deploy.yml

Talos Phase A variables (role talos_phase_a defaults)

Override with -e when needed, e.g. -e noble_talos_skip_bootstrap=true if etcd is already initialized.

Variable Default Meaning
noble_talos_genconfig true Run talhelper genconfig -o out first.
noble_talos_apply_mode auto auto: talosctl apply-config --dry-run on the first node picks maintenance (--insecure) vs joined (TALOSCONFIG); insecure / secure force talos/README §2 option A or B.
noble_talos_skip_bootstrap false Skip talosctl bootstrap. If etcd is already initialized, bootstrap is treated as a no-op (the talosctl “etcd data directory is not empty” case).
noble_talos_apid_wait_delay / noble_talos_apid_wait_timeout 20 / 900 Seconds to wait for apid :50000 on the bootstrap node after apply-config (nodes reboot). Increase if bootstrap hits connection refused to :50000.
noble_talos_nodes neon/argon/krypton/helium IP + out/*.yaml filename — align with talos/talconfig.yaml.
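A typical override invocation, using variable names from the table above (the values are illustrative):

```shell
# Skip bootstrap on an already-initialized cluster and extend the apid wait.
overrides="-e noble_talos_skip_bootstrap=true -e noble_talos_apid_wait_timeout=1200"
echo "ansible-playbook playbooks/talos_phase_a.yml $overrides"
```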

Tags (partial runs)

ansible-playbook playbooks/noble.yml --tags cilium,metallb
ansible-playbook playbooks/noble.yml --skip-tags newt
ansible-playbook playbooks/noble.yml --tags velero -e noble_velero_install=true -e noble_velero_s3_bucket=... -e noble_velero_s3_url=...

Variables — group_vars/all.yml and role defaults

  • group_vars/all.yml: noble_newt_install, noble_velero_install, noble_cert_manager_require_cloudflare_secret, noble_argocd_apply_root_application, noble_argocd_apply_bootstrap_root_application, noble_k8s_api_server_override, noble_k8s_api_server_auto_fallback, noble_k8s_api_server_fallback, noble_skip_k8s_health_check
  • roles/noble_platform/defaults/main.yml: noble_apply_sops_secrets, noble_sops_age_key_file (SOPS secrets under clusters/noble/secrets/)
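For instance, SOPS secret application can be toggled from the command line; the age key path below is an assumption — point it at your real age-key.txt (see clusters/noble/secrets/README.md):

```shell
# Enable SOPS secret application with an explicit age key file.
# The key path is hypothetical; adjust to your environment.
age_key="$HOME/.config/sops/age/keys.txt"
cmd="ansible-playbook playbooks/noble.yml -e noble_apply_sops_secrets=true -e noble_sops_age_key_file=$age_key"
echo "$cmd"
```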

Roles

Role Contents
talos_phase_a Talos genconfig, apply-config, bootstrap, kubeconfig
helm_repos helm repo add / update
noble_* Cilium, CSI Volume Snapshot CRDs + controller, metrics-server, Longhorn, MetalLB (20m Helm wait), kube-vip, Traefik, cert-manager, Newt, Argo CD, Kyverno, platform stack, Velero (optional)
noble_landing_urls Writes ansible/output/noble-lab-ui-urls.md — URLs, service names, and (optional) Argo/Grafana passwords from Secrets
noble_post_deploy Post-install reminders
talos_bootstrap Genconfig-only (used by older playbook)
debian_baseline_hardening Baseline Debian hardening (SSH policy, sysctl profile, fail2ban, unattended upgrades)
debian_maintenance Routine Debian maintenance tasks (updates, cleanup, reboot-on-required)
debian_ssh_key_rotation Declarative authorized_keys rotation for server users
proxmox_baseline Proxmox repo prep (community repos) and no-subscription warning suppression
proxmox_maintenance Proxmox package maintenance (dist-upgrade, cleanup, reboot-on-required)
proxmox_cluster Proxmox cluster bootstrap/join automation using pvecm

Debian server ops quick start

These playbooks are separate from the Talos/noble flow and target hosts in debian_servers.

  1. Copy inventory/debian.example.yml to inventory/debian.yml and update hosts/users.
  2. Update group_vars/debian_servers.yml with your allowed SSH users and real public keys.
  3. Run with the Debian inventory:
cd ansible
ansible-playbook -i inventory/debian.yml playbooks/debian_harden.yml
ansible-playbook -i inventory/debian.yml playbooks/debian_rotate_ssh_keys.yml
ansible-playbook -i inventory/debian.yml playbooks/debian_maintenance.yml

Or run the combined maintenance pipeline:

cd ansible
ansible-playbook -i inventory/debian.yml playbooks/debian_ops.yml

Proxmox host + cluster quick start

These playbooks are separate from the Talos/noble flow and target hosts in proxmox_hosts.

  1. Copy inventory/proxmox.example.yml to inventory/proxmox.yml and update hosts/users.
  2. Update group_vars/proxmox_hosts.yml with your cluster name (proxmox_cluster_name), chosen cluster master, and root public key file paths to install.
  3. First run (no SSH keys yet): use --ask-pass or set ansible_password (prefer Ansible Vault). Keep ansible_ssh_common_args: "-o StrictHostKeyChecking=accept-new" in inventory for first-contact hosts.
  4. Run prepare first to install your public keys on each host, then continue:
cd ansible
ansible-playbook -i inventory/proxmox.yml playbooks/proxmox_prepare.yml --ask-pass
ansible-playbook -i inventory/proxmox.yml playbooks/proxmox_upgrade.yml
ansible-playbook -i inventory/proxmox.yml playbooks/proxmox_cluster.yml

After proxmox_prepare.yml finishes, SSH key auth should work for root (keys from proxmox_root_authorized_key_files), so --ask-pass is usually no longer needed.

If pvecm add still prompts for the master root password during join, set proxmox_cluster_master_root_password (prefer Vault) to run join non-interactively.

Changing proxmox_cluster_name only affects new cluster creation; it does not rename an already-created cluster.

Or run the full Proxmox pipeline:

cd ansible
ansible-playbook -i inventory/proxmox.yml playbooks/proxmox_ops.yml

Migrating from Argo-managed noble-platform

kubectl delete application -n argocd noble-platform noble-kyverno noble-kyverno-policies --ignore-not-found
kubectl apply -f clusters/noble/bootstrap/argocd/root-application.yaml

Then run playbooks/noble.yml so the Helm state matches the Git values.