# Ansible — noble cluster
Automates `talos/CLUSTER-BUILD.md`: optional Talos Phase A (genconfig → apply → bootstrap → kubeconfig), then Phase B+ (CNI → add-ons → ingress → Argo CD → Kyverno → observability, etc.). Argo CD does not reconcile core charts — optional GitOps starts from an empty `clusters/noble/apps/kustomization.yaml`.
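The empty GitOps entry point can be sketched as a minimal kustomization. The path comes from this README; the exact file contents below are an assumption:

```yaml
# clusters/noble/apps/kustomization.yaml — assumed minimal form.
# Argo CD only reconciles what is listed here, so an empty resources
# list means no GitOps-managed apps until you opt in.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources: []
```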
## Order of operations
- From `talos/`: `talhelper gensecret` / `talsecret` as in `talos/README.md` §1 (if not already done).
- Talos Phase A (automated): run `playbooks/talos_phase_a.yml` or the full pipeline `playbooks/deploy.yml`. This runs `talhelper genconfig -o out`, `talosctl apply-config` on each node, `talosctl bootstrap`, and `talosctl kubeconfig` → `talos/kubeconfig`.
- Platform stack: `playbooks/noble.yml` (included at the end of `deploy.yml`).
Your workstation must be able to reach the node IPs on the lab LAN (Talos API `:50000` for `talosctl`, Kubernetes `:6443` for `kubectl`/Helm). If `kubectl` cannot reach the VIP (`192.168.50.230`), use `-e 'noble_k8s_api_server_override=https://<control-plane-ip>:6443'` on `noble.yml` (see `group_vars/all.yml`).
One-shot full deploy (after nodes are booted and reachable):

```sh
cd ansible
ansible-playbook playbooks/deploy.yml
```
## Deploy secrets (`.env`)
Copy `.env.sample` to `.env` at the repository root (`.env` is gitignored). At minimum, set `CLOUDFLARE_DNS_API_TOKEN` for cert-manager DNS-01. The cert-manager role applies it automatically during `noble.yml`. See `.env.sample` for optional placeholders (e.g. Newt/Pangolin).
## Prerequisites
- `talosctl` (matching the node Talos version), `talhelper`, `helm`, `kubectl`.
- SOPS secrets: `sops` and `age` on the control host if you use `clusters/noble/secrets/` with `age-key.txt` (see `clusters/noble/secrets/README.md`).
- Phase A: same LAN/VPN as the nodes so Talos `:50000` and Kubernetes `:6443` are reachable (see `talos/README.md` §3).
- `noble.yml`: bootstrapped cluster and `talos/kubeconfig` (or `KUBECONFIG`).
## Playbooks
| Playbook | Purpose |
|---|---|
| `playbooks/deploy.yml` | Talos Phase A then `noble.yml` (full automation). |
| `playbooks/talos_phase_a.yml` | genconfig → apply-config → bootstrap → kubeconfig only. |
| `playbooks/noble.yml` | Helm + kubectl platform (after Phase A). |
| `playbooks/post_deploy.yml` | SOPS reminders and optional Argo root Application note. |
| `playbooks/talos_bootstrap.yml` | `talhelper genconfig` only (legacy shortcut; prefer `talos_phase_a.yml`). |
| `playbooks/debian_harden.yml` | Baseline hardening for Debian servers (SSH/sysctl/fail2ban/unattended-upgrades). |
| `playbooks/debian_maintenance.yml` | Debian maintenance run (apt upgrades, autoremove/autoclean, reboot when required). |
| `playbooks/debian_rotate_ssh_keys.yml` | Rotate managed users' `authorized_keys`. |
| `playbooks/debian_ops.yml` | Convenience pipeline: harden then maintenance for Debian servers. |
| `playbooks/proxmox_prepare.yml` | Configure Proxmox community repos and disable the no-subscription UI warning. |
| `playbooks/proxmox_upgrade.yml` | Proxmox maintenance run (apt dist-upgrade, cleanup, reboot when required). |
| `playbooks/proxmox_cluster.yml` | Create a Proxmox cluster on the master and join additional hosts. |
| `playbooks/proxmox_ops.yml` | Convenience pipeline: prepare, upgrade, then cluster Proxmox hosts. |
Run the platform stack and post-deploy steps separately (after Phase A):

```sh
cd ansible
export KUBECONFIG=/absolute/path/to/home-server/talos/kubeconfig
# noble.yml only — if the VIP is unreachable from this host:
# ansible-playbook playbooks/noble.yml -e 'noble_k8s_api_server_override=https://192.168.50.20:6443'
ansible-playbook playbooks/noble.yml
ansible-playbook playbooks/post_deploy.yml
```
## Talos Phase A variables (role `talos_phase_a` defaults)
Override with `-e` when needed, e.g. `-e noble_talos_skip_bootstrap=true` if etcd is already initialized.
| Variable | Default | Meaning |
|---|---|---|
| `noble_talos_genconfig` | `true` | Run `talhelper genconfig -o out` first. |
| `noble_talos_apply_mode` | `auto` | `auto` — `talosctl apply-config --dry-run` on the first node picks maintenance (`--insecure`) vs joined (`TALOSCONFIG`). `insecure` / `secure` force `talos/README` §2 A or B. |
| `noble_talos_skip_bootstrap` | `false` | Skip `talosctl bootstrap`. If etcd is already initialized, bootstrap is treated as a no-op (same as `talosctl` "etcd data directory is not empty"). |
| `noble_talos_apid_wait_delay` / `noble_talos_apid_wait_timeout` | `20` / `900` | Seconds to wait for apid `:50000` on the bootstrap node after apply-config (nodes reboot). Increase if bootstrap hits connection refused to `:50000`. |
| `noble_talos_nodes` | neon/argon/krypton/helium | IP + `out/*.yaml` filename — align with `talos/talconfig.yaml`. |
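As a sketch, several of these defaults can be overridden together from an extra-vars file. The filename and the values below are hypothetical; the variable names come from the table above:

```yaml
# phase-a-overrides.yml — hypothetical extra-vars file.
# Pass with: ansible-playbook playbooks/talos_phase_a.yml -e @phase-a-overrides.yml
noble_talos_skip_bootstrap: true      # etcd already initialized on a previous run
noble_talos_apply_mode: insecure      # force maintenance-mode apply (talos/README §2 A)
noble_talos_apid_wait_timeout: 1800   # slow-booting nodes: wait longer for apid :50000
```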
## Tags (partial runs)
```sh
ansible-playbook playbooks/noble.yml --tags cilium,metallb
ansible-playbook playbooks/noble.yml --skip-tags newt
ansible-playbook playbooks/noble.yml --tags velero -e noble_velero_install=true -e noble_velero_s3_bucket=... -e noble_velero_s3_url=...
```
## Variables — `group_vars/all.yml` and role defaults
- `group_vars/all.yml`: `noble_newt_install`, `noble_velero_install`, `noble_cert_manager_require_cloudflare_secret`, `noble_argocd_apply_root_application`, `noble_argocd_apply_bootstrap_root_application`, `noble_k8s_api_server_override`, `noble_k8s_api_server_auto_fallback`, `noble_k8s_api_server_fallback`, `noble_skip_k8s_health_check`
- `roles/noble_platform/defaults/main.yml`: `noble_apply_sops_secrets`, `noble_sops_age_key_file` (SOPS secrets under `clusters/noble/secrets/`)
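For example, an override enabling a few of these toggles might look like the following. The variable names come from this README; the values are illustrative, not the shipped defaults:

```yaml
# Illustrative overrides for group_vars/all.yml variables (values assumed)
noble_velero_install: true
noble_newt_install: false
noble_k8s_api_server_override: "https://192.168.50.20:6443"  # bypass an unreachable VIP
noble_skip_k8s_health_check: false
```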
## Roles
| Role | Contents |
|---|---|
| `talos_phase_a` | Talos genconfig, apply-config, bootstrap, kubeconfig |
| `helm_repos` | `helm repo add` / `update` |
| `noble_*` | Cilium, CSI Volume Snapshot CRDs + controller, metrics-server, Longhorn, MetalLB (20m Helm wait), kube-vip, Traefik, cert-manager, Newt, Argo CD, Kyverno, platform stack, Velero (optional) |
| `noble_landing_urls` | Writes `ansible/output/noble-lab-ui-urls.md` — URLs, service names, and (optional) Argo/Grafana passwords from Secrets |
| `noble_post_deploy` | Post-install reminders |
| `talos_bootstrap` | Genconfig-only (used by the older playbook) |
| `debian_baseline_hardening` | Baseline Debian hardening (SSH policy, sysctl profile, fail2ban, unattended upgrades) |
| `debian_maintenance` | Routine Debian maintenance tasks (updates, cleanup, reboot-on-required) |
| `debian_ssh_key_rotation` | Declarative `authorized_keys` rotation for server users |
| `proxmox_baseline` | Proxmox repo prep (community repos) and no-subscription warning suppression |
| `proxmox_maintenance` | Proxmox package maintenance (dist-upgrade, cleanup, reboot-on-required) |
| `proxmox_cluster` | Proxmox cluster bootstrap/join automation using `pvecm` |
## Debian server ops quick start
These playbooks are separate from the Talos/noble flow and target hosts in `debian_servers`.
- Copy `inventory/debian.example.yml` to `inventory/debian.yml` and update hosts/users.
- Update `group_vars/debian_servers.yml` with your allowed SSH users and real public keys.
- Run with the Debian inventory:
```sh
cd ansible
ansible-playbook -i inventory/debian.yml playbooks/debian_harden.yml
ansible-playbook -i inventory/debian.yml playbooks/debian_rotate_ssh_keys.yml
ansible-playbook -i inventory/debian.yml playbooks/debian_maintenance.yml
```
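A minimal `inventory/debian.yml` for the commands above might look like this. The hostnames, IPs, and user are placeholders; `inventory/debian.example.yml` remains the authoritative template:

```yaml
# inventory/debian.yml — minimal sketch; all values are placeholders
debian_servers:
  hosts:
    deb1:
      ansible_host: 192.168.50.40
    deb2:
      ansible_host: 192.168.50.41
  vars:
    ansible_user: admin
```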
Or run the combined maintenance pipeline:
```sh
cd ansible
ansible-playbook -i inventory/debian.yml playbooks/debian_ops.yml
```
## Proxmox host + cluster quick start
These playbooks are separate from the Talos/noble flow and target hosts in `proxmox_hosts`.
- Copy `inventory/proxmox.example.yml` to `inventory/proxmox.yml` and update hosts/users.
- Update `group_vars/proxmox_hosts.yml` with your cluster name (`proxmox_cluster_name`), the chosen cluster master, and the root public key file paths to install.
- First run (no SSH keys yet): use `--ask-pass` or set `ansible_password` (prefer Ansible Vault). Keep `ansible_ssh_common_args: "-o StrictHostKeyChecking=accept-new"` in the inventory for first-contact hosts.
- Run prepare first to install your public keys on each host, then continue:
```sh
cd ansible
ansible-playbook -i inventory/proxmox.yml playbooks/proxmox_prepare.yml --ask-pass
ansible-playbook -i inventory/proxmox.yml playbooks/proxmox_upgrade.yml
ansible-playbook -i inventory/proxmox.yml playbooks/proxmox_cluster.yml
```
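A sketch of the `group_vars/proxmox_hosts.yml` values these playbooks read. The variable names come from this README; the values shown are assumptions:

```yaml
# group_vars/proxmox_hosts.yml — illustrative values only
proxmox_cluster_name: homelab            # used only when creating a new cluster
proxmox_root_authorized_key_files:
  - ~/.ssh/id_ed25519.pub                # installed for root by proxmox_prepare.yml
# Only needed if pvecm add still prompts for the master root password (prefer Vault):
# proxmox_cluster_master_root_password: "{{ vault_proxmox_root_pw }}"
```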
After `proxmox_prepare.yml` finishes, SSH key auth should work for root (keys from `proxmox_root_authorized_key_files`), so `--ask-pass` is usually no longer needed.

If `pvecm add` still prompts for the master root password during join, set `proxmox_cluster_master_root_password` (prefer Vault) to run the join non-interactively.

Changing `proxmox_cluster_name` only affects new cluster creation; it does not rename an already-created cluster.
Or run the full Proxmox pipeline:
```sh
cd ansible
ansible-playbook -i inventory/proxmox.yml playbooks/proxmox_ops.yml
```
## Migrating from Argo-managed `noble-platform`
```sh
kubectl delete application -n argocd noble-platform noble-kyverno noble-kyverno-policies --ignore-not-found
kubectl apply -f clusters/noble/bootstrap/argocd/root-application.yaml
```
Then run `playbooks/noble.yml` so the Helm state matches the git values.