Ansible getting started — Proxmox → Talos → cluster → Argo CD

This guide walks through the intended order for this repository: prepare Proxmox VE hosts and optionally form a Proxmox cluster, bring up Talos nodes and the Kubernetes control plane, install the platform stack with Ansible, then hand ongoing bootstrap configuration to Argo CD when you are ready.

Shorter reference tables and variable lists live in ansible/README.md. Deep operational detail for Talos and the noble lab checklist are in talos/README.md and talos/CLUSTER-BUILD.md. Argo-specific sequencing is in clusters/noble/bootstrap/argocd/README.md.


What runs where

| Layer | What automates it | Where it runs |
| --- | --- | --- |
| Proxmox hosts (repos, keys, upgrades, pvecm) | proxmox_*.yml playbooks | SSH to proxmox_hosts in ansible/inventory/proxmox.yml |
| Talos machine config, bootstrap, admin kubeconfig | playbooks/talos_phase_a.yml (or deploy.yml first half) | Localhost; needs LAN/VPN to node IPs (:50000, later :6443) |
| CNI, storage, ingress, cert-manager, Argo CD install, observability, policy, … | playbooks/noble.yml (or deploy.yml second half) | Localhost; uses kubectl / Helm against KUBECONFIG |
| Post-install reminders | playbooks/post_deploy.yml | Localhost |

Default Ansible inventory for Talos/noble playbooks is ansible/inventory/localhost.yml (ansible.cfg points there). Proxmox playbooks use -i inventory/proxmox.yml explicitly.


Prerequisites (all phases)

On the machine that runs Ansible (your workstation or a bastion):

  • Ansible (version compatible with the playbooks in this repo).
  • SSH access to Proxmox hosts when running Proxmox playbooks.
  • For Talos and Kubernetes phases: same L2/L3 path to lab node IPs (and eventually the API VIP) as documented in talos/README.md §3.
  • Talos tooling: talosctl (version aligned with the node image), talhelper, kubectl, helm.

Optional but common for this repo:

  • SOPS + age if you use encrypted manifests under clusters/noble/secrets/ (see clusters/noble/secrets/README.md).
  • Repository root .env copied from .env.sample for cert-manager (Cloudflare DNS-01) and other optional components.
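A quick preflight can confirm that the tools listed above resolve on PATH before any playbook runs. This is a minimal sketch: the function name require_tools is hypothetical and not part of this repository, and it checks presence only, not versions.

```shell
# Hypothetical helper (not part of this repo): report which required CLIs are missing.
require_tools() {
  missing=""
  for tool in "$@"; do
    # command -v is the POSIX way to test whether a command resolves on PATH
    if ! command -v "$tool" >/dev/null 2>&1; then
      missing="$missing $tool"
    fi
  done
  if [ -n "$missing" ]; then
    echo "missing tools:$missing" >&2
    return 1
  fi
  echo "all tools found"
}

# Example call, using the tool names from the prerequisites list:
#   require_tools ansible-playbook talosctl talhelper kubectl helm
```

Version alignment between talosctl and the node image still has to be checked by hand, as described in talos/README.md.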

1. Proxmox — hosts and VE cluster

These steps are independent of Talos and Kubernetes. They configure community repositories, routine upgrades, SSH keys for root, and optionally create a Proxmox VE cluster (pvecm).

1.1 Inventory and variables

  1. Copy the example inventory:

    cp ansible/inventory/proxmox.example.yml ansible/inventory/proxmox.yml
    
  2. Edit ansible/inventory/proxmox.yml: set ansible_host and ansible_user (typically root). For the first login, before key authentication works, either pass --ask-pass on the command line or set ansible_password (prefer Ansible Vault for passwords).

  3. Edit ansible/inventory/group_vars/proxmox_hosts.yml:

    • proxmox_cluster_name — name for a new Proxmox cluster (changing it later does not rename an existing cluster).
    • proxmox_cluster_master — inventory host name of the first node that runs pvecm create; leave empty only if the default ordering matches your intent (see role defaults).
    • proxmox_root_authorized_key_files — public keys installed for root (after prepare, password login is usually unnecessary).
    • proxmox_cluster_master_root_password — only if pvecm add still needs the master's root password for joins; store with Vault in real environments.

Repo variables for Debian codename and subscription notices are already set in that file; adjust proxmox_repo_debian_codename if your PVE major tracks a different Debian base.
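For the password variables mentioned above, one way to keep the value out of plain text is ansible-vault encrypt_string. The sketch below is an assumption about how you might wrap it; the function name vault_encrypt_var and the ANSIBLE_VAULT override are hypothetical and exist only for illustration.

```shell
# Sketch: emit a Vault-encrypted YAML value for a password variable.
# ANSIBLE_VAULT is an override hook introduced here for illustration only.
vault_encrypt_var() {
  var_name="$1"
  # --stdin-name reads the secret from stdin so it never appears in shell history
  "${ANSIBLE_VAULT:-ansible-vault}" encrypt_string --stdin-name "$var_name"
}

# Example (enter the password on stdin, then paste the output into
# ansible/inventory/group_vars/proxmox_hosts.yml):
#   vault_encrypt_var proxmox_cluster_master_root_password
```

Playbook runs that read a vaulted value then need the vault password, for example by adding --ask-vault-pass to the ansible-playbook command.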

1.2 Run order

From the ansible/ directory, targeting Proxmox:

cd ansible

# First contact: often need --ask-pass until SSH keys are installed
ansible-playbook -i inventory/proxmox.yml playbooks/proxmox_prepare.yml --ask-pass

ansible-playbook -i inventory/proxmox.yml playbooks/proxmox_upgrade.yml
ansible-playbook -i inventory/proxmox.yml playbooks/proxmox_cluster.yml

proxmox_prepare.yml runs the proxmox_baseline role (community repos, suppress no-subscription UI nag, install your authorized_keys for root). proxmox_upgrade.yml runs maintenance (dist-upgrade, cleanup, reboot when required) serially one host at a time. proxmox_cluster.yml bootstraps or joins the Proxmox cluster serially.

Convenience wrapper — same three steps in order:

ansible-playbook -i inventory/proxmox.yml playbooks/proxmox_ops.yml

After this phase you should have stable Proxmox hosts (and optionally a single Proxmox cluster) for creating the Talos VMs or bare-metal install targets. Creating those VMs or ISO boot entries is outside these playbooks; align disks and networks with talos/talconfig.yaml and talos/CLUSTER-BUILD.md inventory.


2. Talos — secrets, generated configs, and Phase A

Talos automation assumes talos/talconfig.yaml (and secrets) describe your nodes. Ansible does not replace reading talos/README.md: order matters (genconfig → apply all nodes → bootstrap), and --insecure is only for maintenance-mode APIs.

2.1 One-time (or when rotating machine config)

From the talos/ directory:

cd talos
talhelper gensecret > talsecret.yaml
# talhelper validate talconfig talconfig.yaml   # after edits

Do not commit talsecret.yaml, talos/out/, or talos/kubeconfig. If you use SOPS for secrets, follow talhelper and repo docs for encrypted variants.

ansible/playbooks/talos_phase_a.yml runs the talos_phase_a role on localhost:

  1. talhelper genconfig -o out (when noble_talos_genconfig is true).
  2. talosctl apply-config for each entry in noble_talos_nodes (maintenance vs secure mode is auto-probed unless you force noble_talos_apply_mode).
  3. talosctl bootstrap on the bootstrap node (unless noble_talos_skip_bootstrap is true or etcd is already initialized).
  4. talosctl kubeconfig, which writes the admin kubeconfig to talos/kubeconfig under the repo root (path set in the playbook).
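The four steps above correspond roughly to the manual command sequence sketched below. This is not the role's implementation: the helper names run and phase_a, the node IPs, and the machine config file names are all placeholders, and the sketch neither probes maintenance mode nor skips bootstrap when etcd is already initialized, as the role does. By default it only prints the commands.

```shell
# Sketch of the Phase A order. All IPs and file names are placeholders.
# DRY_RUN=1 (the default here) prints the commands instead of running them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

phase_a() {
  bootstrap_ip="$1"; shift          # control-plane node that runs etcd bootstrap
  run talhelper genconfig -o out || return 1
  for node in "$@"; do              # each argument is "ip:machine-config-file"
    ip="${node%%:*}"
    file="${node#*:}"
    # add --insecure by hand only while the node is still in maintenance mode
    run talosctl --talosconfig out/talosconfig apply-config \
      --nodes "$ip" --file "out/$file" || return 1
  done
  run talosctl --talosconfig out/talosconfig bootstrap \
    --nodes "$bootstrap_ip" --endpoints "$bootstrap_ip" || return 1
  run talosctl --talosconfig out/talosconfig kubeconfig ./kubeconfig \
    --nodes "$bootstrap_ip" --endpoints "$bootstrap_ip" --force
}

# Example (run from talos/; file names are hypothetical):
#   phase_a 192.168.50.20 192.168.50.20:cp1.yaml 192.168.50.21:cp2.yaml
```

Printing first makes it easy to compare the order against talos/README.md before setting DRY_RUN=0.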

Run from ansible/:

cd ansible
ansible-playbook playbooks/talos_phase_a.yml

Override IPs, machine filenames, or timing when your lab differs from ansible/roles/talos_phase_a/defaults/main.yml, for example:

ansible-playbook playbooks/talos_phase_a.yml \
  -e 'noble_talos_bootstrap_node_ip=192.168.50.20' \
  -e 'noble_talos_kubeconfig_endpoint=192.168.50.20'

If etcd is already bootstrapped and you only need apply/kubeconfig:

ansible-playbook playbooks/talos_phase_a.yml -e 'noble_talos_skip_bootstrap=true'

Legacy: playbooks/talos_bootstrap.yml only runs genconfig via talos_bootstrap; prefer talos_phase_a.yml for a full bring-up.

2.3 Sanity checks before the platform playbook

  • talos/kubeconfig exists (or export KUBECONFIG to your own path).
  • From the same network path you will use for Helm: kubectl get --raw /healthz returns ok (see talos/README.md §3 if the kubeconfig points at a VIP you cannot reach — use noble_k8s_api_server_override on noble.yml as in ansible/inventory/group_vars/all.yml).

3. Kubernetes cluster creation — platform install (noble.yml)

Here “cluster creation” means: empty Talos nodes are now members of a Kubernetes cluster, and you are installing CNI, storage, load balancing, ingress, cert-manager, GitOps, observability, policy, and related components from this repo's Helm/kubectl roles.

ansible/playbooks/noble.yml is the main playbook. It sets KUBECONFIG from the environment or defaults to $REPO_ROOT/talos/kubeconfig, runs an API /healthz preflight (with optional VIP fallback), then applies roles in dependency order (for example Cilium before MetalLB / kube-vip, Kyverno before Longhorn as documented in the playbook comments).
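The kubeconfig resolution and the /healthz preflight with fallback can be approximated by hand as shown below. This is a sketch only: the real logic lives in playbooks/noble.yml and may differ, and the names api_healthy, preflight_api, and the KUBECTL override are hypothetical.

```shell
# Sketch only: the real preflight lives in playbooks/noble.yml and may differ.
# KUBECTL is an override hook added here for illustration.
api_healthy() {
  # Extra arguments (such as --server URL) are passed through to kubectl
  [ "$("${KUBECTL:-kubectl}" "$@" get --raw /healthz 2>/dev/null)" = "ok" ]
}

preflight_api() {
  override="$1"   # optional, e.g. https://192.168.50.20:6443
  # Environment wins; otherwise fall back to the repo default path
  kubeconfig="${KUBECONFIG:-${REPO_ROOT:-$PWD}/talos/kubeconfig}"
  if api_healthy --kubeconfig "$kubeconfig"; then
    echo "ok via kubeconfig server"
    return 0
  fi
  if [ -n "$override" ] && api_healthy --kubeconfig "$kubeconfig" --server "$override"; then
    echo "ok via $override"
    return 0
  fi
  echo "API not reachable" >&2
  return 1
}

# Example:
#   preflight_api https://192.168.50.20:6443
```

If only the override succeeds, that is a hint to pass noble_k8s_api_server_override when running noble.yml from this host.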

3.1 Secrets and feature flags

3.2 Run

cd ansible
export KUBECONFIG=/absolute/path/to/home-server/talos/kubeconfig   # optional if default path is correct

ansible-playbook playbooks/noble.yml

If the kubeconfig targets the API VIP but this host can only reach a control-plane IP:

ansible-playbook playbooks/noble.yml -e 'noble_k8s_api_server_override=https://192.168.50.20:6443'

Partial runs use tags (see ansible/README.md).
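As a rough illustration of tag-based partial runs, the wrapper below lists the available tags when called without arguments and otherwise passes the given tags to ansible-playbook. The function name noble_tags and the ANSIBLE_PLAYBOOK override are hypothetical; the actual tag names are defined in the playbook and documented in ansible/README.md, so none are assumed here.

```shell
# Sketch: list tags, or run noble.yml limited to the given tags.
# ANSIBLE_PLAYBOOK is an override hook added here for illustration.
noble_tags() {
  if [ "$#" -eq 0 ]; then
    "${ANSIBLE_PLAYBOOK:-ansible-playbook}" playbooks/noble.yml --list-tags
  else
    # Join the arguments with commas: a b -> a,b
    tags=$(IFS=,; echo "$*")
    "${ANSIBLE_PLAYBOOK:-ansible-playbook}" playbooks/noble.yml --tags "$tags"
  fi
}

# Example (run from ansible/; replace the placeholders with real tag names):
#   noble_tags
#   noble_tags <tag-one> <tag-two>
```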

3.3 One-shot pipeline from Talos through platform

ansible/playbooks/deploy.yml imports talos_phase_a.yml then noble.yml. Use it when nodes are booted and reachable and you want a single command after updating talconfig:

cd ansible
ansible-playbook playbooks/deploy.yml

4. Cutover to Argo CD for deployment and config

Important mental model from clusters/noble/apps/README.md and clusters/noble/bootstrap/argocd/README.md:

  • Core platform (CNI, storage, ingress, cert-manager, observability stack, Kyverno, etc.) is installed by noble.yml from clusters/noble/bootstrap/ via Helm and kubectl — Argo CD does not reconcile those core Helm charts by default (those leaves live under argocd/app-of-apps/ and are applied after Ansible Helm).
  • noble-bootstrap-root tracks clusters/noble/bootstrap/ (which kustomize-includes clusters/noble/apps/) for GitOps alignment with bootstrap kustomize and optional add-on Application manifests — enable automated sync only after Ansible has finished so Argo does not fight Helm mid-play.

4.1 What Ansible already does for Argo

At the end of noble.yml, after all Ansible Helm roles (noble_platform, noble_authentik, noble_velero when enabled), the play runs noble_argocd task file applications_post_platform.yml, which applies:

  • bootstrap-root-application.yaml and kubectl apply -k clusters/noble/bootstrap/argocd/app-of-apps when noble_argocd_apply_bootstrap_root_application is true.

So the bootstrap root Application CR and leaf Application registrations typically already exist on the cluster after a successful noble.yml. They are created last on purpose so argocd-application-controller does not adopt resources before Helm installs them.

4.2 Before you enable GitOps automation

  1. Edit Git URLs in bootstrap-root-application.yaml: set repoURL and targetRevision to your real remote and branch.
  2. Register the repository in Argo CD (UI, argocd repo add, or a repository Secret) if it is private.
  3. Leave noble-bootstrap-root on manual sync until Helm and the cluster match git (see §5 in clusters/noble/bootstrap/argocd/README.md).
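One way to judge whether the cluster already matches git is to read the comparison status that Argo CD records on the Application, which is computed even while sync is manual. The sketch below assumes the Application lives in the argocd namespace, as in the other commands in this guide; the function name app_matches_git and the KUBECTL override are hypothetical.

```shell
# Sketch: succeed only when Argo CD reports the Application as Synced.
# KUBECTL is an override hook added here for illustration.
app_matches_git() {
  app="${1:-noble-bootstrap-root}"
  status=$("${KUBECTL:-kubectl}" get application "$app" -n argocd \
    -o jsonpath='{.status.sync.status}' 2>/dev/null)
  echo "sync status: ${status:-unknown}"
  [ "$status" = "Synced" ]
}

# Example:
#   app_matches_git noble-bootstrap-root && echo "safe to consider automated sync"
```

A Synced status is only one signal; reviewing the diff in the Argo CD UI before enabling prune is still worthwhile.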

4.3 Enable automated sync for noble-bootstrap-root

After noble.yml completes successfully and you have refreshed the app in Argo, enable automated sync (prune + self-heal) using one of the methods documented in §5 of clusters/noble/bootstrap/argocd/README.md, for example:

kubectl patch application noble-bootstrap-root -n argocd --type merge \
  -p '{"spec":{"syncPolicy":{"automated":{"prune":true,"selfHeal":true},"syncOptions":["CreateNamespace=true"]}}}'

Leaf Application objects under clusters/noble/bootstrap/argocd/app-of-apps/ remain manual until you intentionally turn on auto-sync per chart — when Argo should own a release, enable that leaf and remove the corresponding helm upgrade from Ansible so a single controller owns the release.
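It can help to have the inverse of the patch at hand, in case automated sync needs to be switched off again. The sketch below reads the current policy and removes the automated block with a JSON merge patch; the function names and the KUBECTL override are hypothetical and not part of this repository.

```shell
# Sketch: inspect and revert the sync policy of noble-bootstrap-root.
# KUBECTL is an override hook added here for illustration.
show_sync_policy() {
  "${KUBECTL:-kubectl}" get application noble-bootstrap-root -n argocd \
    -o jsonpath='{.spec.syncPolicy}'
}

disable_auto_sync() {
  # In a JSON merge patch, null deletes the key, returning the app to manual sync
  "${KUBECTL:-kubectl}" patch application noble-bootstrap-root -n argocd --type merge \
    -p '{"spec":{"syncPolicy":{"automated":null}}}'
}

# Example:
#   show_sync_policy
#   disable_auto_sync
```

The same pattern applies to a leaf Application by substituting its name, once you decide which controller owns that release.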

4.4 Optional apps repo path

Add only additive workloads under clusters/noble/apps/ as Application manifests (see clusters/noble/apps/README.md). kustomization.yaml may start empty; that is expected.

4.5 Post-deploy reminders

cd ansible
ansible-playbook playbooks/post_deploy.yml

This prints guidance about SOPS key handling and points back to the Argo README for sync policy.

4.6 Migrating from older Argo Application names

If you previously used different Application objects (for example a monolithic noble-platform), delete stale Applications as described in ansible/README.md under Migrating from Argo-managed noble-platform, then re-apply the root manifests and reconcile with noble.yml if Helm drifted.


Quick reference — minimal command sequence

# All commands run from ansible/
cd ansible

# 1) Proxmox (explicit proxmox inventory)
ansible-playbook -i inventory/proxmox.yml playbooks/proxmox_ops.yml

# 2-3) Talos + Kubernetes platform (localhost inventory default)
ansible-playbook playbooks/deploy.yml

# 4) Reminders
ansible-playbook playbooks/post_deploy.yml

Then finish Argo cutover in the UI or CLI: register repo → refresh noble-bootstrap-root → enable AUTO-SYNC when ready → selectively enable leaf apps and retire overlapping Ansible Helm tasks.


| Topic | Path |
| --- | --- |
| Ansible overview | ansible/README.md |
| Talos quick start | talos/README.md |
| Noble lab checklist | talos/CLUSTER-BUILD.md |
| Argo bootstrap and sync policy | clusters/noble/bootstrap/argocd/README.md |
| Optional Argo apps dir | clusters/noble/apps/README.md |
| Deploy secrets | .env.sample |