# Ansible getting started — Proxmox → Talos → cluster → Argo CD
This guide walks through the **intended order** for this repository: prepare **Proxmox VE** hosts and optionally form a **Proxmox cluster**, bring up **Talos** nodes and the **Kubernetes** control plane, install the **platform stack** with Ansible, then hand ongoing **bootstrap** configuration to **Argo CD** when you are ready.

Shorter reference tables and variable lists live in [`ansible/README.md`](../ansible/README.md). Deep operational detail for Talos and the noble lab checklist are in [`talos/README.md`](../talos/README.md) and [`talos/CLUSTER-BUILD.md`](../talos/CLUSTER-BUILD.md). Argo-specific sequencing is in [`clusters/noble/bootstrap/argocd/README.md`](../clusters/noble/bootstrap/argocd/README.md).

---
## What runs where
| Layer | What automates it | Where it runs |
|--------|-------------------|----------------|
| Proxmox hosts (repos, keys, upgrades, `pvecm`) | `proxmox_*.yml` playbooks | SSH to `proxmox_hosts` in `ansible/inventory/proxmox.yml` |
| Talos machine config, bootstrap, admin kubeconfig | `playbooks/talos_phase_a.yml` (or `deploy.yml` first half) | **Localhost** — needs LAN/VPN to node IPs (`:50000`, later `:6443`) |
| CNI, storage, ingress, cert-manager, Argo CD install, observability, policy, … | `playbooks/noble.yml` (or `deploy.yml` second half) | **Localhost** — uses `kubectl` / Helm against `KUBECONFIG` |
| Post-install reminders | `playbooks/post_deploy.yml` | Localhost |

Default Ansible inventory for Talos/noble playbooks is [`ansible/inventory/localhost.yml`](../ansible/inventory/localhost.yml) (`ansible.cfg` points there). **Proxmox** playbooks use **`-i inventory/proxmox.yml`** explicitly.

---
## Prerequisites (all phases)
On the machine that runs Ansible (your workstation or a bastion):
- **Ansible** (version compatible with the playbooks in this repo).
- **SSH** access to Proxmox hosts when running Proxmox playbooks.
- For Talos and Kubernetes phases: **same L2/L3 path** to lab node IPs (and eventually the API VIP) as documented in [`talos/README.md`](../talos/README.md) §3.
- **Talos tooling:** `talosctl` (version aligned with the node image), **`talhelper`**, **`kubectl`**, **`helm`**.
Optional but common for this repo:
- **SOPS** + **age** if you use encrypted manifests under `clusters/noble/secrets/` (see `clusters/noble/secrets/README.md`).
- Repository root **`.env`** copied from [`.env.sample`](../.env.sample) for cert-manager (Cloudflare DNS-01) and other optional components.
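
A quick preflight can confirm that the tools listed above are on `PATH` before any playbook runs. The sketch below is illustrative only: `missing_tools` is a hypothetical helper, not something shipped in this repo, and it checks presence, not versions.

```bash
# Print each named command that is not found on PATH, one per line.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Tool list taken from the prerequisites above (sops and age are optional, so not checked).
missing="$(missing_tools ansible-playbook ssh talosctl talhelper kubectl helm)"
if [ -n "$missing" ]; then
  printf 'Missing required tools:\n%s\n' "$missing" >&2
else
  echo "All required tools found."
fi
```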
---
## 1. Proxmox — hosts and VE cluster
These steps are **independent** of Talos and Kubernetes. They configure community repositories, routine upgrades, SSH keys for `root`, and optionally create a **Proxmox VE cluster** (`pvecm`).
### 1.1 Inventory and variables
1. Copy the example inventory:
```bash
cp ansible/inventory/proxmox.example.yml ansible/inventory/proxmox.yml
```
2. Edit `ansible/inventory/proxmox.yml`: set `ansible_host` and `ansible_user` (typically `root`). For the first login, before key auth is in place, either pass **`--ask-pass`** on the command line or set `ansible_password` (prefer **Ansible Vault** for passwords).
3. Edit [`ansible/inventory/group_vars/proxmox_hosts.yml`](../ansible/inventory/group_vars/proxmox_hosts.yml):
- **`proxmox_cluster_name`** — name for a **new** Proxmox cluster (changing it later does not rename an existing cluster).
- **`proxmox_cluster_master`** — inventory host name of the first node that runs `pvecm create`; leave empty only if the default ordering matches your intent (see role defaults).
- **`proxmox_root_authorized_key_files`** — public keys installed for `root` (after prepare, password login is usually unnecessary).
- **`proxmox_cluster_master_root_password`** — only if `pvecm add` still needs the master's root password for joins; store with Vault in real environments.
Repo variables for Debian codename and subscription notices are already set in that file; adjust **`proxmox_repo_debian_codename`** if your PVE major tracks a different Debian base.
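
Before the first playbook run it can help to confirm that the inventory parses and the hosts answer. A minimal sketch, assuming it is run from `ansible/` and that the group is named `proxmox_hosts` as in the table above; `check_proxmox_inventory` is a hypothetical helper, not part of the repo.

```bash
# Hypothetical helper: sanity-check the Proxmox inventory from the ansible/ directory.
check_proxmox_inventory() {
  inventory="${1:-inventory/proxmox.yml}"
  if [ ! -f "$inventory" ]; then
    echo "missing $inventory: copy proxmox.example.yml first" >&2
    return 1
  fi
  # Show the parsed groups; every node should appear under proxmox_hosts.
  ansible-inventory -i "$inventory" --graph || return 1
  # Ad-hoc ping over SSH. --ask-pass prompts for the password (needs sshpass locally);
  # drop it once proxmox_prepare.yml has installed your keys.
  ansible -i "$inventory" proxmox_hosts -m ansible.builtin.ping --ask-pass
}
```

Calling `check_proxmox_inventory` with no argument uses `inventory/proxmox.yml`; pass another path to check a different inventory file.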
### 1.2 Run order
From the **`ansible/`** directory, targeting Proxmox:
```bash
cd ansible
# First contact: often need --ask-pass until SSH keys are installed
ansible-playbook -i inventory/proxmox.yml playbooks/proxmox_prepare.yml --ask-pass
ansible-playbook -i inventory/proxmox.yml playbooks/proxmox_upgrade.yml
ansible-playbook -i inventory/proxmox.yml playbooks/proxmox_cluster.yml
```
**`proxmox_prepare.yml`** runs the `proxmox_baseline` role (community repos, suppress no-subscription UI nag, install your **`authorized_keys`** for root). **`proxmox_upgrade.yml`** runs maintenance (dist-upgrade, cleanup, reboot when required) **serially** one host at a time. **`proxmox_cluster.yml`** bootstraps or joins the Proxmox cluster **serially**.
Convenience wrapper — same three steps in order:
```bash
ansible-playbook -i inventory/proxmox.yml playbooks/proxmox_ops.yml
```
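
Once `proxmox_cluster.yml` has run, quorum can be confirmed from any member with `pvecm status`. The helper below is a hypothetical sketch (not part of this repo) that reads that output on stdin and looks for the `Quorate:` line; it assumes the usual `Quorate:   Yes` layout, and the host name in the usage comment is a placeholder.

```bash
# Succeeds when `pvecm status` output on stdin reports "Quorate: Yes".
is_quorate() {
  grep -Eq '^Quorate:[[:space:]]+Yes'
}

# Usage (pve1 is a placeholder for one of your inventory hosts):
#   ssh root@pve1 pvecm status | is_quorate && echo "cluster is quorate"
```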
After this phase you should have stable **Proxmox** hosts (and optionally a single **Proxmox cluster**) for creating the Talos VMs or bare-metal install targets. Creating those VMs or ISO boot entries is **outside** these playbooks; align disks and networks with [`talos/talconfig.yaml`](../talos/talconfig.yaml) and [`talos/CLUSTER-BUILD.md`](../talos/CLUSTER-BUILD.md) inventory.

---
## 2. Talos — secrets, generated configs, and Phase A
Talos automation assumes **`talos/talconfig.yaml`** (and secrets) describe your nodes. Ansible **does not** replace reading [`talos/README.md`](../talos/README.md): order matters (**genconfig → apply all nodes → bootstrap**), and **`--insecure`** is only for maintenance-mode APIs.
### 2.1 One-time (or when rotating machine config)
From the **`talos/`** directory:
```bash
cd talos
talhelper gensecret > talsecret.yaml
# talhelper validate talconfig talconfig.yaml # after edits
```
Do **not** commit `talsecret.yaml`, `talos/out/`, or `talos/kubeconfig`. If you use SOPS for secrets, follow talhelper and repo docs for encrypted variants.
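
One way to guard against committing these files by accident is to ask git whether it would ignore them. This is a sketch under the assumption that the repo's `.gitignore` is meant to cover the three paths; `unignored` is a hypothetical helper name.

```bash
# Print every given path that git would NOT ignore (and could therefore be committed).
# `git check-ignore -q` exits 0 only when the path matches an ignore rule.
unignored() {
  for path in "$@"; do
    git check-ignore -q "$path" || printf '%s\n' "$path"
  done
}

# Usage, from the repo root (no output means all three paths are ignored):
#   unignored talos/talsecret.yaml talos/out/ talos/kubeconfig
```

Files that are already tracked are reported as not ignored, which is the safer direction for secrets.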
### 2.2 Automated Phase A (recommended)
[`ansible/playbooks/talos_phase_a.yml`](../ansible/playbooks/talos_phase_a.yml) runs the **`talos_phase_a`** role on **localhost**:
1. **`talhelper genconfig -o out`** (when `noble_talos_genconfig` is true).
2. **`talosctl apply-config`** for each entry in **`noble_talos_nodes`** (maintenance vs secure mode is auto-probed unless you force `noble_talos_apply_mode`).
3. **`talosctl bootstrap`** on the bootstrap node (unless `noble_talos_skip_bootstrap` is true or etcd is already initialized).
4. **`talosctl kubeconfig`**, writing the admin kubeconfig to **`talos/kubeconfig`** under the repo root (path set in the playbook).
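
For orientation, the four steps above correspond roughly to the manual commands sketched below. This is not the role's actual implementation: the exact flags, the per-node config file name (`out/NODE.yaml` is a placeholder), and the `out/talosconfig` location are assumptions, and the IP is borrowed from the override example in this guide. The script only prints the commands unless `DRY_RUN=0` is set.

```bash
# Manual sketch of Phase A for a single node; run from talos/.
NODE_IP="${NODE_IP:-192.168.50.20}"                      # bootstrap node IP (example value)
NODE_CONFIG="${NODE_CONFIG:-out/NODE.yaml}"              # placeholder: use the file talhelper generated
TALOSCONFIG_PATH="${TALOSCONFIG_PATH:-out/talosconfig}"  # assumed talhelper output location

# Print the command by default; execute it only when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then printf '+ %s\n' "$*"; else "$@"; fi
}

run talhelper genconfig -o out
# --insecure is only valid while the node is still in maintenance mode; repeat per node.
run talosctl apply-config --insecure --nodes "$NODE_IP" --file "$NODE_CONFIG"
# Bootstrap etcd exactly once, on one control-plane node.
run talosctl bootstrap --talosconfig "$TALOSCONFIG_PATH" --nodes "$NODE_IP" --endpoints "$NODE_IP"
run talosctl kubeconfig ./kubeconfig --talosconfig "$TALOSCONFIG_PATH" --nodes "$NODE_IP" --endpoints "$NODE_IP"
```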
Run from **`ansible/`**:
```bash
cd ansible
ansible-playbook playbooks/talos_phase_a.yml
```
Override IPs, machine filenames, or timing when your lab differs from [`ansible/roles/talos_phase_a/defaults/main.yml`](../ansible/roles/talos_phase_a/defaults/main.yml), for example:
```bash
ansible-playbook playbooks/talos_phase_a.yml \
  -e 'noble_talos_bootstrap_node_ip=192.168.50.20' \
  -e 'noble_talos_kubeconfig_endpoint=192.168.50.20'
```
If etcd is already bootstrapped and you only need apply/kubeconfig:
```bash
ansible-playbook playbooks/talos_phase_a.yml -e 'noble_talos_skip_bootstrap=true'
```
**Legacy:** `playbooks/talos_bootstrap.yml` only runs genconfig via `talos_bootstrap`; prefer **`talos_phase_a.yml`** for a full bring-up.
### 2.3 Sanity checks before the platform playbook
- **`talos/kubeconfig`** exists (or export **`KUBECONFIG`** to your own path).
- From the same network path you will use for Helm: **`kubectl get --raw /healthz`** returns **`ok`** (see [`talos/README.md`](../talos/README.md) §3 if the kubeconfig points at a VIP you cannot reach — use `noble_k8s_api_server_override` on **`noble.yml`** as in [`ansible/inventory/group_vars/all.yml`](../ansible/inventory/group_vars/all.yml)).
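
The API server may take a little while to answer after bootstrap, so the `/healthz` check is easier to script with a retry loop. A minimal sketch; `wait_for_ok` is a hypothetical helper, and the attempt count and delay in the usage comment are arbitrary.

```bash
# Retry a command until it prints exactly "ok", or give up after ATTEMPTS tries.
# Usage: wait_for_ok ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_for_ok() {
  attempts="$1"; delay="$2"; shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if [ "$("$@" 2>/dev/null)" = "ok" ]; then return 0; fi
    i=$((i + 1))
    # Sleep only if another attempt follows.
    [ "$i" -le "$attempts" ] && sleep "$delay"
  done
  return 1
}

# Usage, from the repo root:
#   export KUBECONFIG="$PWD/talos/kubeconfig"
#   wait_for_ok 30 10 kubectl get --raw /healthz && echo "API is healthy"
```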
---
## 3. Kubernetes cluster creation — platform install (`noble.yml`)
Here “cluster creation” means: **empty Talos nodes are now members of a Kubernetes cluster**, and you are installing **CNI, storage, load balancing, ingress, cert-manager, GitOps, observability, policy**, and related components from this repo's Helm/kubectl roles.
[`ansible/playbooks/noble.yml`](../ansible/playbooks/noble.yml) is the main playbook. It sets **`KUBECONFIG`** from the environment or defaults to **`$REPO_ROOT/talos/kubeconfig`**, runs an API **`/healthz`** preflight (with optional VIP fallback), then applies roles in dependency order (for example **Cilium** before **MetalLB** / **kube-vip**, **Kyverno** before **Longhorn** as documented in the playbook comments).
### 3.1 Secrets and feature flags
- **`.env`** at repo root: at minimum **`CLOUDFLARE_DNS_API_TOKEN`** when [`noble_cert_manager_require_cloudflare_secret`](../ansible/inventory/group_vars/all.yml) is true, so cert-manager can create DNS-01 issuers.
- **[`ansible/inventory/group_vars/all.yml`](../ansible/inventory/group_vars/all.yml)** toggles optional components (`noble_newt_install`, `noble_velero_install`, `noble_authentik_install`, Argo root Application flags, API server override, etc.).
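
A missing token is cheaper to catch before the play starts than partway through it. The sketch below assumes `.env` uses plain `KEY=value` lines (optionally prefixed with `export`); `env_file_has_key` is a hypothetical helper, not part of the repo.

```bash
# Succeeds when FILE has an uncommented KEY= assignment with something after the "=".
env_file_has_key() {
  file="$1"; key="$2"
  [ -f "$file" ] && grep -Eq "^[[:space:]]*(export[[:space:]]+)?${key}=.+" "$file"
}

# Usage, from the repo root:
#   env_file_has_key .env CLOUDFLARE_DNS_API_TOKEN || echo "set CLOUDFLARE_DNS_API_TOKEN in .env" >&2
```

The check only confirms that a value is present; it does not validate the token against Cloudflare.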
### 3.2 Run
```bash
cd ansible
export KUBECONFIG=/absolute/path/to/home-server/talos/kubeconfig # optional if default path is correct
ansible-playbook playbooks/noble.yml
```
If the kubeconfig targets the API VIP but this host can only reach a control-plane IP:
```bash
ansible-playbook playbooks/noble.yml -e 'noble_k8s_api_server_override=https://192.168.50.20:6443'
```
Partial runs use **tags** (see [`ansible/README.md`](../ansible/README.md)).
### 3.3 One-shot pipeline from Talos through platform
[`ansible/playbooks/deploy.yml`](../ansible/playbooks/deploy.yml) imports **`talos_phase_a.yml`** then **`noble.yml`**. Use it when nodes are booted and reachable and you want a single command after updating `talconfig`:
```bash
cd ansible
ansible-playbook playbooks/deploy.yml
```
---
## 4. Cutover to Argo CD for deployment and config
Important mental model from [`clusters/noble/apps/README.md`](../clusters/noble/apps/README.md) and [`clusters/noble/bootstrap/argocd/README.md`](../clusters/noble/bootstrap/argocd/README.md):
- **Core platform** (CNI, storage, ingress, cert-manager, observability stack, Kyverno, etc.) is installed by **`noble.yml`** from **`clusters/noble/bootstrap/`** via Helm and kubectl — **Argo CD does not reconcile those core Helm charts by default** (those leaves live under **`argocd/app-of-apps/`** and are applied after Ansible Helm).
- **`noble-bootstrap-root`** tracks **`clusters/noble/bootstrap/`** (which **kustomize-includes** **`clusters/noble/apps/`**) for GitOps alignment with bootstrap kustomize and optional add-on **`Application`** manifests — enable **automated** sync only **after** Ansible has finished so Argo does not fight Helm mid-play.
### 4.1 What Ansible already does for Argo
At the **end** of **`noble.yml`**, after all Ansible Helm roles (**`noble_platform`**, **`noble_authentik`**, **`noble_velero`** when enabled), the play runs **`noble_argocd`** task file **`applications_post_platform.yml`**, which applies:
- **`bootstrap-root-application.yaml`** and **`kubectl apply -k clusters/noble/bootstrap/argocd/app-of-apps`** when **`noble_argocd_apply_bootstrap_root_application`** is true.
So the **bootstrap root Application CR** and **leaf Application** registrations typically already exist on the cluster after a successful **`noble.yml`**. They are created **last** on purpose so `argocd-application-controller` does not adopt resources before Helm installs them.
### 4.2 Before you enable GitOps automation
1. **Edit Git URLs** in **`bootstrap-root-application.yaml`**: set **`repoURL`** and **`targetRevision`** to your real remote and branch.
2. **Register the repository** in Argo CD (UI, `argocd repo add`, or a repository `Secret`) if it is private.
3. Leave **`noble-bootstrap-root`** on **manual** sync until Helm and the cluster match git (see **§5** in [`clusters/noble/bootstrap/argocd/README.md`](../clusters/noble/bootstrap/argocd/README.md)).
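
To judge whether the cluster matches git, the Application's sync and health status can be read straight from its status fields. A minimal sketch, assuming Argo CD runs in the `argocd` namespace as in the patch command in this guide; `app_status` is a hypothetical helper.

```bash
# Print "<sync status> <health status>" for an Argo CD Application.
# Usage: app_status NAME [NAMESPACE]
app_status() {
  kubectl get applications.argoproj.io "$1" -n "${2:-argocd}" \
    -o jsonpath='{.status.sync.status} {.status.health.status}{"\n"}'
}

# Usage:
#   app_status noble-bootstrap-root
# An empty "automated" field here means the app is still on manual sync:
#   kubectl get applications.argoproj.io noble-bootstrap-root -n argocd -o jsonpath='{.spec.syncPolicy}'
```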
### 4.3 Enable automated sync for `noble-bootstrap-root`
After **`noble.yml`** completes successfully and you have refreshed the app in Argo, enable automated sync (prune + self-heal) using one of the methods documented in **§5** of [`clusters/noble/bootstrap/argocd/README.md`](../clusters/noble/bootstrap/argocd/README.md), for example:
```bash
kubectl patch application noble-bootstrap-root -n argocd --type merge \
  -p '{"spec":{"syncPolicy":{"automated":{"prune":true,"selfHeal":true},"syncOptions":["CreateNamespace=true"]}}}'
```
**Leaf** `Application` objects under **`clusters/noble/bootstrap/argocd/app-of-apps/`** remain **manual** until you intentionally turn on auto-sync **per chart** — when Argo should own a release, enable that leaf and **remove** the corresponding **`helm upgrade`** from Ansible so a single controller owns the release.
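
If an Application needs to go back to manual sync, for example while Ansible Helm roles are being re-run, the `automated` block can be removed with a JSON patch. This is a sketch and not a procedure taken from the Argo README; `disable_auto_sync` is a hypothetical helper, and the patch fails harmlessly if automated sync is not currently set.

```bash
# Remove spec.syncPolicy.automated so the Application returns to manual sync.
# Usage: disable_auto_sync NAME [NAMESPACE]
disable_auto_sync() {
  kubectl patch applications.argoproj.io "$1" -n "${2:-argocd}" --type json \
    -p '[{"op":"remove","path":"/spec/syncPolicy/automated"}]'
}

# Usage:
#   disable_auto_sync noble-bootstrap-root
```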
### 4.4 Optional apps repo path
Add only **additive** workloads under **`clusters/noble/apps/`** as `Application` manifests (see [`clusters/noble/apps/README.md`](../clusters/noble/apps/README.md)). **`kustomization.yaml`** may start empty; that is expected.
### 4.5 Post-deploy reminders
```bash
cd ansible
ansible-playbook playbooks/post_deploy.yml
```
This prints guidance about **SOPS** key handling and points back to the Argo README for sync policy.
### 4.6 Migrating from older Argo Application names
If you previously used different Application objects (for example a monolithic `noble-platform`), delete stale Applications as described in [`ansible/README.md`](../ansible/README.md) under **Migrating from Argo-managed `noble-platform`**, then re-apply the root manifests and reconcile with **`noble.yml`** if Helm drifted.

---
## Quick reference — minimal command sequence
```bash
# 1) Proxmox (from ansible/, with proxmox inventory)
cd ansible
ansible-playbook -i inventory/proxmox.yml playbooks/proxmox_ops.yml
# 2-3) Talos + Kubernetes platform (localhost inventory default)
ansible-playbook playbooks/deploy.yml
# 4) Reminders
ansible-playbook playbooks/post_deploy.yml
```
Then finish **Argo** cutover in the UI or CLI: register repo → refresh **`noble-bootstrap-root`** → enable **AUTO-SYNC** when ready → selectively enable leaf apps and retire overlapping Ansible Helm tasks.

---
## Related documentation
| Topic | Path |
|--------|------|
| Ansible overview | [`ansible/README.md`](../ansible/README.md) |
| Talos quick start | [`talos/README.md`](../talos/README.md) |
| Noble lab checklist | [`talos/CLUSTER-BUILD.md`](../talos/CLUSTER-BUILD.md) |
| Argo bootstrap and sync policy | [`clusters/noble/bootstrap/argocd/README.md`](../clusters/noble/bootstrap/argocd/README.md) |
| Optional Argo apps dir | [`clusters/noble/apps/README.md`](../clusters/noble/apps/README.md) |
| Deploy secrets | [`.env.sample`](../.env.sample) |