# Ansible getting started — Proxmox → Talos → cluster → Argo CD

This guide walks through the **intended order** for this repository: prepare **Proxmox VE** hosts and optionally form a **Proxmox cluster**, bring up **Talos** nodes and the **Kubernetes** control plane, install the **platform stack** with Ansible, then hand ongoing **bootstrap** configuration to **Argo CD** when you are ready.

Shorter reference tables and variable lists live in [`ansible/README.md`](../ansible/README.md). Deep operational detail for Talos and the noble lab checklist are in [`talos/README.md`](../talos/README.md) and [`talos/CLUSTER-BUILD.md`](../talos/CLUSTER-BUILD.md). Argo-specific sequencing is in [`clusters/noble/bootstrap/argocd/README.md`](../clusters/noble/bootstrap/argocd/README.md).

---

## What runs where

| Layer | What automates it | Where it runs |
|--------|-------------------|----------------|
| Proxmox hosts (repos, keys, upgrades, `pvecm`) | `proxmox_*.yml` playbooks | SSH to `proxmox_hosts` in `ansible/inventory/proxmox.yml` |
| Talos machine config, bootstrap, admin kubeconfig | `playbooks/talos_phase_a.yml` (or `deploy.yml` first half) | **Localhost** — needs LAN/VPN to node IPs (`:50000`, later `:6443`) |
| CNI, storage, ingress, cert-manager, Argo CD install, observability, policy, … | `playbooks/noble.yml` (or `deploy.yml` second half) | **Localhost** — uses `kubectl` / Helm against `KUBECONFIG` |
| Post-install reminders | `playbooks/post_deploy.yml` | Localhost |

The default Ansible inventory for the Talos/noble playbooks is [`ansible/inventory/localhost.yml`](../ansible/inventory/localhost.yml) (`ansible.cfg` points there). **Proxmox** playbooks must be given **`-i inventory/proxmox.yml`** explicitly.

---

## Prerequisites (all phases)

On the machine that runs Ansible (your workstation or a bastion):

- **Ansible** (version compatible with the playbooks in this repo).
- **SSH** access to Proxmox hosts when running Proxmox playbooks.
- For Talos and Kubernetes phases: the **L2/L3 network path** to the lab node IPs (and eventually the API VIP) documented in [`talos/README.md`](../talos/README.md) §3.
- **Talos tooling:** `talosctl` (version aligned with the node image), **`talhelper`**, **`kubectl`**, **`helm`**.

Optional but common for this repo:

- **SOPS** + **age** if you use encrypted manifests under `clusters/noble/secrets/` (see `clusters/noble/secrets/README.md`).
- Repository root **`.env`** copied from [`.env.sample`](../.env.sample) for cert-manager (Cloudflare DNS-01) and other optional components.
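
One way to confirm that the tools listed above are on `PATH` before starting is a small shell check. This is a minimal sketch: the function name `check_tools` is hypothetical and not part of this repo, and it only tests for presence, not for version alignment between `talosctl` and the node image.

```bash
#!/usr/bin/env bash
# Report which of the given commands are missing from PATH.
# Returns 0 when all are present, 1 otherwise.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_tools ansible-playbook ssh talosctl talhelper kubectl helm` prints one `missing:` line per absent tool and returns non-zero if any are absent.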

---

## 1. Proxmox — hosts and VE cluster

These steps are **independent** of Talos and Kubernetes. They configure community repositories, routine upgrades, SSH keys for `root`, and optionally create a **Proxmox VE cluster** (`pvecm`).

### 1.1 Inventory and variables

1. Copy the example inventory:

   ```bash
   cp ansible/inventory/proxmox.example.yml ansible/inventory/proxmox.yml
   ```

2. Edit `ansible/inventory/proxmox.yml`: set `ansible_host` and `ansible_user` (typically `root`). For the first login, before key auth is in place, either pass **`--ask-pass`** on the command line or set `ansible_password` (prefer **Ansible Vault** for stored passwords).

3. Edit [`ansible/inventory/group_vars/proxmox_hosts.yml`](../ansible/inventory/group_vars/proxmox_hosts.yml):

   - **`proxmox_cluster_name`** — name for a **new** Proxmox cluster (changing it later does not rename an existing cluster).
   - **`proxmox_cluster_master`** — inventory host name of the first node that runs `pvecm create`; leave empty only if the default ordering matches your intent (see role defaults).
   - **`proxmox_root_authorized_key_files`** — public keys installed for `root` (after prepare, password login is usually unnecessary).
   - **`proxmox_cluster_master_root_password`** — only if `pvecm add` still needs the master’s root password for joins; store with Vault in real environments.

APT repository variables (Debian codename, subscription-notice handling) are already set in that file; adjust **`proxmox_repo_debian_codename`** if your PVE major version tracks a different Debian base.

### 1.2 Run order

From the **`ansible/`** directory, targeting Proxmox:

```bash
cd ansible

# First contact: often need --ask-pass until SSH keys are installed
ansible-playbook -i inventory/proxmox.yml playbooks/proxmox_prepare.yml --ask-pass

ansible-playbook -i inventory/proxmox.yml playbooks/proxmox_upgrade.yml
ansible-playbook -i inventory/proxmox.yml playbooks/proxmox_cluster.yml
```

**`proxmox_prepare.yml`** runs the `proxmox_baseline` role (community repos, suppression of the no-subscription UI notice, installation of your **`authorized_keys`** for root). **`proxmox_upgrade.yml`** runs maintenance (dist-upgrade, cleanup, reboot when required) **serially**, one host at a time. **`proxmox_cluster.yml`** bootstraps or joins the Proxmox cluster, also **serially**.

Convenience wrapper — same three steps in order:

```bash
ansible-playbook -i inventory/proxmox.yml playbooks/proxmox_ops.yml
```
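
If you would rather keep the three steps separate, for example to pass `--ask-pass` only on first contact, a thin shell wrapper can chain them and stop at the first failure. This is a sketch under assumptions: `run`, `proxmox_phase`, `DRY_RUN`, and `ASK_PASS` are hypothetical names, not part of this repo, and it must be invoked from `ansible/`.

```bash
#!/usr/bin/env bash
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Run prepare -> upgrade -> cluster in order; "&&" stops at the first failure.
# ASK_PASS (any non-empty value) adds --ask-pass to the prepare step only.
proxmox_phase() {
  local inv="inventory/proxmox.yml"
  run ansible-playbook -i "$inv" playbooks/proxmox_prepare.yml ${ASK_PASS:+--ask-pass} &&
    run ansible-playbook -i "$inv" playbooks/proxmox_upgrade.yml &&
    run ansible-playbook -i "$inv" playbooks/proxmox_cluster.yml
}
```

Running `DRY_RUN=1 ASK_PASS=1 proxmox_phase` prints the three commands without contacting any host, which is a cheap way to review what would run.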

After this phase you should have stable **Proxmox** hosts (and optionally a single **Proxmox cluster**) for creating the Talos VMs or bare-metal install targets. Creating those VMs or ISO boot entries is **outside** these playbooks; align disks and networks with [`talos/talconfig.yaml`](../talos/talconfig.yaml) and the inventory in [`talos/CLUSTER-BUILD.md`](../talos/CLUSTER-BUILD.md).

---

## 2. Talos — secrets, generated configs, and Phase A

Talos automation assumes **`talos/talconfig.yaml`** (and secrets) describe your nodes. Ansible **does not** replace reading [`talos/README.md`](../talos/README.md): order matters (**genconfig → apply all nodes → bootstrap**), and **`--insecure`** is only for maintenance-mode APIs.

### 2.1 One-time (or when rotating machine config)

From the **`talos/`** directory:

```bash
cd talos
talhelper gensecret > talsecret.yaml
# talhelper validate talconfig talconfig.yaml   # after edits
```

Do **not** commit `talsecret.yaml`, `talos/out/`, or `talos/kubeconfig`. If you use SOPS for secrets, follow talhelper and repo docs for encrypted variants.
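
A coarse guard against committing these files is to check that they are listed in the ignore file. The helper below is a hypothetical sketch, not something this repo ships: it matches literal lines only, so it will report a miss if your `.gitignore` covers the same paths with a glob or a differently written pattern.

```bash
#!/usr/bin/env bash
# Return 0 only if every given entry appears as a literal line in the ignore file.
# Does not evaluate glob patterns or negations.
ignored_literally() {
  local file="$1" entry rc=0
  shift
  for entry in "$@"; do
    if ! grep -Fxq -- "$entry" "$file"; then
      echo "not listed in $file: $entry" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

From the repo root this could be called as `ignored_literally .gitignore talos/talsecret.yaml talos/out/ talos/kubeconfig`; the exact entries are an assumption about how the ignore file is written. For a pattern-aware answer, `git check-ignore -v <path>` is the more precise check.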

### 2.2 Automated Phase A (recommended)

[`ansible/playbooks/talos_phase_a.yml`](../ansible/playbooks/talos_phase_a.yml) runs the **`talos_phase_a`** role on **localhost**:

1. **`talhelper genconfig -o out`** (when `noble_talos_genconfig` is true).
2. **`talosctl apply-config`** for each entry in **`noble_talos_nodes`** (maintenance vs secure mode is auto-probed unless you force `noble_talos_apply_mode`).
3. **`talosctl bootstrap`** on the bootstrap node (unless `noble_talos_skip_bootstrap` is true or etcd is already initialized).
4. **`talosctl kubeconfig`**, writing **`talos/kubeconfig`** (relative to the repo root; path set in the playbook).
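
For orientation, the same order can be written out by hand. The sketch below is not the role's implementation: `run`, `phase_a`, and `DRY_RUN` are hypothetical names, the `ip=file` argument format is invented for illustration, and the generated config filenames depend on your `talconfig.yaml`. It assumes it is run from `talos/` and that every node is still in maintenance mode, which is the only case where `--insecure` applies.

```bash
#!/usr/bin/env bash
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: phase_a <bootstrap-ip> <ip=config-file>...
phase_a() {
  local bootstrap_ip="$1" pair
  shift
  # 1. Render machine configs from talconfig.yaml into ./out
  run talhelper genconfig -o out
  # 2. Apply a config to every node before bootstrapping
  for pair in "$@"; do
    run talosctl apply-config --insecure -n "${pair%%=*}" --file "${pair#*=}"
  done
  # 3. Bootstrap etcd once, on a single control-plane node
  run talosctl --talosconfig out/talosconfig bootstrap \
    -n "$bootstrap_ip" -e "$bootstrap_ip"
  # 4. Fetch the admin kubeconfig into ./kubeconfig
  run talosctl --talosconfig out/talosconfig kubeconfig ./kubeconfig \
    -n "$bootstrap_ip" -e "$bootstrap_ip" --force
}
```

With `DRY_RUN=1`, a call such as `phase_a 192.168.50.20 192.168.50.20=out/<node-config>.yaml` prints the four stages in order, which makes the dependency between them easy to review. The role adds mode probing and an etcd check on top of this.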

Run from **`ansible/`**:

```bash
cd ansible
ansible-playbook playbooks/talos_phase_a.yml
```

Override IPs, machine filenames, or timing when your lab differs from [`ansible/roles/talos_phase_a/defaults/main.yml`](../ansible/roles/talos_phase_a/defaults/main.yml), for example:

```bash
ansible-playbook playbooks/talos_phase_a.yml \
  -e 'noble_talos_bootstrap_node_ip=192.168.50.20' \
  -e 'noble_talos_kubeconfig_endpoint=192.168.50.20'
```

If etcd is already bootstrapped and you only need apply/kubeconfig:

```bash
ansible-playbook playbooks/talos_phase_a.yml -e 'noble_talos_skip_bootstrap=true'
```

**Legacy:** `playbooks/talos_bootstrap.yml` only runs genconfig via `talos_bootstrap`; prefer **`talos_phase_a.yml`** for a full bring-up.

### 2.3 Sanity checks before the platform playbook

- **`talos/kubeconfig`** exists (or export **`KUBECONFIG`** to your own path).
- From the same network path you will use for Helm, **`kubectl get --raw /healthz`** returns **`ok`**. If the kubeconfig points at a VIP you cannot reach, see [`talos/README.md`](../talos/README.md) §3 and set `noble_k8s_api_server_override` for **`noble.yml`**, as in [`ansible/inventory/group_vars/all.yml`](../ansible/inventory/group_vars/all.yml).
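
Both checks can be combined into one function that returns non-zero when the platform playbook is likely to fail its preflight. This is a hedged sketch: `api_ready` is a hypothetical helper name, and it accepts only a single kubeconfig path rather than a colon-separated `KUBECONFIG` list.

```bash
#!/usr/bin/env bash
# Return 0 if the kubeconfig file exists, is non-empty, and /healthz answers "ok".
api_ready() {
  local cfg="$1" out
  if [ ! -s "$cfg" ]; then
    echo "kubeconfig missing or empty: $cfg" >&2
    return 1
  fi
  # Any kubectl failure (unreachable API, bad credentials) counts as not ready.
  out="$(kubectl --kubeconfig "$cfg" get --raw /healthz 2>/dev/null)" || return 1
  [ "$out" = "ok" ]
}
```

From the repo root, `api_ready talos/kubeconfig && echo ready` is one way to use it before moving on to the platform install.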

---

## 3. Kubernetes cluster creation — platform install (`noble.yml`)

Here “cluster creation” means: **empty Talos nodes are now members of a Kubernetes cluster**, and you are installing **CNI, storage, load balancing, ingress, cert-manager, GitOps, observability, policy**, and related components from this repo’s Helm/kubectl roles.

[`ansible/playbooks/noble.yml`](../ansible/playbooks/noble.yml) is the main playbook. It sets **`KUBECONFIG`** from the environment or defaults to **`$REPO_ROOT/talos/kubeconfig`**, runs an API **`/healthz`** preflight (with optional VIP fallback), then applies roles in dependency order (for example **Cilium** before **MetalLB** / **kube-vip**, **Kyverno** before **Longhorn** as documented in the playbook comments).
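
To point ad-hoc `kubectl` or `helm` commands at the same file the playbook would pick, the resolution rule can be mirrored in the shell. This is an approximation, not the playbook's code: `resolve_kubeconfig` is a hypothetical name, the repo root is passed in explicitly, and an empty `KUBECONFIG` is treated the same as an unset one, which may differ from the playbook's behavior.

```bash
#!/usr/bin/env bash
# Print the kubeconfig path: the KUBECONFIG environment value when set,
# otherwise <repo-root>/talos/kubeconfig.
resolve_kubeconfig() {
  local repo_root="$1"
  printf '%s\n' "${KUBECONFIG:-$repo_root/talos/kubeconfig}"
}
```

For example, `export KUBECONFIG="$(resolve_kubeconfig /absolute/path/to/home-server)"` keeps the current shell and the playbook on the same cluster credentials.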

### 3.1 Secrets and feature flags

- **`.env`** at repo root: at minimum **`CLOUDFLARE_DNS_API_TOKEN`** when [`noble_cert_manager_require_cloudflare_secret`](../ansible/inventory/group_vars/all.yml) is true, so cert-manager can create DNS-01 issuers.
- **[`ansible/inventory/group_vars/all.yml`](../ansible/inventory/group_vars/all.yml)** toggles optional components (`noble_newt_install`, `noble_velero_install`, `noble_authentik_install`, Argo root Application flags, API server override, etc.).
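
A quick presence check for a required key in `.env` can catch a missing token before the playbook does. The sketch below assumes plain `KEY=value` lines (optionally prefixed with `export`); `env_has_key` is a hypothetical helper, and it does not validate the token itself.

```bash
#!/usr/bin/env bash
# Return 0 if FILE has an uncommented line assigning a non-empty value to KEY.
env_has_key() {
  local file="$1" key="$2"
  grep -Eq "^[[:space:]]*(export[[:space:]]+)?${key}=.+" "$file"
}
```

From the repo root, `env_has_key .env CLOUDFLARE_DNS_API_TOKEN || echo "token not set"` is a typical use. A value written as empty quotes (`KEY=""`) still counts as set here, so treat a pass as necessary rather than sufficient.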

### 3.2 Run

```bash
cd ansible
export KUBECONFIG=/absolute/path/to/home-server/talos/kubeconfig   # optional if default path is correct

ansible-playbook playbooks/noble.yml
```

If the kubeconfig targets the API VIP but this host can only reach a control-plane IP:

```bash
ansible-playbook playbooks/noble.yml -e 'noble_k8s_api_server_override=https://192.168.50.20:6443'
```

Partial runs use **tags** (see [`ansible/README.md`](../ansible/README.md)).

### 3.3 One-shot pipeline from Talos through platform

[`ansible/playbooks/deploy.yml`](../ansible/playbooks/deploy.yml) imports **`talos_phase_a.yml`** then **`noble.yml`**. Use it when nodes are booted and reachable and you want a single command after updating `talconfig`:

```bash
cd ansible
ansible-playbook playbooks/deploy.yml
```

---

## 4. Cutover to Argo CD for deployment and config

Important mental model from [`clusters/noble/apps/README.md`](../clusters/noble/apps/README.md) and [`clusters/noble/bootstrap/argocd/README.md`](../clusters/noble/bootstrap/argocd/README.md):

- **Core platform** (CNI, storage, ingress, cert-manager, observability stack, Kyverno, etc.) is installed by **`noble.yml`** from **`clusters/noble/bootstrap/`** via Helm and kubectl — **Argo CD does not reconcile those core Helm charts by default** (those leaves live under **`argocd/app-of-apps/`** and are applied after Ansible Helm).
- **`noble-bootstrap-root`** tracks **`clusters/noble/bootstrap/`** (which **kustomize-includes** **`clusters/noble/apps/`**), keeping the bootstrap kustomize output and optional add-on **`Application`** manifests aligned with Git. Enable **automated** sync only **after** Ansible has finished so Argo does not fight Helm mid-play.

### 4.1 What Ansible already does for Argo

At the **end** of **`noble.yml`**, after all Ansible Helm roles (**`noble_platform`**, **`noble_authentik`**, **`noble_velero`** when enabled), the play runs **`noble_argocd`** task file **`applications_post_platform.yml`**, which applies:

- **`bootstrap-root-application.yaml`** and **`kubectl apply -k clusters/noble/bootstrap/argocd/app-of-apps`** when **`noble_argocd_apply_bootstrap_root_application`** is true.

So the **bootstrap root Application CR** and **leaf Application** registrations typically already exist on the cluster after a successful **`noble.yml`**. They are created **last** on purpose so `argocd-application-controller` does not adopt resources before Helm installs them.

### 4.2 Before you enable GitOps automation

1. **Edit Git URLs** in **`bootstrap-root-application.yaml`**: set **`repoURL`** and **`targetRevision`** to your real remote and branch.
2. **Register the repository** in Argo CD (UI, `argocd repo add`, or a repository `Secret`) if it is private.
3. Leave **`noble-bootstrap-root`** on **manual** sync until Helm and the cluster match git (see **§5** in [`clusters/noble/bootstrap/argocd/README.md`](../clusters/noble/bootstrap/argocd/README.md)).
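
To confirm which mode an Application is currently in, you can read `spec.syncPolicy.automated` from the cluster. This is a minimal sketch: `sync_mode` is a hypothetical helper, and it assumes Argo CD runs in the `argocd` namespace unless a second argument says otherwise.

```bash
#!/usr/bin/env bash
# Print "automated" if the Application has spec.syncPolicy.automated set,
# otherwise "manual". Returns non-zero if kubectl fails.
sync_mode() {
  local app="$1" ns="${2:-argocd}" automated
  automated="$(kubectl get application "$app" -n "$ns" \
    -o jsonpath='{.spec.syncPolicy.automated}')" || return 1
  if [ -n "$automated" ]; then
    echo automated
  else
    echo manual
  fi
}
```

Calling `sync_mode noble-bootstrap-root` before and after changing the sync policy gives a quick confirmation that the change landed.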

### 4.3 Enable automated sync for `noble-bootstrap-root`

After **`noble.yml`** completes successfully and you have refreshed the app in Argo, enable automated sync (prune + self-heal) using one of the methods documented in **§5** of [`clusters/noble/bootstrap/argocd/README.md`](../clusters/noble/bootstrap/argocd/README.md), for example:

```bash
kubectl patch application noble-bootstrap-root -n argocd --type merge \
  -p '{"spec":{"syncPolicy":{"automated":{"prune":true,"selfHeal":true},"syncOptions":["CreateNamespace=true"]}}}'
```

**Leaf** `Application` objects under **`clusters/noble/bootstrap/argocd/app-of-apps/`** remain **manual** until you intentionally turn on auto-sync **per chart** — when Argo should own a release, enable that leaf and **remove** the corresponding **`helm upgrade`** from Ansible so a single controller owns the release.
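
Because these are JSON merge patches, setting `automated` to `null` removes the key and returns an Application to manual sync. The helper below only builds the patch body; `sync_patch` is a hypothetical name, and the leaf Application name in the usage line is a placeholder you must replace.

```bash
#!/usr/bin/env bash
# Print a JSON merge patch that turns automated sync on (prune + self-heal) or off.
sync_patch() {
  case "$1" in
    on)  printf '%s\n' '{"spec":{"syncPolicy":{"automated":{"prune":true,"selfHeal":true}}}}' ;;
    off) printf '%s\n' '{"spec":{"syncPolicy":{"automated":null}}}' ;;
    *)   echo "usage: sync_patch on|off" >&2; return 2 ;;
  esac
}
```

A per-leaf toggle then looks like `kubectl patch application <leaf-name> -n argocd --type merge -p "$(sync_patch off)"`. Keeping the patch body in one place makes it harder for the on and off variants to drift apart.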

### 4.4 Optional apps repo path

Add only **additive** workloads under **`clusters/noble/apps/`** as `Application` manifests (see [`clusters/noble/apps/README.md`](../clusters/noble/apps/README.md)). **`kustomization.yaml`** may start empty; that is expected.

### 4.5 Post-deploy reminders

```bash
cd ansible
ansible-playbook playbooks/post_deploy.yml
```

This prints guidance about **SOPS** key handling and points back to the Argo README for sync policy.

### 4.6 Migrating from older Argo Application names

If you previously used different Application objects (for example a monolithic `noble-platform`), delete stale Applications as described in [`ansible/README.md`](../ansible/README.md) under **Migrating from Argo-managed `noble-platform`**, then re-apply the root manifests and reconcile with **`noble.yml`** if Helm drifted.

---

## Quick reference — minimal command sequence

```bash
# 1) Proxmox (from ansible/, with proxmox inventory)
cd ansible
ansible-playbook -i inventory/proxmox.yml playbooks/proxmox_ops.yml

# 2–3) Talos + Kubernetes platform (localhost inventory default)
ansible-playbook playbooks/deploy.yml

# 4) Reminders
ansible-playbook playbooks/post_deploy.yml
```

Then finish **Argo** cutover in the UI or CLI: register repo → refresh **`noble-bootstrap-root`** → enable **AUTO-SYNC** when ready → selectively enable leaf apps and retire overlapping Ansible Helm tasks.

---

## Related documentation

| Topic | Path |
|--------|------|
| Ansible overview | [`ansible/README.md`](../ansible/README.md) |
| Talos quick start | [`talos/README.md`](../talos/README.md) |
| Noble lab checklist | [`talos/CLUSTER-BUILD.md`](../talos/CLUSTER-BUILD.md) |
| Argo bootstrap and sync policy | [`clusters/noble/bootstrap/argocd/README.md`](../clusters/noble/bootstrap/argocd/README.md) |
| Optional Argo apps dir | [`clusters/noble/apps/README.md`](../clusters/noble/apps/README.md) |
| Deploy secrets | [`.env.sample`](../.env.sample) |