Add Ansible getting started guide for Proxmox to Talos deployment process and update README with narrative walkthrough. This enhances documentation clarity and provides a structured approach for users to set up the noble cluster.
@@ -1,5 +1,7 @@
# Ansible — noble cluster

**Narrative walkthrough (Proxmox → Talos → platform → Argo):** [`docs/ansible-getting-started.md`](../docs/ansible-getting-started.md).

Automates [`talos/CLUSTER-BUILD.md`](../talos/CLUSTER-BUILD.md): optional **Talos Phase A** (genconfig → apply → bootstrap → kubeconfig), then **Phase B+** (CNI → add-ons → ingress → Argo CD → Kyverno → observability → Trivy, etc.). **Argo CD** does not reconcile core charts — optional GitOps starts from an empty [`clusters/noble/apps/kustomization.yaml`](../clusters/noble/apps/kustomization.yaml).

## Order of operations
261
docs/ansible-getting-started.md
Normal file
@@ -0,0 +1,261 @@
# Ansible getting started — Proxmox → Talos → cluster → Argo CD

This guide walks through the **intended order** for this repository: prepare **Proxmox VE** hosts and optionally form a **Proxmox cluster**, bring up **Talos** nodes and the **Kubernetes** control plane, install the **platform stack** with Ansible, then hand ongoing **bootstrap** configuration to **Argo CD** when you are ready.

Shorter reference tables and variable lists live in [`ansible/README.md`](../ansible/README.md). Deep operational detail for Talos and the noble lab checklist are in [`talos/README.md`](../talos/README.md) and [`talos/CLUSTER-BUILD.md`](../talos/CLUSTER-BUILD.md). Argo-specific sequencing is in [`clusters/noble/bootstrap/argocd/README.md`](../clusters/noble/bootstrap/argocd/README.md).

---

## What runs where

| Layer | What automates it | Where it runs |
|--------|-------------------|----------------|
| Proxmox hosts (repos, keys, upgrades, `pvecm`) | `proxmox_*.yml` playbooks | SSH to `proxmox_hosts` in `ansible/inventory/proxmox.yml` |
| Talos machine config, bootstrap, admin kubeconfig | `playbooks/talos_phase_a.yml` (or `deploy.yml` first half) | **Localhost** — needs LAN/VPN to node IPs (`:50000`, later `:6443`) |
| CNI, storage, ingress, cert-manager, Argo CD install, observability, policy, … | `playbooks/noble.yml` (or `deploy.yml` second half) | **Localhost** — uses `kubectl` / Helm against `KUBECONFIG` |
| Post-install reminders | `playbooks/post_deploy.yml` | Localhost |

Default Ansible inventory for Talos/noble playbooks is [`ansible/inventory/localhost.yml`](../ansible/inventory/localhost.yml) (`ansible.cfg` points there). **Proxmox** playbooks use **`-i inventory/proxmox.yml`** explicitly.

---

## Prerequisites (all phases)

On the machine that runs Ansible (your workstation or a bastion):

- **Ansible** (version compatible with the playbooks in this repo).
- **SSH** access to Proxmox hosts when running Proxmox playbooks.
- For Talos and Kubernetes phases: **same L2/L3 path** to lab node IPs (and eventually the API VIP) as documented in [`talos/README.md`](../talos/README.md) §3.
- **CLI tooling:** `talosctl` (version aligned with the node image), **`talhelper`**, **`kubectl`**, **`helm`**.

Optional but common for this repo:

- **SOPS** + **age** if you use encrypted manifests under `clusters/noble/secrets/` (see `clusters/noble/secrets/README.md`).
- Repository root **`.env`** copied from [`.env.sample`](../.env.sample) for cert-manager (Cloudflare DNS-01) and other optional components.
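
The tool prerequisites above can be checked before the first playbook run. The following is a minimal sketch, not part of the repository: the function names `missing_tools` and `report_prerequisites` are hypothetical, and the check only confirms that each binary is on `PATH`, not that its version matches the playbooks or the node image.

```bash
#!/usr/bin/env bash
# Print the name of every listed tool that is not on PATH, one per line.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Report on the tools this guide names. sops and age are only needed
# for the optional encrypted-manifest workflow, so they never fail the check.
report_prerequisites() {
  local missing
  missing="$(missing_tools ansible-playbook talosctl talhelper kubectl helm)"
  if [ -n "$missing" ]; then
    printf 'Missing required tools:\n%s\n' "$missing" >&2
    return 1
  fi
  missing="$(missing_tools sops age)"
  [ -z "$missing" ] || printf 'Optional tools not found:\n%s\n' "$missing" >&2
  return 0
}
```

Source the snippet and call `report_prerequisites` on the workstation or bastion that will run Ansible.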

---

## 1. Proxmox — hosts and VE cluster

These steps are **independent** of Talos and Kubernetes. They configure community repositories, routine upgrades, SSH keys for `root`, and optionally create a **Proxmox VE cluster** (`pvecm`).

### 1.1 Inventory and variables

1. Copy the example inventory:

   ```bash
   cp ansible/inventory/proxmox.example.yml ansible/inventory/proxmox.yml
   ```

2. Edit `ansible/inventory/proxmox.yml`: set `ansible_host`, `ansible_user` (typically `root`), and for the first login without key auth either **`--ask-pass`** or `ansible_password` (prefer **Ansible Vault** for passwords).

3. Edit [`ansible/inventory/group_vars/proxmox_hosts.yml`](../ansible/inventory/group_vars/proxmox_hosts.yml):

   - **`proxmox_cluster_name`** — name for a **new** Proxmox cluster (changing it later does not rename an existing cluster).
   - **`proxmox_cluster_master`** — inventory host name of the first node that runs `pvecm create`; leave empty only if the default ordering matches your intent (see role defaults).
   - **`proxmox_root_authorized_key_files`** — public keys installed for `root` (after prepare, password login is usually unnecessary).
   - **`proxmox_cluster_master_root_password`** — only if `pvecm add` still needs the master’s root password for joins; store with Vault in real environments.

Repo variables for Debian codename and subscription notices are already set in that file; adjust **`proxmox_repo_debian_codename`** if your PVE major version tracks a different Debian base.

### 1.2 Run order

From the **`ansible/`** directory, targeting Proxmox:

```bash
cd ansible

# First contact: often need --ask-pass until SSH keys are installed
ansible-playbook -i inventory/proxmox.yml playbooks/proxmox_prepare.yml --ask-pass

ansible-playbook -i inventory/proxmox.yml playbooks/proxmox_upgrade.yml
ansible-playbook -i inventory/proxmox.yml playbooks/proxmox_cluster.yml
```

**`proxmox_prepare.yml`** runs the `proxmox_baseline` role (community repos, suppress no-subscription UI nag, install your **`authorized_keys`** for root). **`proxmox_upgrade.yml`** runs maintenance (dist-upgrade, cleanup, reboot when required) **serially** one host at a time. **`proxmox_cluster.yml`** bootstraps or joins the Proxmox cluster **serially**.
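
Ad-hoc Ansible commands are one way to check connectivity before these playbooks and cluster membership after them. This is a sketch under assumptions: the helper names and the `PROXMOX_INVENTORY` variable are hypothetical, and only the `proxmox_hosts` group and the inventory path come from this guide. Run it from `ansible/`.

```bash
# Ad-hoc checks against the proxmox_hosts group; run from ansible/.
# PROXMOX_INVENTORY is a hypothetical override, not a variable the repo defines.
PROXMOX_INVENTORY="${PROXMOX_INVENTORY:-inventory/proxmox.yml}"

# Confirm Ansible can log in and run modules on every host.
# Extra arguments are passed through, e.g. --ask-pass before keys exist.
proxmox_ping() {
  ansible -i "$PROXMOX_INVENTORY" proxmox_hosts -m ansible.builtin.ping "$@"
}

# Show quorum and membership as reported by each node.
proxmox_cluster_status() {
  ansible -i "$PROXMOX_INVENTORY" proxmox_hosts -m ansible.builtin.command -a 'pvecm status' "$@"
}
```

`proxmox_ping --ask-pass` suits the first contact. `proxmox_cluster_status` is expected to report an error on any host that has not yet joined a cluster.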

Convenience wrapper — same three steps in order:

```bash
ansible-playbook -i inventory/proxmox.yml playbooks/proxmox_ops.yml
```

After this phase you should have stable **Proxmox** hosts (and optionally a single **Proxmox cluster**) for creating the Talos VMs or bare-metal install targets. Creating those VMs or ISO boot entries is **outside** these playbooks; align disks and networks with [`talos/talconfig.yaml`](../talos/talconfig.yaml) and [`talos/CLUSTER-BUILD.md`](../talos/CLUSTER-BUILD.md) inventory.

---

## 2. Talos — secrets, generated configs, and Phase A

Talos automation assumes **`talos/talconfig.yaml`** (and secrets) describe your nodes. Ansible **does not** replace reading [`talos/README.md`](../talos/README.md): order matters (**genconfig → apply all nodes → bootstrap**), and **`--insecure`** is only for maintenance-mode APIs.

### 2.1 One-time (or when rotating machine config)

From the **`talos/`** directory:

```bash
cd talos
talhelper gensecret > talsecret.yaml
# talhelper validate talconfig talconfig.yaml   # after edits
```

Do **not** commit `talsecret.yaml`, `talos/out/`, or `talos/kubeconfig`. If you use SOPS for secrets, follow talhelper and repo docs for encrypted variants.
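
One way to confirm that these files cannot be committed by accident is to ask git whether an ignore rule covers them. This is a minimal sketch: the helper name `unignored_paths` is hypothetical, and whether the repository's `.gitignore` already lists these paths is not stated here.

```bash
# Print each given path that git does NOT treat as ignored.
# `git check-ignore -q` exits 0 when the path matches an ignore rule.
# Files that are already tracked are reported as not ignored, which is
# the case you most want to hear about.
unignored_paths() {
  local path
  for path in "$@"; do
    git check-ignore -q -- "$path" || printf '%s\n' "$path"
  done
}
```

From the repository root, `unignored_paths talos/talsecret.yaml talos/out talos/kubeconfig` should print nothing when all three are covered.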

### 2.2 Automated Phase A (recommended)

[`ansible/playbooks/talos_phase_a.yml`](../ansible/playbooks/talos_phase_a.yml) runs the **`talos_phase_a`** role on **localhost**:

1. **`talhelper genconfig -o out`** (when `noble_talos_genconfig` is true).
2. **`talosctl apply-config`** for each entry in **`noble_talos_nodes`** (maintenance vs secure mode is auto-probed unless you force `noble_talos_apply_mode`).
3. **`talosctl bootstrap`** on the bootstrap node (unless `noble_talos_skip_bootstrap` is true or etcd is already initialized).
4. **`talosctl kubeconfig`**, writing the admin kubeconfig to **`talos/kubeconfig`** relative to the repo root (path set in the playbook).
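
For orientation, the four steps above correspond roughly to the manual command sequence printed by the sketch below. It only prints commands and runs nothing. It is an illustration under assumptions, not the role's actual implementation: the function name `phase_a_plan`, the `out/talosconfig` path, and the per-node config filenames are assumed, and it always shows the maintenance-mode (`--insecure`) form of `apply-config`, whereas the role probes which mode applies.

```bash
# Print (do not run) a manual Phase A command sequence, from talos/.
# Usage: phase_a_plan BOOTSTRAP_IP NODE_IP=CONFIG_FILE [NODE_IP=CONFIG_FILE ...]
phase_a_plan() {
  local bootstrap_ip="$1" pair
  shift
  echo "talhelper genconfig -o out"
  for pair in "$@"; do
    # --insecure is only valid while the node is still in maintenance mode.
    echo "talosctl apply-config --insecure --nodes ${pair%%=*} --file ${pair#*=}"
  done
  # Bootstrap once, on a single control-plane node, after every node has its config.
  echo "talosctl --talosconfig out/talosconfig bootstrap --nodes $bootstrap_ip --endpoints $bootstrap_ip"
  echo "talosctl --talosconfig out/talosconfig kubeconfig ./kubeconfig --nodes $bootstrap_ip --endpoints $bootstrap_ip"
}
```

For example, `phase_a_plan 192.168.50.20 192.168.50.20=out/<control-plane-config>.yaml` prints one `apply-config` line per node before the single `bootstrap` line, which is the ordering [`talos/README.md`](../talos/README.md) requires.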

Run from **`ansible/`**:

```bash
cd ansible
ansible-playbook playbooks/talos_phase_a.yml
```

Override IPs, machine filenames, or timing when your lab differs from [`ansible/roles/talos_phase_a/defaults/main.yml`](../ansible/roles/talos_phase_a/defaults/main.yml), for example:

```bash
ansible-playbook playbooks/talos_phase_a.yml \
  -e 'noble_talos_bootstrap_node_ip=192.168.50.20' \
  -e 'noble_talos_kubeconfig_endpoint=192.168.50.20'
```

If etcd is already bootstrapped and you only need apply/kubeconfig:

```bash
ansible-playbook playbooks/talos_phase_a.yml -e 'noble_talos_skip_bootstrap=true'
```

**Legacy:** `playbooks/talos_bootstrap.yml` only runs genconfig via `talos_bootstrap`; prefer **`talos_phase_a.yml`** for a full bring-up.

### 2.3 Sanity checks before the platform playbook

- **`talos/kubeconfig`** exists (or export **`KUBECONFIG`** to your own path).
- From the same network path you will use for Helm: **`kubectl get --raw /healthz`** returns **`ok`** (see [`talos/README.md`](../talos/README.md) §3 if the kubeconfig points at a VIP you cannot reach — use `noble_k8s_api_server_override` on **`noble.yml`** as in [`ansible/inventory/group_vars/all.yml`](../ansible/inventory/group_vars/all.yml)).
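
Both checks can be combined into one small helper. This is a sketch with a hypothetical function name, `check_api_health`. It is independent of the preflight that `noble.yml` runs itself and has no VIP fallback.

```bash
# Return 0 only when the kubeconfig file exists and the API answers "ok".
# Uses the first argument, then $KUBECONFIG, then talos/kubeconfig
# relative to the current directory.
check_api_health() {
  local kubeconfig="${1:-${KUBECONFIG:-talos/kubeconfig}}" body
  if [ ! -f "$kubeconfig" ]; then
    echo "kubeconfig not found: $kubeconfig" >&2
    return 1
  fi
  body="$(kubectl --kubeconfig "$kubeconfig" get --raw /healthz)" || return 1
  [ "$body" = "ok" ]
}
```

Run `check_api_health` from the repository root on the host that will run Helm. A non-zero exit means either the file is missing or the API endpoint in it is unreachable from that host.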

---

## 3. Kubernetes cluster creation — platform install (`noble.yml`)

Here “cluster creation” means: **empty Talos nodes are now members of a Kubernetes cluster**, and you are installing **CNI, storage, load balancing, ingress, cert-manager, GitOps, observability, policy**, and related components from this repo’s Helm/kubectl roles.

[`ansible/playbooks/noble.yml`](../ansible/playbooks/noble.yml) is the main playbook. It sets **`KUBECONFIG`** from the environment or defaults to **`$REPO_ROOT/talos/kubeconfig`**, runs an API **`/healthz`** preflight (with optional VIP fallback), then applies roles in dependency order (for example **Cilium** before **MetalLB** / **kube-vip**, **Kyverno** before **Longhorn** as documented in the playbook comments).

### 3.1 Secrets and feature flags

- **`.env`** at repo root: at minimum **`CLOUDFLARE_DNS_API_TOKEN`** when [`noble_cert_manager_require_cloudflare_secret`](../ansible/inventory/group_vars/all.yml) is true, so cert-manager can create DNS-01 issuers.
- **[`ansible/inventory/group_vars/all.yml`](../ansible/inventory/group_vars/all.yml)** toggles optional components (`noble_newt_install`, `noble_velero_install`, `noble_authentik_install`, Argo root Application flags, API server override, etc.).
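
A quick check that the token is present in `.env` can save a failed run. The helper below is a sketch: `env_has_key` is a hypothetical name, and it assumes `.env` uses plain `KEY=value` lines, optionally prefixed with `export`. How the playbook itself parses the file is not described here.

```bash
# Return 0 when FILE has an uncommented KEY= line with at least one
# character after the equals sign. The value itself is never printed.
env_has_key() {
  local file="$1" key="$2"
  [ -f "$file" ] && grep -Eq "^(export[[:space:]]+)?${key}=.+" "$file"
}
```

From the repository root: `env_has_key .env CLOUDFLARE_DNS_API_TOKEN || echo "CLOUDFLARE_DNS_API_TOKEN is not set in .env" >&2`.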

### 3.2 Run

```bash
cd ansible
export KUBECONFIG=/absolute/path/to/home-server/talos/kubeconfig   # optional if default path is correct

ansible-playbook playbooks/noble.yml
```

If the kubeconfig targets the API VIP but this host can only reach a control-plane IP:

```bash
ansible-playbook playbooks/noble.yml -e 'noble_k8s_api_server_override=https://192.168.50.20:6443'
```

Partial runs use **tags** (see [`ansible/README.md`](../ansible/README.md)).
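
The available tags can also be listed from the playbook itself, and a tag selection previewed, without running any task. This sketch uses the standard `--list-tags` and `--list-tasks` options of `ansible-playbook`. The helper names are hypothetical and no tag names are assumed.

```bash
# List the tags a playbook defines without running any task.
list_playbook_tags() {
  ansible-playbook "${1:-playbooks/noble.yml}" --list-tags
}

# Preview which tasks a tag selection would run, still without executing them.
# Usage: preview_tagged_tasks TAGS [PLAYBOOK]
preview_tagged_tasks() {
  local tags="$1"
  ansible-playbook "${2:-playbooks/noble.yml}" --tags "$tags" --list-tasks
}
```

Run both from `ansible/`. Once the preview looks right, the same `--tags` value can be passed to a real `ansible-playbook playbooks/noble.yml` run.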

### 3.3 One-shot pipeline from Talos through platform

[`ansible/playbooks/deploy.yml`](../ansible/playbooks/deploy.yml) imports **`talos_phase_a.yml`** then **`noble.yml`**. Use it when nodes are booted and reachable and you want a single command after updating `talconfig`:

```bash
cd ansible
ansible-playbook playbooks/deploy.yml
```

---

## 4. Cutover to Argo CD for deployment and config

Important mental model from [`clusters/noble/apps/README.md`](../clusters/noble/apps/README.md) and [`clusters/noble/bootstrap/argocd/README.md`](../clusters/noble/bootstrap/argocd/README.md):

- **Core platform** (CNI, storage, ingress, cert-manager, observability stack, Kyverno, etc.) is installed by **`noble.yml`** from **`clusters/noble/bootstrap/`** via Helm and kubectl — **Argo CD does not reconcile those charts by default** in the “empty apps folder” layout.
- **`noble-root`** tracks **`clusters/noble/apps/`** for **optional** add-on `Application` manifests.
- **`noble-bootstrap-root`** tracks **`clusters/noble/bootstrap/`** for GitOps alignment with bootstrap kustomize — enable **automated** sync only **after** Ansible has finished so Argo does not fight Helm mid-play.

### 4.1 What Ansible already does for Argo

At the **end** of **`noble.yml`**, after all Helm roles (including **`noble_platform`**, **`noble_authentik`**, **`noble_trivy`**, **`noble_velero`**), the play runs **`noble_argocd`** task file **`applications_post_platform.yml`**, which applies:

- **`clusters/noble/bootstrap/argocd/root-application.yaml`** when **`noble_argocd_apply_root_application`** is true.
- **`bootstrap-root-application.yaml`** and **`kubectl apply -k clusters/noble/bootstrap/argocd/app-of-apps`** when **`noble_argocd_apply_bootstrap_root_application`** is true.

So the **root Application CRs** and **leaf Application** registrations typically already exist on the cluster after a successful **`noble.yml`**. They are created **last** on purpose so `argocd-application-controller` does not adopt resources before Helm installs them.

### 4.2 Before you enable GitOps automation

1. **Edit Git URLs** in **`root-application.yaml`** and **`bootstrap-root-application.yaml`**: set **`repoURL`** and **`targetRevision`** to your real remote and branch.
2. **Register the repository** in Argo CD (UI, `argocd repo add`, or a repository `Secret`) if it is private.
3. Leave **`noble-bootstrap-root`** on **manual** sync until Helm and the cluster match git (see **§5** in [`clusters/noble/bootstrap/argocd/README.md`](../clusters/noble/bootstrap/argocd/README.md)).

### 4.3 Enable automated sync for `noble-bootstrap-root`

After **`noble.yml`** completes successfully and you have refreshed the app in Argo, enable automated sync (prune + self-heal) using one of the methods documented in **§5** of [`clusters/noble/bootstrap/argocd/README.md`](../clusters/noble/bootstrap/argocd/README.md), for example:

```bash
kubectl patch application noble-bootstrap-root -n argocd --type merge \
  -p '{"spec":{"syncPolicy":{"automated":{"prune":true,"selfHeal":true},"syncOptions":["CreateNamespace=true"]}}}'
```

**Leaf** `Application` objects under **`clusters/noble/bootstrap/argocd/app-of-apps/`** remain **manual** until you intentionally turn on auto-sync **per chart** — when Argo should own a release, enable that leaf and **remove** the corresponding **`helm upgrade`** from Ansible so a single controller owns the release.
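
To see which Applications are currently on automated sync, for the root as well as for any leaf, the sync policy can be read back with `kubectl`. This is a minimal sketch: the helper names are hypothetical, and the `argocd` namespace default mirrors the `kubectl patch` command in this section.

```bash
# Print the syncPolicy of one Argo CD Application as JSON (empty when unset).
# Usage: show_sync_policy APP [NAMESPACE]
show_sync_policy() {
  local app="$1" namespace="${2:-argocd}"
  kubectl get application "$app" -n "$namespace" -o jsonpath='{.spec.syncPolicy}'
}

# Succeed only when spec.syncPolicy.automated is set on the Application.
is_auto_synced() {
  [ -n "$(kubectl get application "$1" -n "${2:-argocd}" -o jsonpath='{.spec.syncPolicy.automated}')" ]
}
```

For example, `is_auto_synced noble-bootstrap-root && echo automated || echo manual`. Running the same check against a leaf Application before removing its `helm upgrade` task helps confirm that only one controller will own the release.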

### 4.4 Optional apps repo path

Add only **additive** workloads under **`clusters/noble/apps/`** as `Application` manifests (see [`clusters/noble/apps/README.md`](../clusters/noble/apps/README.md)). **`kustomization.yaml`** may start empty; that is expected.

### 4.5 Post-deploy reminders

```bash
cd ansible
ansible-playbook playbooks/post_deploy.yml
```

This prints guidance about **SOPS** key handling and points back to the Argo README for sync policy.

### 4.6 Migrating from older Argo Application names

If you previously used different Application objects (for example a monolithic `noble-platform`), delete stale Applications as described in [`ansible/README.md`](../ansible/README.md) under **Migrating from Argo-managed `noble-platform`**, then re-apply the root manifests and reconcile with **`noble.yml`** if Helm drifted.

---

## Quick reference — minimal command sequence

```bash
# All commands run from ansible/ (playbook and inventory paths are relative to it)
cd ansible

# 1) Proxmox (explicit proxmox inventory)
ansible-playbook -i inventory/proxmox.yml playbooks/proxmox_ops.yml

# 2–3) Talos + Kubernetes platform (localhost inventory default)
ansible-playbook playbooks/deploy.yml

# 4) Reminders
ansible-playbook playbooks/post_deploy.yml
```

Then finish **Argo** cutover in the UI or CLI: register repo → refresh **`noble-bootstrap-root`** → enable **AUTO-SYNC** when ready → selectively enable leaf apps and retire overlapping Ansible Helm tasks.

---

## Related documentation

| Topic | Path |
|--------|------|
| Ansible overview | [`ansible/README.md`](../ansible/README.md) |
| Talos quick start | [`talos/README.md`](../talos/README.md) |
| Noble lab checklist | [`talos/CLUSTER-BUILD.md`](../talos/CLUSTER-BUILD.md) |
| Argo bootstrap and sync policy | [`clusters/noble/bootstrap/argocd/README.md`](../clusters/noble/bootstrap/argocd/README.md) |
| Optional Argo apps dir | [`clusters/noble/apps/README.md`](../clusters/noble/apps/README.md) |
| Deploy secrets | [`.env.sample`](../.env.sample) |