# Ansible — noble cluster

Automates [`talos/CLUSTER-BUILD.md`](../talos/CLUSTER-BUILD.md): optional **Talos Phase A** (genconfig → apply → bootstrap → kubeconfig), then **Phase B+** (CNI → add-ons → ingress → Argo CD → Kyverno → observability, etc.). **Argo CD** does not reconcile core charts — optional GitOps starts from an empty [`clusters/noble/apps/kustomization.yaml`](../clusters/noble/apps/kustomization.yaml).

## Order of operations

1. **From `talos/`:** `talhelper gensecret` / `talsecret` as in [`talos/README.md`](../talos/README.md) §1 (if not already done).
2. **Talos Phase A (automated):** run [`playbooks/talos_phase_a.yml`](playbooks/talos_phase_a.yml) **or** the full pipeline [`playbooks/deploy.yml`](playbooks/deploy.yml). This runs **`talhelper genconfig -o out`**, **`talosctl apply-config`** on each node, **`talosctl bootstrap`**, and **`talosctl kubeconfig`** → **`talos/kubeconfig`**.
3. **Platform stack:** [`playbooks/noble.yml`](playbooks/noble.yml) (included at the end of **`deploy.yml`**).

Your workstation must be able to reach **node IPs on the lab LAN** (Talos API **:50000** for `talosctl`, Kubernetes **:6443** for `kubectl` / Helm). If `kubectl` cannot reach the VIP (`192.168.50.230`), pass `-e 'noble_k8s_api_server_override=https://<control-plane-ip>:6443'` to **`noble.yml`** (see `group_vars/all.yml`).

**One-shot full deploy** (after nodes are booted and reachable):

```bash
cd ansible
ansible-playbook playbooks/deploy.yml
```

## Deploy secrets (`.env`)
Copy **`.env.sample`** to **`.env`** at the repository root (`.env` is gitignored). At minimum set **`CLOUDFLARE_DNS_API_TOKEN`** for cert-manager DNS-01. The **cert-manager** role applies it automatically during **`noble.yml`**. See **`.env.sample`** for optional placeholders (e.g. Newt/Pangolin).
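
For orientation, a minimal `.env` might look like the sketch below; the token value is a placeholder, and **`.env.sample`** is the authoritative list of variables:

```bash
# .env (repository root, gitignored) — placeholder value, replace with a real token
CLOUDFLARE_DNS_API_TOKEN=cf-token-goes-here
```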
## Prerequisites

- `talosctl` (matching the nodes' Talos version), `talhelper`, `helm`, `kubectl`.
- **SOPS secrets:** `sops` and `age` on the control host if you use **`clusters/noble/secrets/`** with **`age-key.txt`** (see **`clusters/noble/secrets/README.md`**).
- **Phase A:** same LAN/VPN as the nodes so **Talos :50000** and **Kubernetes :6443** are reachable (see [`talos/README.md`](../talos/README.md) §3).
- **noble.yml:** a bootstrapped cluster and **`talos/kubeconfig`** (or `KUBECONFIG`).

## Playbooks

| Playbook | Purpose |
|----------|---------|
| [`playbooks/deploy.yml`](playbooks/deploy.yml) | **Talos Phase A** then **`noble.yml`** (full automation). |
| [`playbooks/talos_phase_a.yml`](playbooks/talos_phase_a.yml) | `genconfig` → `apply-config` → `bootstrap` → `kubeconfig` only. |
| [`playbooks/noble.yml`](playbooks/noble.yml) | Helm + `kubectl` platform (after Phase A). |
| [`playbooks/post_deploy.yml`](playbooks/post_deploy.yml) | SOPS reminders and optional Argo root Application note. |
| [`playbooks/talos_bootstrap.yml`](playbooks/talos_bootstrap.yml) | **`talhelper genconfig` only** (legacy shortcut; prefer **`talos_phase_a.yml`**). |
| [`playbooks/debian_harden.yml`](playbooks/debian_harden.yml) | Baseline hardening for Debian servers (SSH/sysctl/fail2ban/unattended-upgrades). |
| [`playbooks/debian_maintenance.yml`](playbooks/debian_maintenance.yml) | Debian maintenance run (apt upgrades, autoremove/autoclean, reboot when required). |
| [`playbooks/debian_rotate_ssh_keys.yml`](playbooks/debian_rotate_ssh_keys.yml) | Rotate managed users' `authorized_keys`. |
| [`playbooks/debian_ops.yml`](playbooks/debian_ops.yml) | Convenience pipeline: harden then maintenance for Debian servers. |
| [`playbooks/proxmox_prepare.yml`](playbooks/proxmox_prepare.yml) | Configure Proxmox community repos and disable the no-subscription UI warning. |
| [`playbooks/proxmox_upgrade.yml`](playbooks/proxmox_upgrade.yml) | Proxmox maintenance run (apt dist-upgrade, cleanup, reboot when required). |
| [`playbooks/proxmox_cluster.yml`](playbooks/proxmox_cluster.yml) | Create a Proxmox cluster on the master and join additional hosts. |
| [`playbooks/proxmox_ops.yml`](playbooks/proxmox_ops.yml) | Convenience pipeline: prepare, upgrade, then cluster Proxmox hosts. |

```bash
cd ansible
export KUBECONFIG=/absolute/path/to/home-server/talos/kubeconfig

# noble.yml only — if the VIP is unreachable from this host:
# ansible-playbook playbooks/noble.yml -e 'noble_k8s_api_server_override=https://192.168.50.20:6443'

ansible-playbook playbooks/noble.yml
ansible-playbook playbooks/post_deploy.yml
```

### Talos Phase A variables (role `talos_phase_a` defaults)

Override with `-e` when needed, e.g. **`-e noble_talos_skip_bootstrap=true`** if etcd is already initialized.

| Variable | Default | Meaning |
|----------|---------|---------|
| `noble_talos_genconfig` | `true` | Run **`talhelper genconfig -o out`** first. |
| `noble_talos_apply_mode` | `auto` | **`auto`** — a **`talosctl apply-config --dry-run`** against the first node detects whether it is in maintenance mode (**`--insecure`**) or already joined (**`TALOSCONFIG`**). **`insecure`** / **`secure`** force option A or B of talos/README §2. |
| `noble_talos_skip_bootstrap` | `false` | Skip **`talosctl bootstrap`**. If etcd is **already** initialized, bootstrap is treated as a no-op (same as the **`talosctl`** "etcd data directory is not empty" case). |
| `noble_talos_apid_wait_delay` / `noble_talos_apid_wait_timeout` | `20` / `900` | Seconds to wait for **apid :50000** on the bootstrap node after **apply-config** (nodes reboot). Increase if bootstrap hits **connection refused** on `:50000`. |
| `noble_talos_nodes` | neon/argon/krypton/helium | IP + **`out/*.yaml`** filename — align with **`talos/talconfig.yaml`**. |
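
For repeated runs it can be easier to pin these in a vars file than on the command line; a hypothetical override file (values illustrative only, passed with `-e @phase-a-overrides.yml`):

```yaml
# phase-a-overrides.yml — hypothetical file; values are examples, not defaults
noble_talos_skip_bootstrap: true     # etcd already initialized on this cluster
noble_talos_apid_wait_timeout: 1200  # slow-rebooting nodes need longer than 900s
```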
### Tags (partial runs)

```bash
ansible-playbook playbooks/noble.yml --tags cilium,metallb
ansible-playbook playbooks/noble.yml --skip-tags newt
ansible-playbook playbooks/noble.yml --tags velero -e noble_velero_install=true -e noble_velero_s3_bucket=... -e noble_velero_s3_url=...
```

### Variables — `group_vars/all.yml` and role defaults

- **`group_vars/all.yml`:** **`noble_newt_install`**, **`noble_velero_install`**, **`noble_cert_manager_require_cloudflare_secret`**, **`noble_argocd_apply_root_application`**, **`noble_k8s_api_server_override`**, **`noble_k8s_api_server_auto_fallback`**, **`noble_k8s_api_server_fallback`**, **`noble_skip_k8s_health_check`**
- **`roles/noble_platform/defaults/main.yml`:** **`noble_apply_sops_secrets`**, **`noble_sops_age_key_file`** (SOPS secrets under **`clusters/noble/secrets/`**)
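
As an illustration, the optional toggles in `group_vars/all.yml` might be set like this (values are examples only, not the shipped defaults — check the file itself):

```yaml
# group_vars/all.yml — example values only; see the file for real defaults
noble_newt_install: false            # skip the Newt tunnel client
noble_velero_install: true           # enable the optional Velero role
noble_k8s_api_server_override: "https://192.168.50.20:6443"  # bypass the VIP
```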
## Roles

| Role | Contents |
|------|----------|
| `talos_phase_a` | Talos genconfig, apply-config, bootstrap, kubeconfig |
| `helm_repos` | `helm repo add` / `update` |
| `noble_*` | Cilium, CSI Volume Snapshot CRDs + controller, metrics-server, Longhorn, MetalLB (20m Helm wait), kube-vip, Traefik, cert-manager, Newt, Argo CD, Kyverno, platform stack, Velero (optional) |
| `noble_landing_urls` | Writes **`ansible/output/noble-lab-ui-urls.md`** — URLs, service names, and (optional) Argo/Grafana passwords from Secrets |
| `noble_post_deploy` | Post-install reminders |
| `talos_bootstrap` | Genconfig only (used by the older playbook) |
| `debian_baseline_hardening` | Baseline Debian hardening (SSH policy, sysctl profile, fail2ban, unattended upgrades) |
| `debian_maintenance` | Routine Debian maintenance (updates, cleanup, reboot when required) |
| `debian_ssh_key_rotation` | Declarative `authorized_keys` rotation for server users |
| `proxmox_baseline` | Proxmox repo prep (community repos) and no-subscription warning suppression |
| `proxmox_maintenance` | Proxmox package maintenance (dist-upgrade, cleanup, reboot when required) |
| `proxmox_cluster` | Proxmox cluster bootstrap/join automation using `pvecm` |

## Debian server ops quick start

These playbooks are separate from the Talos/noble flow and target hosts in `debian_servers`.

1. Copy `inventory/debian.example.yml` to `inventory/debian.yml` and update hosts/users.
2. Update `group_vars/debian_servers.yml` with your allowed SSH users and real public keys.
3. Run with the Debian inventory:

```bash
cd ansible
ansible-playbook -i inventory/debian.yml playbooks/debian_harden.yml
ansible-playbook -i inventory/debian.yml playbooks/debian_rotate_ssh_keys.yml
ansible-playbook -i inventory/debian.yml playbooks/debian_maintenance.yml
```
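
For orientation, `inventory/debian.yml` might look like this sketch (hostnames, IPs, and users are placeholders; `inventory/debian.example.yml` is the authoritative template):

```yaml
# inventory/debian.yml — placeholder hosts; copy from inventory/debian.example.yml
debian_servers:
  hosts:
    web01:
      ansible_host: 192.0.2.10
    db01:
      ansible_host: 192.0.2.11
  vars:
    ansible_user: admin
```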

Or run the combined maintenance pipeline:

```bash
cd ansible
ansible-playbook -i inventory/debian.yml playbooks/debian_ops.yml
```

## Proxmox host + cluster quick start

These playbooks are separate from the Talos/noble flow and target hosts in `proxmox_hosts`.

1. Copy `inventory/proxmox.example.yml` to `inventory/proxmox.yml` and update hosts/users.
2. Update `group_vars/proxmox_hosts.yml` with your cluster name (`proxmox_cluster_name`), the chosen cluster master, and the root public key file paths to install.
3. First run (no SSH keys yet): use `--ask-pass` **or** set `ansible_password` (prefer Ansible Vault). Keep `ansible_ssh_common_args: "-o StrictHostKeyChecking=accept-new"` in the inventory for first-contact hosts.
4. Run prepare first to install your public keys on each host, then continue:

```bash
cd ansible
ansible-playbook -i inventory/proxmox.yml playbooks/proxmox_prepare.yml --ask-pass
ansible-playbook -i inventory/proxmox.yml playbooks/proxmox_upgrade.yml
ansible-playbook -i inventory/proxmox.yml playbooks/proxmox_cluster.yml
```
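
A sketch of what `inventory/proxmox.yml` might contain (hosts and IPs are placeholders; `inventory/proxmox.example.yml` is the authoritative template):

```yaml
# inventory/proxmox.yml — placeholder hosts; copy from inventory/proxmox.example.yml
proxmox_hosts:
  hosts:
    pve1:
      ansible_host: 192.0.2.21
    pve2:
      ansible_host: 192.0.2.22
  vars:
    ansible_user: root
    ansible_ssh_common_args: "-o StrictHostKeyChecking=accept-new"
```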

After `proxmox_prepare.yml` finishes, SSH key auth should work for root (keys from `proxmox_root_authorized_key_files`), so `--ask-pass` is usually no longer needed.

If `pvecm add` still prompts for the master root password during join, set `proxmox_cluster_master_root_password` (prefer Vault) to run join non-interactively.
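
One way to keep that password out of plain text is a Vault-encrypted value in the group vars; the vault body below is a truncated placeholder, not real ciphertext:

```yaml
# group_vars/proxmox_hosts.yml — generate the real value with:
#   ansible-vault encrypt_string --name proxmox_cluster_master_root_password
proxmox_cluster_master_root_password: !vault |
  $ANSIBLE_VAULT;1.1;AES256
  ...placeholder...
```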

Changing `proxmox_cluster_name` only affects new cluster creation; it does not rename an already-created cluster.

Or run the full Proxmox pipeline:

```bash
cd ansible
ansible-playbook -i inventory/proxmox.yml playbooks/proxmox_ops.yml
```

## Migrating from Argo-managed `noble-platform`

```bash
kubectl delete application -n argocd noble-platform noble-kyverno noble-kyverno-policies --ignore-not-found
kubectl apply -f clusters/noble/bootstrap/argocd/root-application.yaml
```

Then run `playbooks/noble.yml` so Helm state matches git values.