Compare commits
27 commits, `76eb7df18c...main`
```diff
@@ -11,3 +11,9 @@ CLOUDFLARE_DNS_API_TOKEN=
 PANGOLIN_ENDPOINT=
 NEWT_ID=
 NEWT_SECRET=
+
+# Velero — when noble_velero_install=true, set bucket + S3 API URL and credentials (see clusters/noble/bootstrap/velero/README.md).
+NOBLE_VELERO_S3_BUCKET=
+NOBLE_VELERO_S3_URL=
+NOBLE_VELERO_AWS_ACCESS_KEY_ID=
+NOBLE_VELERO_AWS_SECRET_ACCESS_KEY=
```
**`.sops.yaml`** (new file, +7)

```diff
@@ -0,0 +1,7 @@
+# Mozilla SOPS — encrypt/decrypt Kubernetes Secret manifests under clusters/noble/secrets/
+# Generate a key: age-keygen -o age-key.txt (age-key.txt is gitignored)
+# Add the printed public key below (one recipient per line is supported).
+creation_rules:
+  - path_regex: clusters/noble/secrets/.*\.yaml$
+    age: >-
+      age1juym5p3ez3dkt0dxlznydgfgqvaujfnyk9hpdsssf50hsxeh3p4sjpf3gn
```
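Worth noting when using this rule for Kubernetes Secrets: with no `encrypted_regex` in the creation rule, sops encrypts every YAML value in the file (including `metadata.name`), which `kubectl apply` cannot consume. Secret-oriented setups commonly add `encrypted_regex: ^(data|stringData)$` so only the payload is ciphered; under that assumption, an encrypted manifest would look roughly like this (names, ciphertext, timestamps, and MAC are illustrative placeholders, not real output):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: example-credentials        # left in cleartext by the encrypted_regex
  namespace: default
type: Opaque
stringData:
  password: ENC[AES256_GCM,data:...,iv:...,tag:...,type:str]
sops:
  age:
    - recipient: age1juym5p3ez3dkt0dxlznydgfgqvaujfnyk9hpdsssf50hsxeh3p4sjpf3gn
      enc: |
        -----BEGIN AGE ENCRYPTED FILE-----
        ...
        -----END AGE ENCRYPTED FILE-----
  lastmodified: "..."
  mac: ENC[AES256_GCM,data:...,type:str]
  version: "..."
```

Decryption on the control host then needs only the matching private key, e.g. `SOPS_AGE_KEY_FILE=age-key.txt sops -d <file>`.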
```diff
@@ -180,6 +180,12 @@ Shared services used across multiple applications.

 **Configuration:** Requires Pangolin endpoint URL, Newt ID, and Newt secret.

+### versitygw/ (`komodo/s3/versitygw/`)
+
+- **[Versity S3 Gateway](https://github.com/versity/versitygw)** — S3 API on port **10000** by default; optional **WebUI** on **8080** (not the same listener—enable `VERSITYGW_WEBUI_PORT` / `VGW_WEBUI_GATEWAYS` per `.env.sample`). Behind **Pangolin**, expose the API and WebUI separately (or you will see **404** browsing the API URL).
+
+**Configuration:** Set either `ROOT_ACCESS_KEY` / `ROOT_SECRET_KEY` or `ROOT_ACCESS_KEY_ID` / `ROOT_SECRET_ACCESS_KEY`. Optional `VERSITYGW_PORT`. Compose uses `${VAR}` interpolation so credentials work with Komodo's `docker compose --env-file <run_directory>/.env` (avoid `env_file:` in the service when `run_directory` is not the same folder as `compose.yaml`, or the written `.env` will not be found).
+
 ---

 ## 📊 Monitoring (`komodo/monitor/`)
```
```diff
@@ -24,6 +24,7 @@ Copy **`.env.sample`** to **`.env`** at the repository root (`.env` is gitignore
 ## Prerequisites

 - `talosctl` (matches node Talos version), `talhelper`, `helm`, `kubectl`.
+- **SOPS secrets:** `sops` and `age` on the control host if you use **`clusters/noble/secrets/`** with **`age-key.txt`** (see **`clusters/noble/secrets/README.md`**).
 - **Phase A:** same LAN/VPN as nodes so **Talos :50000** and **Kubernetes :6443** are reachable (see [`talos/README.md`](../talos/README.md) §3).
 - **noble.yml:** bootstrapped cluster and **`talos/kubeconfig`** (or `KUBECONFIG`).
```
````diff
@@ -34,8 +35,16 @@ Copy **`.env.sample`** to **`.env`** at the repository root (`.env` is gitignore
 | [`playbooks/deploy.yml`](playbooks/deploy.yml) | **Talos Phase A** then **`noble.yml`** (full automation). |
 | [`playbooks/talos_phase_a.yml`](playbooks/talos_phase_a.yml) | `genconfig` → `apply-config` → `bootstrap` → `kubeconfig` only. |
 | [`playbooks/noble.yml`](playbooks/noble.yml) | Helm + `kubectl` platform (after Phase A). |
-| [`playbooks/post_deploy.yml`](playbooks/post_deploy.yml) | Vault / ESO reminders (`noble_apply_vault_cluster_secret_store`). |
+| [`playbooks/post_deploy.yml`](playbooks/post_deploy.yml) | SOPS reminders and optional Argo root Application note. |
 | [`playbooks/talos_bootstrap.yml`](playbooks/talos_bootstrap.yml) | **`talhelper genconfig` only** (legacy shortcut; prefer **`talos_phase_a.yml`**). |
+| [`playbooks/debian_harden.yml`](playbooks/debian_harden.yml) | Baseline hardening for Debian servers (SSH/sysctl/fail2ban/unattended-upgrades). |
+| [`playbooks/debian_maintenance.yml`](playbooks/debian_maintenance.yml) | Debian maintenance run (apt upgrades, autoremove/autoclean, reboot when required). |
+| [`playbooks/debian_rotate_ssh_keys.yml`](playbooks/debian_rotate_ssh_keys.yml) | Rotate managed users' `authorized_keys`. |
+| [`playbooks/debian_ops.yml`](playbooks/debian_ops.yml) | Convenience pipeline: harden then maintenance for Debian servers. |
+| [`playbooks/proxmox_prepare.yml`](playbooks/proxmox_prepare.yml) | Configure Proxmox community repos and disable no-subscription UI warning. |
+| [`playbooks/proxmox_upgrade.yml`](playbooks/proxmox_upgrade.yml) | Proxmox maintenance run (apt dist-upgrade, cleanup, reboot when required). |
+| [`playbooks/proxmox_cluster.yml`](playbooks/proxmox_cluster.yml) | Create a Proxmox cluster on the master and join additional hosts. |
+| [`playbooks/proxmox_ops.yml`](playbooks/proxmox_ops.yml) | Convenience pipeline: prepare, upgrade, then cluster Proxmox hosts. |

 ```bash
 cd ansible
````
````diff
@@ -65,11 +74,13 @@ Override with `-e` when needed, e.g. **`-e noble_talos_skip_bootstrap=true`** if
 ```bash
 ansible-playbook playbooks/noble.yml --tags cilium,metallb
 ansible-playbook playbooks/noble.yml --skip-tags newt
+ansible-playbook playbooks/noble.yml --tags velero -e noble_velero_install=true -e noble_velero_s3_bucket=... -e noble_velero_s3_url=...
 ```

-### Variables — `group_vars/all.yml`
+### Variables — `group_vars/all.yml` and role defaults

-- **`noble_newt_install`**, **`noble_cert_manager_require_cloudflare_secret`**, **`noble_apply_vault_cluster_secret_store`**, **`noble_k8s_api_server_override`**, **`noble_k8s_api_server_auto_fallback`**, **`noble_k8s_api_server_fallback`**, **`noble_skip_k8s_health_check`**.
+- **`group_vars/all.yml`:** **`noble_newt_install`**, **`noble_velero_install`**, **`noble_cert_manager_require_cloudflare_secret`**, **`noble_argocd_apply_root_application`**, **`noble_argocd_apply_bootstrap_root_application`**, **`noble_k8s_api_server_override`**, **`noble_k8s_api_server_auto_fallback`**, **`noble_k8s_api_server_fallback`**, **`noble_skip_k8s_health_check`**
+- **`roles/noble_platform/defaults/main.yml`:** **`noble_apply_sops_secrets`**, **`noble_sops_age_key_file`** (SOPS secrets under **`clusters/noble/secrets/`**)

 ## Roles
````
````diff
@@ -77,10 +88,67 @@ ansible-playbook playbooks/noble.yml --skip-tags newt
 |------|----------|
 | `talos_phase_a` | Talos genconfig, apply-config, bootstrap, kubeconfig |
 | `helm_repos` | `helm repo add` / `update` |
-| `noble_*` | Cilium, metrics-server, Longhorn, MetalLB (20m Helm wait), kube-vip, Traefik, cert-manager, Newt, Argo CD, Kyverno, platform stack |
+| `noble_*` | Cilium, CSI Volume Snapshot CRDs + controller, metrics-server, Longhorn, MetalLB (20m Helm wait), kube-vip, Traefik, cert-manager, Newt, Argo CD, Kyverno, platform stack, Velero (optional) |
+| `noble_landing_urls` | Writes **`ansible/output/noble-lab-ui-urls.md`** — URLs, service names, and (optional) Argo/Grafana passwords from Secrets |
 | `noble_post_deploy` | Post-install reminders |
 | `talos_bootstrap` | Genconfig-only (used by older playbook) |
+| `debian_baseline_hardening` | Baseline Debian hardening (SSH policy, sysctl profile, fail2ban, unattended upgrades) |
+| `debian_maintenance` | Routine Debian maintenance tasks (updates, cleanup, reboot-on-required) |
+| `debian_ssh_key_rotation` | Declarative `authorized_keys` rotation for server users |
+| `proxmox_baseline` | Proxmox repo prep (community repos) and no-subscription warning suppression |
+| `proxmox_maintenance` | Proxmox package maintenance (dist-upgrade, cleanup, reboot-on-required) |
+| `proxmox_cluster` | Proxmox cluster bootstrap/join automation using `pvecm` |
+
+## Debian server ops quick start
+
+These playbooks are separate from the Talos/noble flow and target hosts in `debian_servers`.
+
+1. Copy `inventory/debian.example.yml` to `inventory/debian.yml` and update hosts/users.
+2. Update `group_vars/debian_servers.yml` with your allowed SSH users and real public keys.
+3. Run with the Debian inventory:
+
+```bash
+cd ansible
+ansible-playbook -i inventory/debian.yml playbooks/debian_harden.yml
+ansible-playbook -i inventory/debian.yml playbooks/debian_rotate_ssh_keys.yml
+ansible-playbook -i inventory/debian.yml playbooks/debian_maintenance.yml
+```
+
+Or run the combined maintenance pipeline:
+
+```bash
+cd ansible
+ansible-playbook -i inventory/debian.yml playbooks/debian_ops.yml
+```
+
+## Proxmox host + cluster quick start
+
+These playbooks are separate from the Talos/noble flow and target hosts in `proxmox_hosts`.
+
+1. Copy `inventory/proxmox.example.yml` to `inventory/proxmox.yml` and update hosts/users.
+2. Update `group_vars/proxmox_hosts.yml` with your cluster name (`proxmox_cluster_name`), chosen cluster master, and root public key file paths to install.
+3. First run (no SSH keys yet): use `--ask-pass` **or** set `ansible_password` (prefer Ansible Vault). Keep `ansible_ssh_common_args: "-o StrictHostKeyChecking=accept-new"` in inventory for first-contact hosts.
+4. Run prepare first to install your public keys on each host, then continue:
+
+```bash
+cd ansible
+ansible-playbook -i inventory/proxmox.yml playbooks/proxmox_prepare.yml --ask-pass
+ansible-playbook -i inventory/proxmox.yml playbooks/proxmox_upgrade.yml
+ansible-playbook -i inventory/proxmox.yml playbooks/proxmox_cluster.yml
+```
+
+After `proxmox_prepare.yml` finishes, SSH key auth should work for root (keys from `proxmox_root_authorized_key_files`), so `--ask-pass` is usually no longer needed.
+
+If `pvecm add` still prompts for the master root password during join, set `proxmox_cluster_master_root_password` (prefer Vault) to run join non-interactively.
+
+Changing `proxmox_cluster_name` only affects new cluster creation; it does not rename an already-created cluster.
+
+Or run the full Proxmox pipeline:
+
+```bash
+cd ansible
+ansible-playbook -i inventory/proxmox.yml playbooks/proxmox_ops.yml
+```

 ## Migrating from Argo-managed `noble-platform`
````
```diff
@@ -13,11 +13,16 @@ noble_k8s_api_server_fallback: "https://192.168.50.20:6443"
 # Only if you must skip the kubectl /healthz preflight (not recommended).
 noble_skip_k8s_health_check: false

-# Pangolin / Newt — set true only after creating newt-pangolin-auth Secret (see clusters/noble/bootstrap/newt/README.md)
+# Pangolin / Newt — set true only after newt-pangolin-auth Secret exists (SOPS: clusters/noble/secrets/ or imperative — see clusters/noble/bootstrap/newt/README.md)
 noble_newt_install: false

 # cert-manager needs Secret cloudflare-dns-api-token in cert-manager namespace before ClusterIssuers work
 noble_cert_manager_require_cloudflare_secret: true

-# post_deploy.yml — apply Vault ClusterSecretStore only after Vault is initialized and K8s auth is configured
-noble_apply_vault_cluster_secret_store: false
+# Velero — set noble_velero_install: true plus S3 bucket/URL (and credentials — see clusters/noble/bootstrap/velero/README.md)
+noble_velero_install: false
+
+# Argo CD — apply app-of-apps root Application (clusters/noble/bootstrap/argocd/root-application.yaml). Set false to skip.
+noble_argocd_apply_root_application: true
+# Bootstrap kustomize in Argo (noble-bootstrap-root -> clusters/noble/bootstrap). Applied with manual sync; enable automation after noble.yml (see clusters/noble/bootstrap/argocd/README.md §5).
+noble_argocd_apply_bootstrap_root_application: true
```
**`ansible/group_vars/debian_servers.yml`** (new file, +12)

```diff
@@ -0,0 +1,12 @@
+---
+# Hardened SSH settings
+debian_baseline_ssh_allow_users:
+  - admin
+
+# Example key rotation entries. Replace with your real users and keys.
+debian_ssh_rotation_users:
+  - name: admin
+    home: /home/admin
+    state: present
+    keys:
+      - "ssh-ed25519 AAAAEXAMPLE_REPLACE_ME admin@workstation"
```
**`ansible/group_vars/proxmox_hosts.yml`** (new file, +37; the committed literal root password has been redacted to an empty default — never commit a real password here, vault it instead)

```diff
@@ -0,0 +1,37 @@
+---
+# Proxmox repositories
+proxmox_repo_debian_codename: trixie
+proxmox_repo_disable_enterprise: true
+proxmox_repo_disable_ceph_enterprise: true
+proxmox_repo_enable_pve_no_subscription: true
+proxmox_repo_enable_ceph_no_subscription: true
+
+# Suppress "No valid subscription" warning in UI
+proxmox_no_subscription_notice_disable: true
+
+# Public keys to install for root on each Proxmox host.
+proxmox_root_authorized_key_files:
+  - "{{ lookup('env', 'HOME') }}/.ssh/id_ed25519.pub"
+  - "{{ lookup('env', 'HOME') }}/.ssh/ansible.pub"
+
+# Package upgrade/reboot policy
+proxmox_upgrade_apt_cache_valid_time: 3600
+proxmox_upgrade_autoremove: true
+proxmox_upgrade_autoclean: true
+proxmox_upgrade_reboot_if_required: true
+proxmox_upgrade_reboot_timeout: 1800
+
+# Cluster settings
+proxmox_cluster_enabled: true
+proxmox_cluster_name: atomic-hub
+
+# Bootstrap host name from inventory (first host by default if empty)
+proxmox_cluster_master: ""
+
+# Optional explicit IP/FQDN for joining; leave empty to use ansible_host of master
+proxmox_cluster_master_ip: ""
+proxmox_cluster_force: false
+
+# Optional: use only for first cluster joins when inter-node SSH trust is not established.
+# Prefer storing with Ansible Vault if you set this.
+proxmox_cluster_master_root_password: ""
```
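The inventory comments elsewhere in this change already reference a `vault_proxmox_root_password` variable, and the same pattern fits here instead of a literal password. A sketch, assuming you generate the vaulted value with `ansible-vault encrypt_string 'REAL_PASSWORD' --name vault_proxmox_root_password` (variable and file names are illustrative):

```yaml
# group_vars/proxmox_hosts.yml — reference the vaulted variable, no literal secret
proxmox_cluster_master_root_password: "{{ vault_proxmox_root_password }}"

# group_vars/proxmox_hosts_vault.yml — output of ansible-vault encrypt_string (placeholder ciphertext)
vault_proxmox_root_password: !vault |
  $ANSIBLE_VAULT;1.1;AES256
  ...
```

Runs then need `--ask-vault-pass` (or a vault password file) so Ansible can decrypt the value at play time.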
**`ansible/inventory/debian.example.yml`** (new file, +11)

```diff
@@ -0,0 +1,11 @@
+---
+all:
+  children:
+    debian_servers:
+      hosts:
+        debian-01:
+          ansible_host: 192.168.50.101
+          ansible_user: admin
+        debian-02:
+          ansible_host: 192.168.50.102
+          ansible_user: admin
```
**`ansible/inventory/proxmox.example.yml`** (new file, +24)

```diff
@@ -0,0 +1,24 @@
+---
+all:
+  children:
+    proxmox_hosts:
+      vars:
+        ansible_ssh_common_args: "-o StrictHostKeyChecking=accept-new"
+      hosts:
+        helium:
+          ansible_host: 192.168.1.100
+          ansible_user: root
+          # First run without SSH keys:
+          # ansible_password: "{{ vault_proxmox_root_password }}"
+        neon:
+          ansible_host: 192.168.1.90
+          ansible_user: root
+          # ansible_password: "{{ vault_proxmox_root_password }}"
+        argon:
+          ansible_host: 192.168.1.80
+          ansible_user: root
+          # ansible_password: "{{ vault_proxmox_root_password }}"
+        krypton:
+          ansible_host: 192.168.1.70
+          ansible_user: root
+          # ansible_password: "{{ vault_proxmox_root_password }}"
```
**`ansible/inventory/proxmox.yml`** (new file, +24; identical to the example inventory)

```diff
@@ -0,0 +1,24 @@
+---
+all:
+  children:
+    proxmox_hosts:
+      vars:
+        ansible_ssh_common_args: "-o StrictHostKeyChecking=accept-new"
+      hosts:
+        helium:
+          ansible_host: 192.168.1.100
+          ansible_user: root
+          # First run without SSH keys:
+          # ansible_password: "{{ vault_proxmox_root_password }}"
+        neon:
+          ansible_host: 192.168.1.90
+          ansible_user: root
+          # ansible_password: "{{ vault_proxmox_root_password }}"
+        argon:
+          ansible_host: 192.168.1.80
+          ansible_user: root
+          # ansible_password: "{{ vault_proxmox_root_password }}"
+        krypton:
+          ansible_host: 192.168.1.70
+          ansible_user: root
+          # ansible_password: "{{ vault_proxmox_root_password }}"
```
**`ansible/playbooks/debian_harden.yml`** (new file, +8)

```diff
@@ -0,0 +1,8 @@
+---
+- name: Debian server baseline hardening
+  hosts: debian_servers
+  become: true
+  gather_facts: true
+  roles:
+    - role: debian_baseline_hardening
+      tags: [hardening, baseline]
```
**`ansible/playbooks/debian_maintenance.yml`** (new file, +8)

```diff
@@ -0,0 +1,8 @@
+---
+- name: Debian maintenance (updates + reboot handling)
+  hosts: debian_servers
+  become: true
+  gather_facts: true
+  roles:
+    - role: debian_maintenance
+      tags: [maintenance, updates]
```
**`ansible/playbooks/debian_ops.yml`** (new file, +3)

```diff
@@ -0,0 +1,3 @@
+---
+- import_playbook: debian_harden.yml
+- import_playbook: debian_maintenance.yml
```
**`ansible/playbooks/debian_rotate_ssh_keys.yml`** (new file, +8)

```diff
@@ -0,0 +1,8 @@
+---
+- name: Debian SSH key rotation
+  hosts: debian_servers
+  become: true
+  gather_facts: false
+  roles:
+    - role: debian_ssh_key_rotation
+      tags: [ssh, ssh_keys, rotation]
```
```diff
@@ -3,8 +3,8 @@
 # Do not run until `kubectl get --raw /healthz` returns ok (see talos/README.md §3, CLUSTER-BUILD Phase A).
 # Run from repo **ansible/** directory: ansible-playbook playbooks/noble.yml
 #
-# Tags: repos, cilium, metrics, longhorn, metallb, kube_vip, traefik, cert_manager, newt,
-#       argocd, kyverno, kyverno_policies, platform, all (default)
+# Tags: repos, cilium, csi_snapshot, metrics, longhorn, metallb, kube_vip, traefik, cert_manager, newt,
+#       argocd, kyverno, kyverno_policies, platform, velero, all (default)
 - name: Noble cluster — platform stack (Ansible-managed)
   hosts: localhost
   connection: local
```
```diff
@@ -113,6 +113,7 @@
     tags: [always]

   # talosctl kubeconfig often sets server to the VIP; off-LAN you can reach a control-plane IP but not 192.168.50.230.
+  # kubectl stderr is often "The connection to the server ... was refused" (no substring "connection refused").
   - name: Auto-fallback API server when VIP is unreachable (temp kubeconfig)
     tags: [always]
     when:
@@ -120,8 +121,7 @@
       - noble_k8s_api_server_override | default('') | length == 0
       - not (noble_skip_k8s_health_check | default(false) | bool)
       - (noble_k8s_health_first.rc | default(1)) != 0 or (noble_k8s_health_first.stdout | default('') | trim) != 'ok'
-      - ('network is unreachable' in (noble_k8s_health_first.stderr | default('') | lower)) or
-        ('no route to host' in (noble_k8s_health_first.stderr | default('') | lower))
+      - (((noble_k8s_health_first.stderr | default('')) ~ (noble_k8s_health_first.stdout | default(''))) | lower) is search('network is unreachable|no route to host|connection refused|was refused', multiline=False)
     block:
       - name: Ensure temp dir for kubeconfig auto-fallback
         ansible.builtin.file:
```
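The reworked condition folds two substring checks into one regex over the combined, lowercased stderr and stdout; Ansible's `search` test behaves like Python's `re.search`. A stand-alone sketch of why the extra `was refused` alternation matters (the sample error strings are illustrative, taken from the comment above):

```python
import re

# Alternation mirroring the playbook's `is search(...)` pattern.
PATTERN = re.compile(r"network is unreachable|no route to host|connection refused|was refused")

def should_fallback(stderr: str, stdout: str = "") -> bool:
    # The playbook concatenates stderr+stdout and lowercases before matching; same here.
    return bool(PATTERN.search((stderr + stdout).lower()))

# kubectl phrases the VIP failure without the literal substring "connection refused":
print(should_fallback("The connection to the server 192.168.50.230:6443 was refused"))  # True
print(should_fallback("Unable to connect: No Route To Host"))                           # True
print(should_fallback("error: context deadline exceeded"))                              # False
```

Matching only `stderr` with the two original substrings would miss the "was refused" wording, so the play would never try the fallback API server in that case.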
```diff
@@ -202,6 +202,8 @@
       tags: [repos, helm]
     - role: noble_cilium
       tags: [cilium, cni]
+    - role: noble_csi_snapshot_controller
+      tags: [csi_snapshot, snapshot, storage]
     - role: noble_metrics_server
       tags: [metrics, metrics_server]
     - role: noble_longhorn
@@ -224,5 +226,7 @@
       tags: [kyverno_policies, policy]
     - role: noble_platform
       tags: [platform, observability, apps]
+    - role: noble_velero
+      tags: [velero, backups]
     - role: noble_landing_urls
       tags: [landing, platform, observability, apps]
```
```diff
@@ -1,12 +1,7 @@
 ---
-# Manual follow-ups after **noble.yml**: Vault init/unseal, Kubernetes auth for Vault, ESO ClusterSecretStore.
-# Run: ansible-playbook playbooks/post_deploy.yml
-- name: Noble cluster — post-install reminders
-  hosts: localhost
+# Manual follow-ups after **noble.yml**: SOPS key backup, optional Argo root Application.
+- hosts: localhost
   connection: local
   gather_facts: false
-  vars:
-    noble_repo_root: "{{ playbook_dir | dirname | dirname }}"
-    noble_kubeconfig: "{{ lookup('env', 'KUBECONFIG') | default(noble_repo_root + '/talos/kubeconfig', true) }}"
   roles:
-    - role: noble_post_deploy
+    - noble_post_deploy
```
**`ansible/playbooks/proxmox_cluster.yml`** (new file, +9)

```diff
@@ -0,0 +1,9 @@
+---
+- name: Proxmox cluster bootstrap/join
+  hosts: proxmox_hosts
+  become: true
+  gather_facts: false
+  serial: 1
+  roles:
+    - role: proxmox_cluster
+      tags: [proxmox, cluster]
```
**`ansible/playbooks/proxmox_ops.yml`** (new file, +4)

```diff
@@ -0,0 +1,4 @@
+---
+- import_playbook: proxmox_prepare.yml
+- import_playbook: proxmox_upgrade.yml
+- import_playbook: proxmox_cluster.yml
```
**`ansible/playbooks/proxmox_prepare.yml`** (new file, +8)

```diff
@@ -0,0 +1,8 @@
+---
+- name: Proxmox host preparation (community repos + no-subscription notice)
+  hosts: proxmox_hosts
+  become: true
+  gather_facts: true
+  roles:
+    - role: proxmox_baseline
+      tags: [proxmox, prepare, repos, ui]
```
**`ansible/playbooks/proxmox_upgrade.yml`** (new file, +9)

```diff
@@ -0,0 +1,9 @@
+---
+- name: Proxmox host maintenance (upgrade to latest)
+  hosts: proxmox_hosts
+  become: true
+  gather_facts: true
+  serial: 1
+  roles:
+    - role: proxmox_maintenance
+      tags: [proxmox, maintenance, updates]
```
**`ansible/roles/debian_baseline_hardening/defaults/main.yml`** (new file, +39)

```diff
@@ -0,0 +1,39 @@
+---
+# Update apt metadata only when stale (seconds)
+debian_baseline_apt_cache_valid_time: 3600
+
+# Core host hardening packages
+debian_baseline_packages:
+  - unattended-upgrades
+  - apt-listchanges
+  - fail2ban
+  - needrestart
+  - sudo
+  - ca-certificates
+
+# SSH hardening controls
+debian_baseline_ssh_permit_root_login: "no"
+debian_baseline_ssh_password_authentication: "no"
+debian_baseline_ssh_pubkey_authentication: "yes"
+debian_baseline_ssh_x11_forwarding: "no"
+debian_baseline_ssh_max_auth_tries: 3
+debian_baseline_ssh_client_alive_interval: 300
+debian_baseline_ssh_client_alive_count_max: 2
+debian_baseline_ssh_allow_users: []
+
+# unattended-upgrades controls
+debian_baseline_enable_unattended_upgrades: true
+debian_baseline_unattended_auto_upgrade: "1"
+debian_baseline_unattended_update_lists: "1"
+
+# Kernel and network hardening sysctls
+debian_baseline_sysctl_settings:
+  net.ipv4.conf.all.accept_redirects: "0"
+  net.ipv4.conf.default.accept_redirects: "0"
+  net.ipv4.conf.all.send_redirects: "0"
+  net.ipv4.conf.default.send_redirects: "0"
+  net.ipv4.conf.all.log_martians: "1"
+  net.ipv4.conf.default.log_martians: "1"
+  net.ipv4.tcp_syncookies: "1"
+  net.ipv6.conf.all.accept_redirects: "0"
+  net.ipv6.conf.default.accept_redirects: "0"
```
**`ansible/roles/debian_baseline_hardening/handlers/main.yml`** (new file, +12)

```diff
@@ -0,0 +1,12 @@
+---
+- name: Restart ssh
+  ansible.builtin.service:
+    name: ssh
+    state: restarted
+
+- name: Reload sysctl
+  ansible.builtin.command:
+    argv:
+      - sysctl
+      - --system
+  changed_when: true
```
**`ansible/roles/debian_baseline_hardening/tasks/main.yml`** (new file, +52)

```diff
@@ -0,0 +1,52 @@
+---
+- name: Refresh apt cache
+  ansible.builtin.apt:
+    update_cache: true
+    cache_valid_time: "{{ debian_baseline_apt_cache_valid_time }}"
+
+- name: Install baseline hardening packages
+  ansible.builtin.apt:
+    name: "{{ debian_baseline_packages }}"
+    state: present
+
+- name: Configure unattended-upgrades auto settings
+  ansible.builtin.copy:
+    dest: /etc/apt/apt.conf.d/20auto-upgrades
+    mode: "0644"
+    content: |
+      APT::Periodic::Update-Package-Lists "{{ debian_baseline_unattended_update_lists }}";
+      APT::Periodic::Unattended-Upgrade "{{ debian_baseline_unattended_auto_upgrade }}";
+  when: debian_baseline_enable_unattended_upgrades | bool
+
+- name: Configure SSH hardening options
+  ansible.builtin.copy:
+    dest: /etc/ssh/sshd_config.d/99-hardening.conf
+    mode: "0644"
+    content: |
+      PermitRootLogin {{ debian_baseline_ssh_permit_root_login }}
+      PasswordAuthentication {{ debian_baseline_ssh_password_authentication }}
+      PubkeyAuthentication {{ debian_baseline_ssh_pubkey_authentication }}
+      X11Forwarding {{ debian_baseline_ssh_x11_forwarding }}
+      MaxAuthTries {{ debian_baseline_ssh_max_auth_tries }}
+      ClientAliveInterval {{ debian_baseline_ssh_client_alive_interval }}
+      ClientAliveCountMax {{ debian_baseline_ssh_client_alive_count_max }}
+      {% if debian_baseline_ssh_allow_users | length > 0 %}
+      AllowUsers {{ debian_baseline_ssh_allow_users | join(' ') }}
+      {% endif %}
+  notify: Restart ssh
+
+- name: Configure baseline sysctls
+  ansible.builtin.copy:
+    dest: /etc/sysctl.d/99-hardening.conf
+    mode: "0644"
+    content: |
+      {% for key, value in debian_baseline_sysctl_settings.items() %}
+      {{ key }} = {{ value }}
+      {% endfor %}
+  notify: Reload sysctl
+
+- name: Ensure fail2ban service is enabled
+  ansible.builtin.service:
+    name: fail2ban
+    enabled: true
+    state: started
```
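With the role defaults above plus `debian_baseline_ssh_allow_users: [admin]` from `group_vars/debian_servers.yml`, the SSH task would render roughly this drop-in (a sketch of the templated output, not captured from a real host):

```
# /etc/ssh/sshd_config.d/99-hardening.conf
PermitRootLogin no
PasswordAuthentication no
PubkeyAuthentication yes
X11Forwarding no
MaxAuthTries 3
ClientAliveInterval 300
ClientAliveCountMax 2
AllowUsers admin
```

On a stock Debian layout the `Include /etc/ssh/sshd_config.d/*.conf` sits near the top of `sshd_config`, and sshd keeps the first value it sees for each keyword, so this drop-in takes precedence over later settings in the main file.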
**`ansible/roles/debian_maintenance/defaults/main.yml`** (new file, +7)

```diff
@@ -0,0 +1,7 @@
+---
+debian_maintenance_apt_cache_valid_time: 3600
+debian_maintenance_upgrade_type: dist
+debian_maintenance_autoremove: true
+debian_maintenance_autoclean: true
+debian_maintenance_reboot_if_required: true
+debian_maintenance_reboot_timeout: 1800
```
**`ansible/roles/debian_maintenance/tasks/main.yml`** (new file, +30)

```diff
@@ -0,0 +1,30 @@
+---
+- name: Refresh apt cache
+  ansible.builtin.apt:
+    update_cache: true
+    cache_valid_time: "{{ debian_maintenance_apt_cache_valid_time }}"
+
+- name: Upgrade Debian packages
+  ansible.builtin.apt:
+    upgrade: "{{ debian_maintenance_upgrade_type }}"
+
+- name: Remove orphaned packages
+  ansible.builtin.apt:
+    autoremove: "{{ debian_maintenance_autoremove }}"
+
+- name: Clean apt package cache
+  ansible.builtin.apt:
+    autoclean: "{{ debian_maintenance_autoclean }}"
+
+- name: Check if reboot is required
+  ansible.builtin.stat:
+    path: /var/run/reboot-required
+  register: debian_maintenance_reboot_required_file
+
+- name: Reboot when required by package updates
+  ansible.builtin.reboot:
+    reboot_timeout: "{{ debian_maintenance_reboot_timeout }}"
+    msg: "Reboot initiated by Ansible maintenance playbook"
+  when:
+    - debian_maintenance_reboot_if_required | bool
+    - debian_maintenance_reboot_required_file.stat.exists | default(false)
```
**`ansible/roles/debian_ssh_key_rotation/defaults/main.yml`** (new file, +10)

```diff
@@ -0,0 +1,10 @@
+---
+# List of users to manage keys for.
+# Example:
+# debian_ssh_rotation_users:
+#   - name: deploy
+#     home: /home/deploy
+#     state: present
+#     keys:
+#       - "ssh-ed25519 AAAA... deploy@laptop"
+debian_ssh_rotation_users: []
```
**`ansible/roles/debian_ssh_key_rotation/tasks/main.yml`** (new file, +50; note the original used `item.keys`, which in Jinja2 resolves to the dict's built-in `.keys` method rather than the `keys` list — corrected below to `item['keys']`)

```diff
@@ -0,0 +1,50 @@
+---
+- name: Validate SSH key rotation inputs
+  ansible.builtin.assert:
+    that:
+      - item.name is defined
+      - item.home is defined
+      - (item.state | default('present')) in ['present', 'absent']
+      - (item.state | default('present')) == 'absent' or (item['keys'] is defined and item['keys'] | length > 0)
+    fail_msg: >-
+      Each entry in debian_ssh_rotation_users must include name, home, and either:
+      state=absent, or keys with at least one SSH public key.
+  loop: "{{ debian_ssh_rotation_users }}"
+  loop_control:
+    label: "{{ item.name | default('unknown') }}"
+
+- name: Ensure ~/.ssh exists for managed users
+  ansible.builtin.file:
+    path: "{{ item.home }}/.ssh"
+    state: directory
+    owner: "{{ item.name }}"
+    group: "{{ item.name }}"
+    mode: "0700"
+  loop: "{{ debian_ssh_rotation_users }}"
+  loop_control:
+    label: "{{ item.name }}"
+  when: (item.state | default('present')) == 'present'
+
+- name: Rotate authorized_keys for managed users
+  ansible.builtin.copy:
+    dest: "{{ item.home }}/.ssh/authorized_keys"
+    owner: "{{ item.name }}"
+    group: "{{ item.name }}"
+    mode: "0600"
+    content: |
+      {% for key in item['keys'] %}
+      {{ key }}
+      {% endfor %}
+  loop: "{{ debian_ssh_rotation_users }}"
+  loop_control:
+    label: "{{ item.name }}"
+  when: (item.state | default('present')) == 'present'
+
+- name: Remove authorized_keys for users marked absent
+  ansible.builtin.file:
+    path: "{{ item.home }}/.ssh/authorized_keys"
+    state: absent
+  loop: "{{ debian_ssh_rotation_users }}"
+  loop_control:
+    label: "{{ item.name }}"
+  when: (item.state | default('present')) == 'absent'
```
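One pitfall lurking in the SSH rotation data model: each user entry is a dict with a key literally named `keys`, and Jinja2 attribute lookup (`item.keys`) finds the dict's built-in `.keys` method before it tries item lookup, so templates must use `item['keys']` to get the list. A minimal Python illustration of the same name collision:

```python
entry = {"name": "admin", "keys": ["ssh-ed25519 AAAAEXAMPLE admin@workstation"]}

# Attribute lookup finds the built-in dict method, not the data:
print(callable(entry.keys))            # True — this is dict.keys, not the key list

# Bracket lookup (Jinja: item['keys']) returns the actual list of public keys:
print(entry["keys"][0].split()[0])     # ssh-ed25519
```

The same reasoning applies to any data key that shadows a dict method (`items`, `values`, `update`): prefer bracket lookup in Jinja templates when the key name is ambiguous.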
```diff
@@ -8,11 +8,9 @@ noble_helm_repos:
   - { name: fossorial, url: "https://charts.fossorial.io" }
   - { name: argo, url: "https://argoproj.github.io/argo-helm" }
   - { name: metrics-server, url: "https://kubernetes-sigs.github.io/metrics-server/" }
-  - { name: sealed-secrets, url: "https://bitnami-labs.github.io/sealed-secrets" }
-  - { name: external-secrets, url: "https://charts.external-secrets.io" }
-  - { name: hashicorp, url: "https://helm.releases.hashicorp.com" }
   - { name: prometheus-community, url: "https://prometheus-community.github.io/helm-charts" }
   - { name: grafana, url: "https://grafana.github.io/helm-charts" }
   - { name: fluent, url: "https://fluent.github.io/helm-charts" }
   - { name: headlamp, url: "https://kubernetes-sigs.github.io/headlamp/" }
   - { name: kyverno, url: "https://kyverno.github.io/kyverno/" }
+  - { name: vmware-tanzu, url: "https://vmware-tanzu.github.io/helm-charts" }
```
ansible/roles/noble_argocd/defaults/main.yml (new file, 6 lines)
@@ -0,0 +1,6 @@
---
# When true, applies clusters/noble/bootstrap/argocd/root-application.yaml (app-of-apps).
# Edit spec.source.repoURL in that file if your Git remote differs.
noble_argocd_apply_root_application: false
# When true, applies clusters/noble/bootstrap/argocd/bootstrap-root-application.yaml (noble-bootstrap-root; manual sync until README §5).
noble_argocd_apply_bootstrap_root_application: true
@@ -15,6 +15,32 @@
       - -f
       - "{{ noble_repo_root }}/clusters/noble/bootstrap/argocd/values.yaml"
+      - --wait
+      - --timeout
+      - 15m
   environment:
     KUBECONFIG: "{{ noble_kubeconfig }}"
   changed_when: true

+- name: Apply Argo CD root Application (app-of-apps)
+  ansible.builtin.command:
+    argv:
+      - kubectl
+      - apply
+      - -f
+      - "{{ noble_repo_root }}/clusters/noble/bootstrap/argocd/root-application.yaml"
+  environment:
+    KUBECONFIG: "{{ noble_kubeconfig }}"
+  when: noble_argocd_apply_root_application | default(false) | bool
+  changed_when: true
+
+- name: Apply Argo CD bootstrap app-of-apps Application
+  ansible.builtin.command:
+    argv:
+      - kubectl
+      - apply
+      - -f
+      - "{{ noble_repo_root }}/clusters/noble/bootstrap/argocd/bootstrap-root-application.yaml"
+  environment:
+    KUBECONFIG: "{{ noble_kubeconfig }}"
+  when: noble_argocd_apply_bootstrap_root_application | default(false) | bool
+  changed_when: true
@@ -0,0 +1,2 @@
---
noble_csi_snapshot_kubectl_timeout: 120s

ansible/roles/noble_csi_snapshot_controller/tasks/main.yml (new file, 39 lines)
@@ -0,0 +1,39 @@
---
# Volume Snapshot CRDs + snapshot-controller (Velero CSI / Longhorn snapshots).
- name: Apply Volume Snapshot CRDs (snapshot.storage.k8s.io)
  ansible.builtin.command:
    argv:
      - kubectl
      - apply
      - "--request-timeout={{ noble_csi_snapshot_kubectl_timeout | default('120s') }}"
      - -k
      - "{{ noble_repo_root }}/clusters/noble/bootstrap/csi-snapshot-controller/crd"
  environment:
    KUBECONFIG: "{{ noble_kubeconfig }}"
  changed_when: true

- name: Apply snapshot-controller in kube-system
  ansible.builtin.command:
    argv:
      - kubectl
      - apply
      - "--request-timeout={{ noble_csi_snapshot_kubectl_timeout | default('120s') }}"
      - -k
      - "{{ noble_repo_root }}/clusters/noble/bootstrap/csi-snapshot-controller/controller"
  environment:
    KUBECONFIG: "{{ noble_kubeconfig }}"
  changed_when: true

- name: Wait for snapshot-controller Deployment
  ansible.builtin.command:
    argv:
      - kubectl
      - -n
      - kube-system
      - rollout
      - status
      - deploy/snapshot-controller
      - --timeout=120s
  environment:
    KUBECONFIG: "{{ noble_kubeconfig }}"
  changed_when: false
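Velero's CSI integration picks its VolumeSnapshotClass by label, which is presumably the "label VolumeSnapshotClass" step the Velero role comments refer to. A sketch for Longhorn — the class name is an assumption, the label is Velero's documented CSI convention, and `Retain` is a common (not mandatory) choice for backup snapshots:

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: longhorn-snapshot-vsc            # name is an assumption, not from the repo
  labels:
    velero.io/csi-volumesnapshot-class: "true"  # tells Velero to use this class
driver: driver.longhorn.io               # Longhorn's CSI driver name
deletionPolicy: Retain                   # Retain is commonly recommended for backups
```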
@@ -39,8 +39,13 @@ noble_lab_ui_entries:
    namespace: longhorn-system
    service: longhorn-frontend
    url: https://longhorn.apps.noble.lab.pcenicni.dev
  - name: Vault
    description: Secrets engine UI (after init/unseal)
    namespace: vault
    service: vault
    url: https://vault.apps.noble.lab.pcenicni.dev
  - name: Velero
    description: Cluster backups — no web UI (velero CLI / kubectl CRDs)
    namespace: velero
    service: velero
    url: ""
  - name: Homepage
    description: App dashboard (links to lab UIs)
    namespace: homepage
    service: homepage
    url: https://homepage.apps.noble.lab.pcenicni.dev
@@ -11,7 +11,7 @@ This file is **generated** by Ansible (`noble_landing_urls` role). Use it as a t
 | UI | What | Kubernetes service | Namespace | URL |
 |----|------|----------------------|-----------|-----|
 {% for e in noble_lab_ui_entries %}
-| {{ e.name }} | {{ e.description }} | `{{ e.service }}` | `{{ e.namespace }}` | [{{ e.url }}]({{ e.url }}) |
+| {{ e.name }} | {{ e.description }} | `{{ e.service }}` | `{{ e.namespace }}` | {% if e.url | default('') | length > 0 %}[{{ e.url }}]({{ e.url }}){% else %}—{% endif %} |
 {% endfor %}
## Initial access (logins)

@@ -24,7 +24,6 @@ This file is **generated** by Ansible (`noble_landing_urls` role). Use it as a t
| **Prometheus** | — | No auth in default install (lab). |
| **Alertmanager** | — | No auth in default install (lab). |
| **Longhorn** | — | No default login unless you enable access control in the UI settings. |
| **Vault** | Token | Root token is only from **`vault operator init`** (not stored in git). See `clusters/noble/bootstrap/vault/README.md`. |

### Commands to retrieve passwords (if not filled above)
@@ -46,6 +45,7 @@ To generate this file **without** calling kubectl, run Ansible with **`-e noble_
- **Argo CD** `argocd-initial-admin-secret` disappears after you change the admin password.
- **Grafana** password is random unless you set `grafana.adminPassword` in chart values.
- **Vault** UI needs **unsealed** Vault; tokens come from your chosen auth method.
- **Prometheus / Alertmanager** UIs are unauthenticated by default — restrict when hardening (`talos/CLUSTER-BUILD.md` Phase G).
- **SOPS:** cluster secrets in git under **`clusters/noble/secrets/`** are encrypted; decrypt with **`age-key.txt`** (not in git). See **`clusters/noble/secrets/README.md`**.
- **Headlamp** token above expires after the configured duration; re-run Ansible or `kubectl create token` to refresh.
- **Velero** has **no web UI** — use **`velero`** CLI or **`kubectl -n velero get backup,schedule,backupstoragelocation`**. Metrics: **`velero`** Service in **`velero`** (Prometheus scrape). See `clusters/noble/bootstrap/velero/README.md`.
@@ -4,5 +4,6 @@ noble_platform_kubectl_request_timeout: 120s
 noble_platform_kustomize_retries: 5
 noble_platform_kustomize_delay: 20

-# Vault: injector (vault-k8s) owns MutatingWebhookConfiguration.caBundle; Helm upgrade can SSA-conflict. Delete webhook so Helm can recreate it.
-noble_vault_delete_injector_webhook_before_helm: true
+# Decrypt **clusters/noble/secrets/*.yaml** with SOPS and kubectl apply (requires **sops**, **age**, and **age-key.txt**).
+noble_apply_sops_secrets: true
+noble_sops_age_key_file: "{{ noble_repo_root }}/age-key.txt"
@@ -1,6 +1,6 @@
 ---
 # Mirrors former **noble-platform** Argo Application: Helm releases + plain manifests under clusters/noble/bootstrap.
-- name: Apply clusters/noble/bootstrap kustomize (namespaces, Grafana Loki datasource, Vault extras)
+- name: Apply clusters/noble/bootstrap kustomize (namespaces, Grafana Loki datasource)
   ansible.builtin.command:
     argv:
       - kubectl
@@ -16,77 +16,26 @@
   until: noble_platform_kustomize.rc == 0
   changed_when: true

-- name: Install Sealed Secrets
-  ansible.builtin.command:
-    argv:
-      - helm
-      - upgrade
-      - --install
-      - sealed-secrets
-      - sealed-secrets/sealed-secrets
-      - --namespace
-      - sealed-secrets
-      - --version
-      - "2.18.4"
-      - -f
-      - "{{ noble_repo_root }}/clusters/noble/bootstrap/sealed-secrets/values.yaml"
-      - --wait
-  environment:
-    KUBECONFIG: "{{ noble_kubeconfig }}"
-  changed_when: true
-
-- name: Install External Secrets Operator
-  ansible.builtin.command:
-    argv:
-      - helm
-      - upgrade
-      - --install
-      - external-secrets
-      - external-secrets/external-secrets
-      - --namespace
-      - external-secrets
-      - --version
-      - "2.2.0"
-      - -f
-      - "{{ noble_repo_root }}/clusters/noble/bootstrap/external-secrets/values.yaml"
-      - --wait
-  environment:
-    KUBECONFIG: "{{ noble_kubeconfig }}"
-  changed_when: true
-
-# vault-k8s patches webhook CA after install; Helm 3/4 SSA then conflicts on upgrade. Removing the MWC lets Helm re-apply cleanly; injector repopulates caBundle.
-- name: Delete Vault agent injector MutatingWebhookConfiguration before Helm (avoids caBundle field conflict)
-  ansible.builtin.command:
-    argv:
-      - kubectl
-      - delete
-      - mutatingwebhookconfiguration
-      - vault-agent-injector-cfg
-      - --ignore-not-found
-  environment:
-    KUBECONFIG: "{{ noble_kubeconfig }}"
-  register: noble_vault_mwc_delete
-  when: noble_vault_delete_injector_webhook_before_helm | default(true) | bool
-  changed_when: "'deleted' in (noble_vault_mwc_delete.stdout | default(''))"
-
-- name: Install Vault
-  ansible.builtin.command:
-    argv:
-      - helm
-      - upgrade
-      - --install
-      - vault
-      - hashicorp/vault
-      - --namespace
-      - vault
-      - --version
-      - "0.32.0"
-      - -f
-      - "{{ noble_repo_root }}/clusters/noble/bootstrap/vault/values.yaml"
-      - --wait
-  environment:
-    KUBECONFIG: "{{ noble_kubeconfig }}"
-    HELM_SERVER_SIDE_APPLY: "false"
-  changed_when: true
+- name: Stat SOPS age private key (age-key.txt)
+  ansible.builtin.stat:
+    path: "{{ noble_sops_age_key_file }}"
+  register: noble_sops_age_key_stat
+
+- name: Apply SOPS-encrypted cluster secrets (clusters/noble/secrets/*.yaml)
+  ansible.builtin.shell: |
+    set -euo pipefail
+    shopt -s nullglob
+    for f in "{{ noble_repo_root }}/clusters/noble/secrets"/*.yaml; do
+      sops -d "$f" | kubectl apply -f -
+    done
+  args:
+    executable: /bin/bash
+  environment:
+    KUBECONFIG: "{{ noble_kubeconfig }}"
+    SOPS_AGE_KEY_FILE: "{{ noble_sops_age_key_file }}"
+  when:
+    - noble_apply_sops_secrets | default(true) | bool
+    - noble_sops_age_key_stat.stat.exists
+  changed_when: true

 - name: Install kube-prometheus-stack
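The decrypt-and-apply loop relies on bash's `nullglob` so a secrets directory with no `*.yaml` files is a clean no-op rather than an attempt to decrypt the unexpanded glob pattern. The same guard expressed with Python's `pathlib` (paths are illustrative):

```python
from pathlib import Path
import tempfile

# pathlib.Path.glob already behaves like bash with nullglob enabled:
# no matches yields an empty iterator, so the "decrypt each secret" loop
# runs zero times instead of receiving the literal "*.yaml" pattern.
def secrets_to_apply(secrets_dir: str) -> list[str]:
    return sorted(str(p) for p in Path(secrets_dir).glob("*.yaml"))

with tempfile.TemporaryDirectory() as d:
    print(secrets_to_apply(d))                 # empty dir: nothing to decrypt
    (Path(d) / "example-secret.yaml").write_text("kind: Secret\n")
    print(secrets_to_apply(d))                 # one encrypted manifest found
```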
@@ -1,27 +1,15 @@
 ---
-- name: Vault — manual steps (not automated)
+- name: SOPS secrets (workstation)
   ansible.builtin.debug:
     msg: |
-      1. kubectl -n vault get pods (wait for Running)
-      2. kubectl -n vault exec -it vault-0 -- vault operator init (once; save keys)
-      3. Unseal per clusters/noble/bootstrap/vault/README.md
-      4. ./clusters/noble/bootstrap/vault/configure-kubernetes-auth.sh
-      5. kubectl apply -f clusters/noble/bootstrap/external-secrets/examples/vault-cluster-secret-store.yaml
-
-- name: Optional — apply Vault ClusterSecretStore for External Secrets
-  ansible.builtin.command:
-    argv:
-      - kubectl
-      - apply
-      - -f
-      - "{{ noble_repo_root }}/clusters/noble/bootstrap/external-secrets/examples/vault-cluster-secret-store.yaml"
-  environment:
-    KUBECONFIG: "{{ noble_kubeconfig }}"
-  when: noble_apply_vault_cluster_secret_store | default(false) | bool
-  changed_when: true
+      Encrypted Kubernetes Secrets live under clusters/noble/secrets/ (Mozilla SOPS + age).
+      Private key: age-key.txt at repo root (gitignored). See clusters/noble/secrets/README.md
+      and .sops.yaml. noble.yml decrypt-applies these when age-key.txt exists.

 - name: Argo CD optional root Application (empty app-of-apps)
   ansible.builtin.debug:
     msg: >-
-      Optional: kubectl apply -f clusters/noble/bootstrap/argocd/root-application.yaml
-      after editing repoURL. Core workloads are not synced by Argo — see clusters/noble/apps/README.md
+      App-of-apps: noble.yml applies root-application.yaml when noble_argocd_apply_root_application is true;
+      bootstrap-root-application.yaml when noble_argocd_apply_bootstrap_root_application is true (group_vars/all.yml).
+      noble-bootstrap-root uses manual sync until you enable automation after the playbook —
+      clusters/noble/bootstrap/argocd/README.md §5. See clusters/noble/apps/README.md and that README.
ansible/roles/noble_velero/defaults/main.yml (new file, 13 lines)
@@ -0,0 +1,13 @@
---
# **noble_velero_install** is in **ansible/group_vars/all.yml**. Override S3 fields via extra-vars or group_vars.
noble_velero_chart_version: "12.0.0"

noble_velero_s3_bucket: ""
noble_velero_s3_url: ""
noble_velero_s3_region: "us-east-1"
noble_velero_s3_force_path_style: "true"
noble_velero_s3_prefix: ""

# Optional — if unset, Ansible expects Secret **velero/velero-cloud-credentials** (key **cloud**) to exist.
noble_velero_aws_access_key_id: ""
noble_velero_aws_secret_access_key: ""
ansible/roles/noble_velero/tasks/from_env.yml (new file, 68 lines)
@@ -0,0 +1,68 @@
---
# See repository **.env.sample** — copy to **.env** (gitignored).
- name: Stat repository .env for Velero
  ansible.builtin.stat:
    path: "{{ noble_repo_root }}/.env"
  register: noble_deploy_env_file
  changed_when: false

- name: Load NOBLE_VELERO_S3_BUCKET from .env when unset
  ansible.builtin.shell: |
    set -a
    . "{{ noble_repo_root }}/.env"
    set +a
    echo "${NOBLE_VELERO_S3_BUCKET:-}"
  register: noble_velero_s3_bucket_from_env
  when:
    - noble_deploy_env_file.stat.exists | default(false)
    - noble_velero_s3_bucket | default('') | length == 0
  changed_when: false

- name: Apply NOBLE_VELERO_S3_BUCKET from .env
  ansible.builtin.set_fact:
    noble_velero_s3_bucket: "{{ noble_velero_s3_bucket_from_env.stdout | trim }}"
  when:
    - noble_velero_s3_bucket_from_env is defined
    - (noble_velero_s3_bucket_from_env.stdout | default('') | trim | length) > 0

- name: Load NOBLE_VELERO_S3_URL from .env when unset
  ansible.builtin.shell: |
    set -a
    . "{{ noble_repo_root }}/.env"
    set +a
    echo "${NOBLE_VELERO_S3_URL:-}"
  register: noble_velero_s3_url_from_env
  when:
    - noble_deploy_env_file.stat.exists | default(false)
    - noble_velero_s3_url | default('') | length == 0
  changed_when: false

- name: Apply NOBLE_VELERO_S3_URL from .env
  ansible.builtin.set_fact:
    noble_velero_s3_url: "{{ noble_velero_s3_url_from_env.stdout | trim }}"
  when:
    - noble_velero_s3_url_from_env is defined
    - (noble_velero_s3_url_from_env.stdout | default('') | trim | length) > 0

- name: Create velero-cloud-credentials from .env when keys present
  ansible.builtin.shell: |
    set -euo pipefail
    set -a
    . "{{ noble_repo_root }}/.env"
    set +a
    if [ -z "${NOBLE_VELERO_AWS_ACCESS_KEY_ID:-}" ] || [ -z "${NOBLE_VELERO_AWS_SECRET_ACCESS_KEY:-}" ]; then
      echo SKIP
      exit 0
    fi
    CLOUD="$(printf '[default]\naws_access_key_id=%s\naws_secret_access_key=%s\n' \
      "${NOBLE_VELERO_AWS_ACCESS_KEY_ID}" "${NOBLE_VELERO_AWS_SECRET_ACCESS_KEY}")"
    kubectl -n velero create secret generic velero-cloud-credentials \
      --from-literal=cloud="${CLOUD}" \
      --dry-run=client -o yaml | kubectl apply -f -
    echo APPLIED
  environment:
    KUBECONFIG: "{{ noble_kubeconfig }}"
  when: noble_deploy_env_file.stat.exists | default(false)
  no_log: true
  register: noble_velero_secret_from_env
  changed_when: "'APPLIED' in (noble_velero_secret_from_env.stdout | default(''))"
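The `printf` in the credentials task builds a standard AWS shared-credentials profile. With placeholder values (the key material below is illustrative), the Secret's `cloud` key contains:

```ini
[default]
aws_access_key_id=AKIAEXAMPLEKEYID
aws_secret_access_key=examplesecretaccesskey
```

Velero's AWS plugin mounts this key as a credentials file; `[default]` is the profile name it reads unless a different one is configured.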
ansible/roles/noble_velero/tasks/main.yml (new file, 85 lines)
@@ -0,0 +1,85 @@
---
# Velero — S3 backup target + built-in CSI snapshots (Longhorn: label VolumeSnapshotClass per README).
- name: Apply velero namespace
  ansible.builtin.command:
    argv:
      - kubectl
      - apply
      - -f
      - "{{ noble_repo_root }}/clusters/noble/bootstrap/velero/namespace.yaml"
  environment:
    KUBECONFIG: "{{ noble_kubeconfig }}"
  when: noble_velero_install | default(false) | bool
  changed_when: true

- name: Include Velero settings from repository .env (S3 bucket, URL, credentials)
  ansible.builtin.include_tasks: from_env.yml
  when: noble_velero_install | default(false) | bool

- name: Require S3 bucket and endpoint for Velero
  ansible.builtin.assert:
    that:
      - noble_velero_s3_bucket | default('') | length > 0
      - noble_velero_s3_url | default('') | length > 0
    fail_msg: >-
      Set NOBLE_VELERO_S3_BUCKET and NOBLE_VELERO_S3_URL in .env, or noble_velero_s3_bucket / noble_velero_s3_url
      (e.g. -e ...), or group_vars when noble_velero_install is true.
  when: noble_velero_install | default(false) | bool

- name: Create velero-cloud-credentials from Ansible vars
  ansible.builtin.shell: |
    set -euo pipefail
    CLOUD="$(printf '[default]\naws_access_key_id=%s\naws_secret_access_key=%s\n' \
      "${AWS_ACCESS_KEY_ID}" "${AWS_SECRET_ACCESS_KEY}")"
    kubectl -n velero create secret generic velero-cloud-credentials \
      --from-literal=cloud="${CLOUD}" \
      --dry-run=client -o yaml | kubectl apply -f -
  environment:
    KUBECONFIG: "{{ noble_kubeconfig }}"
    AWS_ACCESS_KEY_ID: "{{ noble_velero_aws_access_key_id }}"
    AWS_SECRET_ACCESS_KEY: "{{ noble_velero_aws_secret_access_key }}"
  when:
    - noble_velero_install | default(false) | bool
    - noble_velero_aws_access_key_id | default('') | length > 0
    - noble_velero_aws_secret_access_key | default('') | length > 0
  no_log: true
  changed_when: true

- name: Check velero-cloud-credentials Secret
  ansible.builtin.command:
    argv:
      - kubectl
      - -n
      - velero
      - get
      - secret
      - velero-cloud-credentials
  environment:
    KUBECONFIG: "{{ noble_kubeconfig }}"
  register: noble_velero_secret_check
  failed_when: false
  changed_when: false
  when: noble_velero_install | default(false) | bool

- name: Require velero-cloud-credentials before Helm
  ansible.builtin.assert:
    that:
      - noble_velero_secret_check.rc == 0
    fail_msg: >-
      Velero needs Secret velero/velero-cloud-credentials (key cloud). Set NOBLE_VELERO_AWS_ACCESS_KEY_ID and
      NOBLE_VELERO_AWS_SECRET_ACCESS_KEY in .env, or noble_velero_aws_* extra-vars, or create the Secret manually
      (see clusters/noble/bootstrap/velero/README.md).
  when: noble_velero_install | default(false) | bool

- name: Optional object prefix argv for Helm
  ansible.builtin.set_fact:
    noble_velero_helm_prefix_argv: "{{ ['--set-string', 'configuration.backupStorageLocation[0].prefix=' ~ (noble_velero_s3_prefix | default(''))] if (noble_velero_s3_prefix | default('') | length > 0) else [] }}"
  when: noble_velero_install | default(false) | bool

- name: Install Velero
  ansible.builtin.command:
    argv: "{{ ['helm', 'upgrade', '--install', 'velero', 'vmware-tanzu/velero', '--namespace', 'velero', '--version', noble_velero_chart_version, '-f', noble_repo_root ~ '/clusters/noble/bootstrap/velero/values.yaml', '--set-string', 'configuration.backupStorageLocation[0].bucket=' ~ noble_velero_s3_bucket, '--set-string', 'configuration.backupStorageLocation[0].config.s3Url=' ~ noble_velero_s3_url, '--set-string', 'configuration.backupStorageLocation[0].config.region=' ~ noble_velero_s3_region, '--set-string', 'configuration.backupStorageLocation[0].config.s3ForcePathStyle=' ~ noble_velero_s3_force_path_style] + (noble_velero_helm_prefix_argv | default([])) + ['--wait'] }}"
  environment:
    KUBECONFIG: "{{ noble_kubeconfig }}"
  when: noble_velero_install | default(false) | bool
  changed_when: true
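The `Install Velero` argv is one dense Jinja expression: a fixed flag list, plus the prefix `--set-string` pair only when a prefix is configured, then `--wait` last. The same conditional list concatenation, sketched in Python with placeholder bucket/URL values:

```python
# Build a helm argv the way the set_fact + argv expression does: base flags,
# then the optional prefix pair, then --wait. Values here are illustrative.
def velero_helm_argv(bucket: str, s3_url: str, prefix: str = "") -> list[str]:
    argv = [
        "helm", "upgrade", "--install", "velero", "vmware-tanzu/velero",
        "--namespace", "velero",
        "--set-string", f"configuration.backupStorageLocation[0].bucket={bucket}",
        "--set-string", f"configuration.backupStorageLocation[0].config.s3Url={s3_url}",
    ]
    prefix_argv = (
        ["--set-string", f"configuration.backupStorageLocation[0].prefix={prefix}"]
        if len(prefix) > 0 else []
    )
    return argv + prefix_argv + ["--wait"]

print(velero_helm_argv("lab-backups", "https://s3.example.internal"))
```

Passing argv as a list (rather than a shell string) avoids quoting problems when bucket names or URLs contain characters the shell would interpret.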
ansible/roles/proxmox_baseline/defaults/main.yml (new file, 14 lines)
@@ -0,0 +1,14 @@
---
proxmox_repo_debian_codename: "{{ ansible_facts['distribution_release'] | default('bookworm') }}"
proxmox_repo_disable_enterprise: true
proxmox_repo_disable_ceph_enterprise: true
proxmox_repo_enable_pve_no_subscription: true
proxmox_repo_enable_ceph_no_subscription: false

proxmox_no_subscription_notice_disable: true
proxmox_widget_toolkit_file: /usr/share/javascript/proxmox-widget-toolkit/proxmoxlib.js

# Bootstrap root SSH keys from the control machine so subsequent runs can use key auth.
proxmox_root_authorized_key_files:
  - "{{ lookup('env', 'HOME') }}/.ssh/id_ed25519.pub"
  - "{{ lookup('env', 'HOME') }}/.ssh/ansible.pub"
ansible/roles/proxmox_baseline/handlers/main.yml (new file, 5 lines)
@@ -0,0 +1,5 @@
---
- name: Restart pveproxy
  ansible.builtin.service:
    name: pveproxy
    state: restarted
ansible/roles/proxmox_baseline/tasks/main.yml (new file, 100 lines)
@@ -0,0 +1,100 @@
---
- name: Check configured local public key files
  ansible.builtin.stat:
    path: "{{ item }}"
  register: proxmox_root_pubkey_stats
  loop: "{{ proxmox_root_authorized_key_files }}"
  delegate_to: localhost
  become: false

- name: Fail when a configured local public key file is missing
  ansible.builtin.fail:
    msg: "Configured key file does not exist on the control host: {{ item.item }}"
  when: not item.stat.exists
  loop: "{{ proxmox_root_pubkey_stats.results }}"
  delegate_to: localhost
  become: false

- name: Ensure root authorized_keys contains configured public keys
  ansible.posix.authorized_key:
    user: root
    state: present
    key: "{{ lookup('ansible.builtin.file', item) }}"
    manage_dir: true
  loop: "{{ proxmox_root_authorized_key_files }}"

- name: Remove enterprise repository lines from /etc/apt/sources.list
  ansible.builtin.lineinfile:
    path: /etc/apt/sources.list
    regexp: ".*enterprise\\.proxmox\\.com.*"
    state: absent
  when:
    - proxmox_repo_disable_enterprise | bool or proxmox_repo_disable_ceph_enterprise | bool
  failed_when: false

- name: Find apt source files that contain Proxmox enterprise repositories
  ansible.builtin.find:
    paths: /etc/apt/sources.list.d
    file_type: file
    patterns:
      - "*.list"
      - "*.sources"
    contains: "enterprise\\.proxmox\\.com"
    use_regex: true
  register: proxmox_enterprise_repo_files
  when:
    - proxmox_repo_disable_enterprise | bool or proxmox_repo_disable_ceph_enterprise | bool

- name: Remove enterprise repository lines from apt source files
  ansible.builtin.lineinfile:
    path: "{{ item.path }}"
    regexp: ".*enterprise\\.proxmox\\.com.*"
    state: absent
  loop: "{{ proxmox_enterprise_repo_files.files | default([]) }}"
  when:
    - proxmox_repo_disable_enterprise | bool or proxmox_repo_disable_ceph_enterprise | bool

- name: Find apt source files that already contain pve-no-subscription
  ansible.builtin.find:
    paths: /etc/apt/sources.list.d
    file_type: file
    patterns:
      - "*.list"
      - "*.sources"
    contains: "pve-no-subscription"
    use_regex: false
  register: proxmox_no_sub_repo_files
  when: proxmox_repo_enable_pve_no_subscription | bool

- name: Ensure Proxmox no-subscription repository is configured when absent
  ansible.builtin.copy:
    dest: /etc/apt/sources.list.d/pve-no-subscription.list
    content: "deb http://download.proxmox.com/debian/pve {{ proxmox_repo_debian_codename }} pve-no-subscription\n"
    mode: "0644"
  when:
    - proxmox_repo_enable_pve_no_subscription | bool
    - (proxmox_no_sub_repo_files.matched | default(0) | int) == 0

- name: Remove duplicate pve-no-subscription.list when another source already provides it
  ansible.builtin.file:
    path: /etc/apt/sources.list.d/pve-no-subscription.list
    state: absent
  when:
    - proxmox_repo_enable_pve_no_subscription | bool
    - (proxmox_no_sub_repo_files.files | default([]) | map(attribute='path') | list | select('ne', '/etc/apt/sources.list.d/pve-no-subscription.list') | list | length) > 0

- name: Ensure Ceph no-subscription repository is configured
  ansible.builtin.copy:
    dest: /etc/apt/sources.list.d/ceph-no-subscription.list
    content: "deb http://download.proxmox.com/debian/ceph-{{ proxmox_repo_debian_codename }} {{ proxmox_repo_debian_codename }} no-subscription\n"
    mode: "0644"
  when: proxmox_repo_enable_ceph_no_subscription | bool

- name: Disable no-subscription pop-up in Proxmox UI
  ansible.builtin.replace:
    path: "{{ proxmox_widget_toolkit_file }}"
    regexp: "if \\(data\\.status !== 'Active'\\)"
    replace: "if (false)"
    backup: true
  when: proxmox_no_subscription_notice_disable | bool
  notify: Restart pveproxy
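The `ansible.builtin.replace` task neutralizes the subscription check by matching the literal JavaScript condition and short-circuiting it to `if (false)`. The same substitution on a sample line, sketched in Python (the sample line is illustrative, not the exact source of `proxmoxlib.js`):

```python
import re

# Same escaped pattern and replacement the Ansible task uses.
PATTERN = r"if \(data\.status !== 'Active'\)"

def disable_notice(js: str) -> str:
    return re.sub(PATTERN, "if (false)", js)

sample = "if (data.status !== 'Active') { Ext.Msg.show({...}); }"
print(disable_notice(sample))  # → if (false) { Ext.Msg.show({...}); }
```

Because the regex never matches its own output, re-running the task is idempotent; the `backup: true` option keeps the original file for rollback after Proxmox package upgrades overwrite it.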
ansible/roles/proxmox_cluster/defaults/main.yml (new file, 7 lines)
@@ -0,0 +1,7 @@
---
proxmox_cluster_enabled: true
proxmox_cluster_name: pve-cluster
proxmox_cluster_master: ""
proxmox_cluster_master_ip: ""
proxmox_cluster_force: false
proxmox_cluster_master_root_password: ""
ansible/roles/proxmox_cluster/tasks/main.yml (new file, 63 lines)
@@ -0,0 +1,63 @@
---
- name: Skip cluster role when disabled
  ansible.builtin.meta: end_host
  when: not (proxmox_cluster_enabled | bool)

- name: Check whether corosync cluster config exists
  ansible.builtin.stat:
    path: /etc/pve/corosync.conf
  register: proxmox_cluster_corosync_conf

- name: Set effective Proxmox cluster master
  ansible.builtin.set_fact:
    proxmox_cluster_master_effective: "{{ proxmox_cluster_master | default(groups['proxmox_hosts'][0], true) }}"

- name: Set effective Proxmox cluster master IP
  ansible.builtin.set_fact:
    proxmox_cluster_master_ip_effective: >-
      {{
        proxmox_cluster_master_ip
        | default(hostvars[proxmox_cluster_master_effective].ansible_host
                  | default(proxmox_cluster_master_effective), true)
      }}

- name: Create cluster on designated master
  ansible.builtin.command:
    cmd: "pvecm create {{ proxmox_cluster_name }}"
  when:
    - inventory_hostname == proxmox_cluster_master_effective
    - not proxmox_cluster_corosync_conf.stat.exists

- name: Ensure python3-pexpect is installed for password-based cluster join
  ansible.builtin.apt:
    name: python3-pexpect
    state: present
    update_cache: true
  when:
    - inventory_hostname != proxmox_cluster_master_effective
    - not proxmox_cluster_corosync_conf.stat.exists
    - proxmox_cluster_master_root_password | length > 0

- name: Join node to existing cluster (password provided)
  ansible.builtin.expect:
    command: >-
      pvecm add {{ proxmox_cluster_master_ip_effective }}
      {% if proxmox_cluster_force | bool %}--force{% endif %}
    responses:
      "Please enter superuser \\(root\\) password for '.*':": "{{ proxmox_cluster_master_root_password }}"
      "password:": "{{ proxmox_cluster_master_root_password }}"
  no_log: true
  when:
    - inventory_hostname != proxmox_cluster_master_effective
    - not proxmox_cluster_corosync_conf.stat.exists
    - proxmox_cluster_master_root_password | length > 0

- name: Join node to existing cluster (SSH trust/no prompt)
  ansible.builtin.command:
    cmd: >-
      pvecm add {{ proxmox_cluster_master_ip_effective }}
      {% if proxmox_cluster_force | bool %}--force{% endif %}
  when:
    - inventory_hostname != proxmox_cluster_master_effective
    - not proxmox_cluster_corosync_conf.stat.exists
    - proxmox_cluster_master_root_password | length == 0
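The two `set_fact` tasks resolve the join target with a fallback chain: an explicitly configured IP wins, then the master host's `ansible_host`, then the master's inventory name itself. The same logic in Python with a hypothetical inventory snapshot (hostnames and addresses are illustrative):

```python
# Fallback chain mirroring the set_fact expressions; hostvars is a
# hypothetical inventory snapshot, not data from the repo.
def effective_master_ip(master_ip: str, master: str, hostvars: dict) -> str:
    return master_ip or hostvars.get(master, {}).get("ansible_host") or master

hostvars = {"pve1": {"ansible_host": "192.0.2.10"}}
print(effective_master_ip("", "pve1", hostvars))          # falls back to ansible_host
print(effective_master_ip("", "pve2", hostvars))          # falls back to inventory name
print(effective_master_ip("10.0.0.5", "pve1", hostvars))  # explicit IP wins
```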
ansible/roles/proxmox_maintenance/defaults/main.yml (new file, 6 lines)
@@ -0,0 +1,6 @@
---
proxmox_upgrade_apt_cache_valid_time: 3600
proxmox_upgrade_autoremove: true
proxmox_upgrade_autoclean: true
proxmox_upgrade_reboot_if_required: true
proxmox_upgrade_reboot_timeout: 1800
ansible/roles/proxmox_maintenance/tasks/main.yml (new file, 30 lines)
@@ -0,0 +1,30 @@
---
- name: Refresh apt cache
  ansible.builtin.apt:
    update_cache: true
    cache_valid_time: "{{ proxmox_upgrade_apt_cache_valid_time }}"

- name: Upgrade Proxmox host packages
  ansible.builtin.apt:
    upgrade: dist

- name: Remove orphaned packages
  ansible.builtin.apt:
    autoremove: "{{ proxmox_upgrade_autoremove }}"

- name: Clean apt package cache
  ansible.builtin.apt:
    autoclean: "{{ proxmox_upgrade_autoclean }}"

- name: Check if reboot is required
  ansible.builtin.stat:
    path: /var/run/reboot-required
  register: proxmox_reboot_required_file

- name: Reboot when required by package upgrades
  ansible.builtin.reboot:
    reboot_timeout: "{{ proxmox_upgrade_reboot_timeout }}"
    msg: "Reboot initiated by Ansible Proxmox maintenance playbook"
  when:
    - proxmox_upgrade_reboot_if_required | bool
    - proxmox_reboot_required_file.stat.exists | default(false)
branding/nikflix/logo.png (new binary file, 277 KiB; binary content not shown)
@@ -1,7 +1,7 @@
# Argo CD — optional applications (non-bootstrap)

**Base cluster configuration** (CNI, MetalLB, ingress, cert-manager, storage, observability stack, policy, Vault, etc.) is installed by **`ansible/playbooks/noble.yml`** from **`clusters/noble/bootstrap/`** — not from here.
**Base cluster configuration** (CNI, MetalLB, ingress, cert-manager, storage, observability stack, policy, SOPS secrets path, etc.) is installed by **`ansible/playbooks/noble.yml`** from **`clusters/noble/bootstrap/`** — not from here.

**`noble-root`** (`clusters/noble/bootstrap/argocd/root-application.yaml`) points at **`clusters/noble/apps`**. Add **`Application`** manifests (and optional **`AppProject`** definitions) under this directory only for workloads that are additive and do not subsume the Ansible-managed platform.
**`noble-root`** (`clusters/noble/bootstrap/argocd/root-application.yaml`) points at **`clusters/noble/apps`**. Add **`Application`** manifests (and optional **`AppProject`** definitions) under this directory only for workloads that are additive and do not subsume the core platform.

For an app-of-apps pattern, use a second-level **`Application`** that syncs a subdirectory (for example **`optional/`**) containing leaf **`Application`** resources.
Bootstrap kustomize (namespaces, static YAML, leaf **`Application`**s) lives in **`clusters/noble/bootstrap/`** and is tracked by **`noble-bootstrap-root`** — enable automated sync for that app only after **`noble.yml`** completes (**`clusters/noble/bootstrap/argocd/README.md`** §5). Put Helm **`Application`** migrations under **`clusters/noble/bootstrap/argocd/app-of-apps/`**.
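The second-level app-of-apps pattern described above can be sketched as follows (a sketch only — `noble-optional` and the `optional/` path are illustrative names, not manifests in this repo):

```yaml
# Hypothetical second-level Application: Argo syncs a directory that
# contains only leaf Application manifests (app-of-apps).
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: noble-optional                     # illustrative name
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://gitea.pcenicni.ca/gsdavidp/home-server.git
    targetRevision: HEAD
    path: clusters/noble/apps/optional     # illustrative subdirectory of leaf Applications
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd                      # leaf Applications live in the argocd namespace
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
```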
32
clusters/noble/apps/homepage/application.yaml
Normal file
@@ -0,0 +1,32 @@
# Argo CD — optional [Homepage](https://gethomepage.dev/) dashboard (Helm from [jameswynn.github.io/helm-charts](https://jameswynn.github.io/helm-charts/)).
# Values: **`./values.yaml`** (multi-source **`$values`** ref).
#
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: homepage
  namespace: argocd
  finalizers:
    - resources-finalizer.argocd.argoproj.io/background
spec:
  project: default
  sources:
    - repoURL: https://jameswynn.github.io/helm-charts
      chart: homepage
      targetRevision: 2.1.0
      helm:
        releaseName: homepage
        valueFiles:
          - $values/clusters/noble/apps/homepage/values.yaml
    - repoURL: https://gitea.pcenicni.ca/gsdavidp/home-server.git
      targetRevision: HEAD
      ref: values
  destination:
    server: https://kubernetes.default.svc
    namespace: homepage
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
122
clusters/noble/apps/homepage/values.yaml
Normal file
@@ -0,0 +1,122 @@
# Homepage — [gethomepage/homepage](https://github.com/gethomepage/homepage) via [jameswynn/homepage](https://github.com/jameswynn/helm-charts) Helm chart.
# Ingress: Traefik + cert-manager (same pattern as `clusters/noble/bootstrap/headlamp/values.yaml`).
# Service links match **`ansible/roles/noble_landing_urls/defaults/main.yml`** (`noble_lab_ui_entries`).
# **Velero** has no in-cluster web UI — tile links to upstream docs (no **siteMonitor**).
#
# **`siteMonitor`** runs **server-side** in the Homepage pod (see `gethomepage/homepage` `siteMonitor.js`).
# Public FQDNs like **`*.apps.noble.lab.pcenicni.dev`** often do **not** resolve inside the cluster
# (split-horizon / LAN DNS only) → `ENOTFOUND` / HTTP **500** in the monitor. Use **in-cluster Service**
# URLs for **`siteMonitor`** only; **`href`** stays the human-facing ingress URL.
#
# **Prometheus widget** also resolves from the pod — use the real **Service** name (Helm may truncate to
# 63 chars — this repo’s generated UI list uses **`kube-prometheus-kube-prome-prometheus`**).
# Verify: `kubectl -n monitoring get svc | grep -E 'prometheus|alertmanager|grafana'`.
#
image:
  repository: ghcr.io/gethomepage/homepage
  tag: v1.2.0

enableRbac: true

serviceAccount:
  create: true

ingress:
  main:
    enabled: true
    ingressClassName: traefik
    annotations:
      cert-manager.io/cluster-issuer: letsencrypt-prod
    hosts:
      - host: homepage.apps.noble.lab.pcenicni.dev
        paths:
          - path: /
            pathType: Prefix
    tls:
      - hosts:
          - homepage.apps.noble.lab.pcenicni.dev
        secretName: homepage-apps-noble-tls

env:
  - name: HOMEPAGE_ALLOWED_HOSTS
    value: homepage.apps.noble.lab.pcenicni.dev

config:
  bookmarks: []
  services:
    - Noble Lab:
        - Argo CD:
            icon: si-argocd
            href: https://argo.apps.noble.lab.pcenicni.dev
            siteMonitor: http://argocd-server.argocd.svc.cluster.local:80
            description: GitOps UI (sync, apps, repos)
        - Grafana:
            icon: si-grafana
            href: https://grafana.apps.noble.lab.pcenicni.dev
            siteMonitor: http://kube-prometheus-grafana.monitoring.svc.cluster.local:80
            description: Dashboards, Loki explore (logs)
        - Prometheus:
            icon: si-prometheus
            href: https://prometheus.apps.noble.lab.pcenicni.dev
            siteMonitor: http://kube-prometheus-kube-prome-prometheus.monitoring.svc.cluster.local:9090
            description: Prometheus UI (queries, targets) — lab; protect in production
            widget:
              type: prometheus
              url: http://kube-prometheus-kube-prome-prometheus.monitoring.svc.cluster.local:9090
              fields: ["targets_up", "targets_down", "targets_total"]
        - Alertmanager:
            icon: alertmanager.png
            href: https://alertmanager.apps.noble.lab.pcenicni.dev
            siteMonitor: http://kube-prometheus-kube-prome-alertmanager.monitoring.svc.cluster.local:9093
            description: Alertmanager UI (silences, status)
        - Headlamp:
            icon: mdi-kubernetes
            href: https://headlamp.apps.noble.lab.pcenicni.dev
            siteMonitor: http://headlamp.headlamp.svc.cluster.local:80
            description: Kubernetes UI (cluster resources)
        - Longhorn:
            icon: longhorn.png
            href: https://longhorn.apps.noble.lab.pcenicni.dev
            siteMonitor: http://longhorn-frontend.longhorn-system.svc.cluster.local:80
            description: Storage volumes, nodes, backups
        - Velero:
            icon: mdi-backup-restore
            href: https://velero.io/docs/
            description: Cluster backups — no in-cluster web UI; use velero CLI or kubectl (docs)
  widgets:
    - datetime:
        text_size: xl
        format:
          dateStyle: medium
          timeStyle: short
    - kubernetes:
        cluster:
          show: true
          cpu: true
          memory: true
          showLabel: true
          label: Cluster
        nodes:
          show: true
          cpu: true
          memory: true
          showLabel: true
    - search:
        provider: duckduckgo
        target: _blank
  kubernetes:
    mode: cluster
  settingsString: |
    title: Noble Lab
    description: Homelab services — in-cluster uptime checks, cluster resources, Prometheus targets
    theme: dark
    color: slate
    headerStyle: boxedWidgets
    statusStyle: dot
    iconStyle: theme
    fullWidth: true
    useEqualHeights: true
    layout:
      Noble Lab:
        style: row
        columns: 4
@@ -3,4 +3,5 @@
# Helm value files for those apps can live in subdirectories here (for example **./homepage/values.yaml**).
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources: []
resources:
  - homepage/application.yaml

@@ -50,21 +50,56 @@ helm upgrade --install argocd argo/argo-cd -n argocd --create-namespace \

Use **Settings → Repositories** in the UI, or `argocd repo add` / a `Secret` of type `repository`.

## 4. App-of-apps (optional GitOps only)
## 4. App-of-apps (GitOps)

Bootstrap **platform** workloads (CNI, ingress, cert-manager, Kyverno, observability, Vault, etc.) are installed by
**`ansible/playbooks/noble.yml`** from **`clusters/noble/bootstrap/`** — not by Argo. **`clusters/noble/apps/kustomization.yaml`** is empty by default.
**Ansible** (`ansible/playbooks/noble.yml`) performs the **initial** install: Helm releases and **`kubectl apply -k clusters/noble/bootstrap`**. **Argo** then tracks the same git paths for ongoing reconciliation.

1. Edit **`root-application.yaml`**: set **`repoURL`** and **`targetRevision`** to this repository. The **`resources-finalizer.argocd.argoproj.io/background`** finalizer uses Argo’s path-qualified form so **`kubectl apply`** does not warn about finalizer names.
2. When you want Argo to manage specific apps, add **`Application`** manifests under **`clusters/noble/apps/`** (see **`clusters/noble/apps/README.md`**).
3. Apply the root:
1. Edit **`root-application.yaml`** and **`bootstrap-root-application.yaml`**: set **`repoURL`** and **`targetRevision`**. The **`resources-finalizer.argocd.argoproj.io/background`** finalizer uses Argo’s path-qualified form so **`kubectl apply`** does not warn about finalizer names.
2. Optional add-on apps: add **`Application`** manifests under **`clusters/noble/apps/`** (see **`clusters/noble/apps/README.md`**).
3. **Bootstrap kustomize** (namespaces, datasource, leaf **`Application`**s under **`argocd/app-of-apps/`**, etc.): **`noble-bootstrap-root`** syncs **`clusters/noble/bootstrap`**. It is created with **manual** sync only so Argo does not apply changes while **`noble.yml`** is still running.

**`ansible/playbooks/noble.yml`** (role **`noble_argocd`**) applies both roots when **`noble_argocd_apply_root_application`** / **`noble_argocd_apply_bootstrap_root_application`** are true in **`ansible/group_vars/all.yml`**.

```bash
kubectl apply -f clusters/noble/bootstrap/argocd/root-application.yaml
kubectl apply -f clusters/noble/bootstrap/argocd/bootstrap-root-application.yaml
```

If you migrated from GitOps-managed **`noble-platform`** / **`noble-kyverno`**, delete stale **`Application`** objects on
the cluster (see **`clusters/noble/apps/README.md`**) then re-apply the root.
If you migrated from older GitOps **`Application`** names, delete stale **`Application`** objects on the cluster (see **`clusters/noble/apps/README.md`**) then re-apply the roots.

## 5. After Ansible: enable automated sync for **noble-bootstrap-root**

Do this only after **`ansible-playbook playbooks/noble.yml`** has finished successfully (including **`noble_platform`** `kubectl apply -k` and any Helm stages you rely on). Until then, leave **manual** sync so Argo does not fight the playbook.

**Required steps**

1. Confirm the cluster matches git for kustomize output (optional): `kubectl kustomize clusters/noble/bootstrap | kubectl diff -f -` or inspect resources in the UI.
2. Register the git repo in Argo if you have not already (**§3**).
3. **Refresh** the app so Argo compares **`clusters/noble/bootstrap`** to the cluster: Argo UI → **noble-bootstrap-root** → **Refresh**, or:

   ```bash
   argocd app get noble-bootstrap-root --refresh
   ```

4. **Enable automated sync** (prune + self-heal), preserving **`CreateNamespace`**, using any one of:

   **kubectl**

   ```bash
   kubectl patch application noble-bootstrap-root -n argocd --type merge -p '{"spec":{"syncPolicy":{"automated":{"prune":true,"selfHeal":true},"syncOptions":["CreateNamespace=true"]}}}'
   ```

   **argocd** CLI (logged in)

   ```bash
   argocd app set noble-bootstrap-root --sync-policy automated --auto-prune --self-heal
   ```

   **UI:** open **noble-bootstrap-root** → **App Details** → enable **AUTO-SYNC** (and **Prune** / **Self Heal** if shown).

5. Trigger a sync if the app does not go green immediately: **Sync** in the UI, or `argocd app sync noble-bootstrap-root`.

After this, **git** is the source of truth for everything under **`clusters/noble/bootstrap/kustomization.yaml`** (including **`argocd/app-of-apps/`**). Helm-managed platform components remain whatever Ansible last installed until you model them as Argo **`Application`**s under **`app-of-apps/`** and stop installing them from Ansible.

## Versions

@@ -3,8 +3,10 @@
# 1. Set spec.source.repoURL (and targetRevision — **HEAD** tracks the remote default branch) to this repo.
# 2. kubectl apply -f clusters/noble/bootstrap/argocd/root-application.yaml
#
# **clusters/noble/apps** holds optional **Application** manifests. Core platform is installed by
# **ansible/playbooks/noble.yml** from **clusters/noble/bootstrap/**.
# **clusters/noble/apps** holds optional **Application** manifests. Core platform Helm + kustomize is
# installed by **ansible/playbooks/noble.yml** from **clusters/noble/bootstrap/**. **bootstrap-root-application.yaml**
# registers **noble-bootstrap-root** for the same kustomize tree (**manual** sync until you enable
# automation after the playbook — see **README.md** §5).
#
apiVersion: argoproj.io/v1alpha1
kind: Application

16
clusters/noble/bootstrap/csi-snapshot-controller/README.md
Normal file
@@ -0,0 +1,16 @@
# CSI Volume Snapshot (external-snapshotter)

Installs the **Volume Snapshot** CRDs and the **snapshot-controller** so CSI drivers (e.g. **Longhorn**) and **Velero** can use `VolumeSnapshot` / `VolumeSnapshotContent` / `VolumeSnapshotClass`.

- Upstream: [kubernetes-csi/external-snapshotter](https://github.com/kubernetes-csi/external-snapshotter) **v8.5.0**
- **Not** the per-driver **csi-snapshotter** sidecar — Longhorn ships that with its CSI components.

**Order:** apply **before** relying on volume snapshots (e.g. before or early with **Longhorn**; **Ansible** runs this after **Cilium**, before **metrics-server** / **Longhorn**).

```bash
kubectl apply -k clusters/noble/bootstrap/csi-snapshot-controller/crd
kubectl apply -k clusters/noble/bootstrap/csi-snapshot-controller/controller
kubectl -n kube-system rollout status deploy/snapshot-controller --timeout=120s
```

After this, create or label a **VolumeSnapshotClass** for Longhorn (`velero.io/csi-volumesnapshot-class: "true"`) per `clusters/noble/bootstrap/velero/README.md`.
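The labeled class might look like this (a sketch — the class name is illustrative; `driver.longhorn.io` is Longhorn's CSI driver name, and the authoritative manifest in this repo is `clusters/noble/bootstrap/velero/longhorn-volumesnapshotclass.yaml`):

```yaml
# Illustrative VolumeSnapshotClass for Longhorn, labeled so Velero's CSI
# integration selects it when snapshotting Longhorn-backed PVCs.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: longhorn                               # illustrative name
  labels:
    velero.io/csi-volumesnapshot-class: "true"
driver: driver.longhorn.io                     # Longhorn CSI driver
deletionPolicy: Delete
```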
@@ -0,0 +1,8 @@
# Snapshot controller — **kube-system** (upstream default).
# Image tag should match the external-snapshotter release family (see setup-snapshot-controller.yaml in that tag).
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: kube-system
resources:
  - https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/v8.5.0/deploy/kubernetes/snapshot-controller/rbac-snapshot-controller.yaml
  - https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/v8.5.0/deploy/kubernetes/snapshot-controller/setup-snapshot-controller.yaml
@@ -0,0 +1,9 @@
# kubernetes-csi/external-snapshotter — Volume Snapshot GA CRDs only (no VolumeGroupSnapshot).
# Pin **ref** when bumping; keep in sync with **controller** image below.
# https://github.com/kubernetes-csi/external-snapshotter/tree/v8.5.0/client/config/crd
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/v8.5.0/client/config/crd/snapshot.storage.k8s.io_volumesnapshotclasses.yaml
  - https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/v8.5.0/client/config/crd/snapshot.storage.k8s.io_volumesnapshotcontents.yaml
  - https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/v8.5.0/client/config/crd/snapshot.storage.k8s.io_volumesnapshots.yaml
@@ -1,60 +0,0 @@
# External Secrets Operator (noble)

Syncs secrets from external systems into Kubernetes **Secret** objects via **ExternalSecret** / **ClusterExternalSecret** CRDs.

- **Chart:** `external-secrets/external-secrets` **2.2.0** (app **v2.2.0**)
- **Namespace:** `external-secrets`
- **Helm release name:** `external-secrets` (matches the operator **ServiceAccount** name `external-secrets`)

## Install

```bash
helm repo add external-secrets https://charts.external-secrets.io
helm repo update
kubectl apply -f clusters/noble/bootstrap/external-secrets/namespace.yaml
helm upgrade --install external-secrets external-secrets/external-secrets -n external-secrets \
  --version 2.2.0 -f clusters/noble/bootstrap/external-secrets/values.yaml --wait
```

Verify:

```bash
kubectl -n external-secrets get deploy,pods
kubectl get crd | grep external-secrets
```

## Vault `ClusterSecretStore` (after Vault is deployed)

The checklist expects a **Vault**-backed store. Install Vault first (`talos/CLUSTER-BUILD.md` Phase E — Vault on Longhorn + auto-unseal), then:

1. Enable **KV v2** secrets engine and **Kubernetes** auth in Vault; create a **role** (e.g. `external-secrets`) that maps the cluster’s **`external-secrets` / `external-secrets`** service account to a policy that can read the paths you need.
2. Copy **`examples/vault-cluster-secret-store.yaml`**, set **`spec.provider.vault.server`** to your Vault URL. This repo’s Vault Helm values use **HTTP** on port **8200** (`global.tlsDisable: true`): **`http://vault.vault.svc.cluster.local:8200`**. Use **`https://`** if you enable TLS on the Vault listener.
3. If Vault uses a **private TLS CA**, configure **`caProvider`** or **`caBundle`** on the Vault provider — see [HashiCorp Vault provider](https://external-secrets.io/latest/provider/hashicorp-vault/). Do not commit private CA material to public git unless intended.
4. Apply: **`kubectl apply -f …/vault-cluster-secret-store.yaml`**
5. Confirm the store is ready: **`kubectl describe clustersecretstore vault`**

Example **ExternalSecret** (after the store is healthy):

```yaml
apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
  name: demo
  namespace: default
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: vault
    kind: ClusterSecretStore
  target:
    name: demo-synced
  data:
    - secretKey: password
      remoteRef:
        key: secret/data/myapp
        property: password
```

## Upgrades

Pin the chart version in `values.yaml` header comments; run the same **`helm upgrade --install`** with the new **`--version`** after reviewing [release notes](https://github.com/external-secrets/external-secrets/releases).
@@ -1,31 +0,0 @@
# ClusterSecretStore for HashiCorp Vault (KV v2) using Kubernetes auth.
#
# Do not apply until Vault is running, reachable from the cluster, and configured with:
# - Kubernetes auth at mountPath (default: kubernetes)
# - A role (below: external-secrets) bound to this service account:
#     name: external-secrets
#     namespace: external-secrets
# - A policy allowing read on the KV path used below (e.g. secret/data/* for path "secret")
#
# Adjust server, mountPath, role, and path to match your Vault deployment. If Vault uses TLS
# with a private CA, set provider.vault.caProvider or caBundle (see README).
#
# kubectl apply -f clusters/noble/bootstrap/external-secrets/examples/vault-cluster-secret-store.yaml
---
apiVersion: external-secrets.io/v1
kind: ClusterSecretStore
metadata:
  name: vault
spec:
  provider:
    vault:
      server: "http://vault.vault.svc.cluster.local:8200"
      path: secret
      version: v2
      auth:
        kubernetes:
          mountPath: kubernetes
          role: external-secrets
          serviceAccountRef:
            name: external-secrets
            namespace: external-secrets
@@ -1,5 +0,0 @@
# External Secrets Operator — apply before Helm.
apiVersion: v1
kind: Namespace
metadata:
  name: external-secrets
@@ -1,10 +0,0 @@
# External Secrets Operator — noble
#
# helm repo add external-secrets https://charts.external-secrets.io
# helm repo update
# kubectl apply -f clusters/noble/bootstrap/external-secrets/namespace.yaml
# helm upgrade --install external-secrets external-secrets/external-secrets -n external-secrets \
#   --version 2.2.0 -f clusters/noble/bootstrap/external-secrets/values.yaml --wait
#
# CRDs are installed by the chart (installCRDs: true). Vault ClusterSecretStore: see README + examples/.
commonLabels: {}
@@ -1,6 +1,8 @@
# Ansible bootstrap: plain Kustomize (namespaces + extra YAML). Helm installs are driven by
# **ansible/playbooks/noble.yml** (role **noble_platform**) — avoids **kustomize --enable-helm** in-repo.
# Optional GitOps workloads live under **../apps/** (Argo **noble-root**).
# Optional GitOps: **../apps/** (Argo **noble-root**); leaf **Application**s under **argocd/app-of-apps/**.
# **noble-bootstrap-root** (Argo) uses this same path — enable automated sync only after **noble.yml**
# completes (see **argocd/README.md** §5).
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

@@ -8,11 +10,10 @@ resources:
  - kube-prometheus-stack/namespace.yaml
  - loki/namespace.yaml
  - fluent-bit/namespace.yaml
  - sealed-secrets/namespace.yaml
  - external-secrets/namespace.yaml
  - vault/namespace.yaml
  - newt/namespace.yaml
  - kyverno/namespace.yaml
  - velero/namespace.yaml
  - velero/longhorn-volumesnapshotclass.yaml
  - headlamp/namespace.yaml
  - grafana-loki-datasource/loki-datasource.yaml
  - vault/unseal-cronjob.yaml
  - vault/cilium-network-policy.yaml
  - argocd/app-of-apps

@@ -35,7 +35,6 @@ x-kyverno-exclude-infra: &kyverno_exclude_infra
  - kube-node-lease
  - argocd
  - cert-manager
  - external-secrets
  - headlamp
  - kyverno
  - logging
@@ -44,9 +43,7 @@ x-kyverno-exclude-infra: &kyverno_exclude_infra
  - metallb-system
  - monitoring
  - newt
  - sealed-secrets
  - traefik
  - vault

policyExclude:
  disallow-capabilities: *kyverno_exclude_infra

@@ -2,26 +2,24 @@

This is the **primary** automation path for **public** hostnames to workloads in this cluster (it **replaces** in-cluster ExternalDNS). [Newt](https://github.com/fosrl/newt) is the on-prem agent that connects your cluster to a **Pangolin** site (WireGuard tunnel). The [Fossorial Helm chart](https://github.com/fosrl/helm-charts) deploys one or more instances.

**Secrets:** Never commit endpoint, Newt ID, or Newt secret. If credentials were pasted into chat or CI logs, **rotate them** in Pangolin and recreate the Kubernetes Secret.
**Secrets:** Never commit endpoint, Newt ID, or Newt secret in **plain** YAML. If credentials were pasted into chat or CI logs, **rotate them** in Pangolin and recreate the Kubernetes Secret.

## 1. Create the Secret

Keys must match `values.yaml` (`PANGOLIN_ENDPOINT`, `NEWT_ID`, `NEWT_SECRET`).

### Option A — Sealed Secret (safe for GitOps)
### Option A — SOPS (safe for GitOps)

With the [Sealed Secrets](https://github.com/bitnami-labs/sealed-secrets) controller installed (`clusters/noble/bootstrap/sealed-secrets/`), generate a `SealedSecret` from your workstation (rotate credentials in Pangolin first if they were exposed):
Encrypt a normal **`Secret`** with [Mozilla SOPS](https://github.com/getsops/sops) and **age** (see **`clusters/noble/secrets/README.md`** and **`.sops.yaml`**). The repo includes an encrypted example at **`clusters/noble/secrets/newt-pangolin-auth.secret.yaml`** — edit with `sops` after exporting **`SOPS_AGE_KEY_FILE`** to your **`age-key.txt`**, or create a new file and encrypt it.

```bash
chmod +x clusters/noble/bootstrap/sealed-secrets/examples/kubeseal-newt-pangolin-auth.sh
export PANGOLIN_ENDPOINT='https://pangolin.pcenicni.dev'
export NEWT_ID='YOUR_NEWT_ID'
export NEWT_SECRET='YOUR_NEWT_SECRET'
./clusters/noble/bootstrap/sealed-secrets/examples/kubeseal-newt-pangolin-auth.sh > newt-pangolin-auth.sealedsecret.yaml
kubectl apply -f newt-pangolin-auth.sealedsecret.yaml
export SOPS_AGE_KEY_FILE=/absolute/path/to/home-server/age-key.txt
sops clusters/noble/secrets/newt-pangolin-auth.secret.yaml
# then:
sops -d clusters/noble/secrets/newt-pangolin-auth.secret.yaml | kubectl apply -f -
```

Commit only the `.sealedsecret.yaml` file, not plain `Secret` YAML.
**Ansible** (`noble.yml`) applies all **`clusters/noble/secrets/*.yaml`** automatically when **`age-key.txt`** exists at the repo root.
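Before encryption, the plain `Secret` has the usual shape (a sketch — the name and namespace match the repo's `newt-pangolin-auth` example; values are placeholders). Encrypt it with `sops` before committing:

```yaml
# Plain Secret to be encrypted with SOPS — never commit this file unencrypted.
apiVersion: v1
kind: Secret
metadata:
  name: newt-pangolin-auth
  namespace: newt
type: Opaque
stringData:
  PANGOLIN_ENDPOINT: https://pangolin.example.com   # placeholder
  NEWT_ID: YOUR_NEWT_ID
  NEWT_SECRET: YOUR_NEWT_SECRET
```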

### Option B — Imperative Secret (not in git)

@@ -1,50 +0,0 @@
# Sealed Secrets (noble)

Encrypts `Secret` manifests so they can live in git; the controller decrypts **SealedSecret** resources into **Secret**s in-cluster.

- **Chart:** `sealed-secrets/sealed-secrets` **2.18.4** (app **0.36.1**)
- **Namespace:** `sealed-secrets`

## Install

```bash
helm repo add sealed-secrets https://bitnami-labs.github.io/sealed-secrets
helm repo update
kubectl apply -f clusters/noble/bootstrap/sealed-secrets/namespace.yaml
helm upgrade --install sealed-secrets sealed-secrets/sealed-secrets -n sealed-secrets \
  --version 2.18.4 -f clusters/noble/bootstrap/sealed-secrets/values.yaml --wait
```

## Workstation: `kubeseal`

Install a **kubeseal** build compatible with the controller (match **app** minor, e.g. **0.36.x** for **0.36.1**). Examples:

- **Homebrew:** `brew install kubeseal` (check `kubeseal --version` against the chart’s `image.tag` in `helm show values`).
- **GitHub releases:** [bitnami-labs/sealed-secrets](https://github.com/bitnami-labs/sealed-secrets/releases)

Fetch the cluster’s public seal cert (once per kube context):

```bash
kubeseal --fetch-cert > /tmp/noble-sealed-secrets.pem
```

Create a sealed secret from a normal secret manifest:

```bash
kubectl create secret generic example --from-literal=foo=bar --dry-run=client -o yaml \
  | kubeseal --cert /tmp/noble-sealed-secrets.pem -o yaml > example-sealedsecret.yaml
```

Commit `example-sealedsecret.yaml`; apply it with `kubectl apply -f`. The controller creates the **Secret** in the same namespace as the **SealedSecret**.

**Noble example:** `examples/kubeseal-newt-pangolin-auth.sh` (Newt / Pangolin tunnel credentials).

## Backup the sealing key

If the controller’s private key is lost, existing sealed files cannot be decrypted on a new cluster. Back up the key secret after install:

```bash
kubectl get secret -n sealed-secrets -l sealedsecrets.bitnami.com/sealed-secrets-key=active -o yaml > sealed-secrets-key-backup.yaml
```

Store `sealed-secrets-key-backup.yaml` in a safe offline location (not in public git).
@@ -1,19 +0,0 @@
#!/usr/bin/env bash
# Emit a SealedSecret for newt-pangolin-auth (namespace newt).
# Prerequisites: sealed-secrets controller running; kubeseal client (same minor as controller).
# Rotate Pangolin/Newt credentials in the UI first if they were exposed, then set env vars and run:
#
#   export PANGOLIN_ENDPOINT='https://pangolin.example.com'
#   export NEWT_ID='...'
#   export NEWT_SECRET='...'
#   ./kubeseal-newt-pangolin-auth.sh > newt-pangolin-auth.sealedsecret.yaml
#   kubectl apply -f newt-pangolin-auth.sealedsecret.yaml
#
set -euo pipefail
kubectl apply -f "$(dirname "$0")/../../newt/namespace.yaml" >/dev/null 2>&1 || true
kubectl -n newt create secret generic newt-pangolin-auth \
  --dry-run=client \
  --from-literal=PANGOLIN_ENDPOINT="${PANGOLIN_ENDPOINT:?}" \
  --from-literal=NEWT_ID="${NEWT_ID:?}" \
  --from-literal=NEWT_SECRET="${NEWT_SECRET:?}" \
  -o yaml | kubeseal -o yaml
@@ -1,5 +0,0 @@
# Sealed Secrets controller — apply before Helm.
apiVersion: v1
kind: Namespace
metadata:
  name: sealed-secrets
@@ -1,18 +0,0 @@
# Sealed Secrets — noble (Git-encrypted Secret workflow)
#
# helm repo add sealed-secrets https://bitnami-labs.github.io/sealed-secrets
# helm repo update
# kubectl apply -f clusters/noble/bootstrap/sealed-secrets/namespace.yaml
# helm upgrade --install sealed-secrets sealed-secrets/sealed-secrets -n sealed-secrets \
#   --version 2.18.4 -f clusters/noble/bootstrap/sealed-secrets/values.yaml --wait
#
# Client: install kubeseal (same minor as controller — see README).
# Defaults are sufficient for the lab; override here if you need key renewal, resources, etc.
#
# GitOps pattern: create Secrets only via SealedSecret (or External Secrets + Vault).
# Example (Newt): clusters/noble/bootstrap/sealed-secrets/examples/kubeseal-newt-pangolin-auth.sh
# Backup the controller's sealing key: kubectl -n sealed-secrets get secret sealed-secrets-key -o yaml
#
# Talos cluster secrets (bootstrap token, cluster secret, certs) belong in talhelper talsecret /
# SOPS — not Sealed Secrets. See talos/README.md.
commonLabels: {}
@@ -1,162 +0,0 @@
|
||||
# HashiCorp Vault (noble)
|
||||
|
||||
Standalone Vault with **file** storage on a **Longhorn** PVC (`server.dataStorage`). The listener uses **HTTP** (`global.tlsDisable: true`) for in-cluster use; add TLS at the listener when exposing outside the cluster.
|
||||
|
||||
- **Chart:** `hashicorp/vault` **0.32.0** (Vault **1.21.2**)
|
||||
- **Namespace:** `vault`

## Install

```bash
helm repo add hashicorp https://helm.releases.hashicorp.com
helm repo update
kubectl apply -f clusters/noble/bootstrap/vault/namespace.yaml
helm upgrade --install vault hashicorp/vault -n vault \
  --version 0.32.0 -f clusters/noble/bootstrap/vault/values.yaml --wait --timeout 15m
```

Verify:

```bash
kubectl -n vault get pods,pvc,svc
kubectl -n vault exec -i sts/vault -- vault status
```

## Cilium network policy (Phase G)

After **Cilium** is up, optionally restrict HTTP access to the Vault server pods (**TCP 8200**) to **`external-secrets`** and same-namespace clients:

```bash
kubectl apply -f clusters/noble/bootstrap/vault/cilium-network-policy.yaml
```

If you add workloads in other namespaces that call Vault, extend **`ingress`** in that manifest.

## Initialize and unseal (first time)

From a workstation with `kubectl` (or `kubectl exec` into any pod with the `vault` CLI):

```bash
kubectl -n vault exec -i sts/vault -- vault operator init -key-shares=1 -key-threshold=1
```

**Lab-only:** `-key-shares=1 -key-threshold=1` keeps a single unseal key. For stronger Shamir splits, use more shares and store them safely.

Save the **Unseal Key** and **Root Token** offline. Then unseal once:

```bash
kubectl -n vault exec -i sts/vault -- vault operator unseal
# paste unseal key
```

Or create the Secret used by the optional CronJob and apply it:

```bash
kubectl -n vault create secret generic vault-unseal-key --from-literal=key='YOUR_UNSEAL_KEY'
kubectl apply -f clusters/noble/bootstrap/vault/unseal-cronjob.yaml
```

The CronJob runs every minute and unseals if Vault is sealed and the Secret is present.

## Auto-unseal note

Vault **OSS** auto-unseal uses cloud KMS (AWS, GCP, Azure, OCI), **Transit** (another Vault), etc. There is no first-class “Kubernetes Secret” seal. This repo uses an optional **CronJob** as a **lab** substitute. Production clusters should use a supported seal backend.

## Kubernetes auth (External Secrets / ClusterSecretStore)

**One-shot:** from the repo root, `export KUBECONFIG=talos/kubeconfig` and `export VAULT_TOKEN=…`, then run **`./clusters/noble/bootstrap/vault/configure-kubernetes-auth.sh`** (idempotent). Then run **`kubectl apply -f clusters/noble/bootstrap/external-secrets/examples/vault-cluster-secret-store.yaml`** on its own line (a shell comment **`# …`** pasted on the same line can be parsed as extra `kubectl` arguments and break `apply`). **`kubectl get clustersecretstore vault`** should show **READY=True** after a few seconds.

Run these **from your workstation** (needs `kubectl`; no local `vault` binary required). Use a **short-lived admin token** or the root token **only in your shell** — do not paste tokens into logs or chat.

**1. Enable the auth method** (skip if already done):

```bash
kubectl -n vault exec -it sts/vault -- sh -c '
export VAULT_ADDR=http://127.0.0.1:8200
export VAULT_TOKEN="YOUR_ROOT_OR_ADMIN_TOKEN"
vault auth enable kubernetes
'
```

**2. Configure `auth/kubernetes`** — the API **issuer** must match the `iss` claim on service account JWTs. With **kube-vip** / a custom API URL, discover it from the cluster (do not assume `kubernetes.default`):

```bash
ISSUER=$(kubectl get --raw /.well-known/openid-configuration | jq -r .issuer)
REVIEWER=$(kubectl -n vault create token vault --duration=8760h)
CA_B64=$(kubectl config view --raw --minify -o jsonpath='{.clusters[0].cluster.certificate-authority-data}')
```

Then apply the config **inside** the Vault pod (environment variables are passed in with `env` so quoting stays correct):

```bash
export VAULT_TOKEN="YOUR_ROOT_OR_ADMIN_TOKEN"
export ISSUER REVIEWER CA_B64
kubectl -n vault exec -i sts/vault -- env \
  VAULT_ADDR=http://127.0.0.1:8200 \
  VAULT_TOKEN="$VAULT_TOKEN" \
  CA_B64="$CA_B64" \
  REVIEWER="$REVIEWER" \
  ISSUER="$ISSUER" \
  sh -ec '
echo "$CA_B64" | base64 -d > /tmp/k8s-ca.crt
vault write auth/kubernetes/config \
  kubernetes_host="https://kubernetes.default.svc:443" \
  kubernetes_ca_cert=@/tmp/k8s-ca.crt \
  token_reviewer_jwt="$REVIEWER" \
  issuer="$ISSUER"
'
```

**3. KV v2** at path `secret` (skip if already enabled):

```bash
kubectl -n vault exec -it sts/vault -- sh -c '
export VAULT_ADDR=http://127.0.0.1:8200
export VAULT_TOKEN="YOUR_ROOT_OR_ADMIN_TOKEN"
vault secrets enable -path=secret kv-v2
'
```

**4. Policy + role** for the External Secrets operator SA (`external-secrets` / `external-secrets`):

```bash
kubectl -n vault exec -it sts/vault -- sh -c '
export VAULT_ADDR=http://127.0.0.1:8200
export VAULT_TOKEN="YOUR_ROOT_OR_ADMIN_TOKEN"
vault policy write external-secrets - <<EOF
path "secret/data/*" {
  capabilities = ["read", "list"]
}
path "secret/metadata/*" {
  capabilities = ["read", "list"]
}
EOF
vault write auth/kubernetes/role/external-secrets \
  bound_service_account_names=external-secrets \
  bound_service_account_namespaces=external-secrets \
  policies=external-secrets \
  ttl=24h
'
```

**5. Apply** **`clusters/noble/bootstrap/external-secrets/examples/vault-cluster-secret-store.yaml`** if you have not already, then verify:

```bash
kubectl describe clustersecretstore vault
```

See also [Kubernetes auth](https://developer.hashicorp.com/vault/docs/auth/kubernetes#configuration).

## TLS and External Secrets

`values.yaml` disables TLS on the Vault listener. The **`ClusterSecretStore`** example uses **`http://vault.vault.svc.cluster.local:8200`**. If you enable TLS on the listener, switch the URL to **`https://`** and configure **`caBundle`** or **`caProvider`** on the store.
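
The store the example file configures would look roughly like the sketch below: server URL, KV v2 mount `secret`, and role `external-secrets` come from this README; the auth `mountPath` of `kubernetes` is an assumption (the default mount), and newer External Secrets releases may serve the CRD as `v1` instead of `v1beta1`:

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ClusterSecretStore
metadata:
  name: vault
spec:
  provider:
    vault:
      server: http://vault.vault.svc.cluster.local:8200
      path: secret              # KV v2 mount enabled in step 3
      version: v2
      auth:
        kubernetes:
          mountPath: kubernetes  # default auth mount (assumption)
          role: external-secrets
          serviceAccountRef:
            name: external-secrets
            namespace: external-secrets
```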

## UI

Port-forward:

```bash
kubectl -n vault port-forward svc/vault-ui 8200:8200
```

Open `http://127.0.0.1:8200` and log in with the root token (revoke or rotate the root token for production use).

@@ -1,40 +0,0 @@
# CiliumNetworkPolicy — restrict who may reach Vault HTTP listener (8200).
# Apply after Cilium is healthy: kubectl apply -f clusters/noble/bootstrap/vault/cilium-network-policy.yaml
#
# Ingress-only policy: egress from Vault is unchanged (Kubernetes auth needs API + DNS).
# Extend ingress rules if other namespaces must call Vault (e.g. app workloads).
#
# Ref: https://docs.cilium.io/en/stable/security/policy/language/
---
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: vault-http-ingress
  namespace: vault
spec:
  endpointSelector:
    matchLabels:
      app.kubernetes.io/name: vault
      component: server
  ingress:
    - fromEndpoints:
        - matchLabels:
            "k8s:io.kubernetes.pod.namespace": external-secrets
      toPorts:
        - ports:
            - port: "8200"
              protocol: TCP
    - fromEndpoints:
        - matchLabels:
            "k8s:io.kubernetes.pod.namespace": traefik
      toPorts:
        - ports:
            - port: "8200"
              protocol: TCP
    - fromEndpoints:
        - matchLabels:
            "k8s:io.kubernetes.pod.namespace": vault
      toPorts:
        - ports:
            - port: "8200"
              protocol: TCP
@@ -1,77 +0,0 @@
#!/usr/bin/env bash
# Configure Vault Kubernetes auth + KV v2 + policy/role for External Secrets Operator.
# Requires: kubectl (cluster access) and jq (issuer discovery); Vault reachable via sts/vault.
#
# Usage (from repo root):
#   export KUBECONFIG=talos/kubeconfig   # or your path
#   export VAULT_TOKEN='…'               # root or admin token — never commit
#   ./clusters/noble/bootstrap/vault/configure-kubernetes-auth.sh
#
# Then:   kubectl apply -f clusters/noble/bootstrap/external-secrets/examples/vault-cluster-secret-store.yaml
# Verify: kubectl describe clustersecretstore vault

set -euo pipefail

: "${VAULT_TOKEN:?Set VAULT_TOKEN to your Vault root or admin token}"

ISSUER=$(kubectl get --raw /.well-known/openid-configuration | jq -r .issuer)
REVIEWER=$(kubectl -n vault create token vault --duration=8760h)
CA_B64=$(kubectl config view --raw --minify -o jsonpath='{.clusters[0].cluster.certificate-authority-data}')

kubectl -n vault exec -i sts/vault -- env \
  VAULT_ADDR=http://127.0.0.1:8200 \
  VAULT_TOKEN="$VAULT_TOKEN" \
  sh -ec '
vault auth list >/tmp/vauth.txt
grep -q "^kubernetes/" /tmp/vauth.txt || vault auth enable kubernetes
'

kubectl -n vault exec -i sts/vault -- env \
  VAULT_ADDR=http://127.0.0.1:8200 \
  VAULT_TOKEN="$VAULT_TOKEN" \
  CA_B64="$CA_B64" \
  REVIEWER="$REVIEWER" \
  ISSUER="$ISSUER" \
  sh -ec '
echo "$CA_B64" | base64 -d > /tmp/k8s-ca.crt
vault write auth/kubernetes/config \
  kubernetes_host="https://kubernetes.default.svc:443" \
  kubernetes_ca_cert=@/tmp/k8s-ca.crt \
  token_reviewer_jwt="$REVIEWER" \
  issuer="$ISSUER"
'

kubectl -n vault exec -i sts/vault -- env \
  VAULT_ADDR=http://127.0.0.1:8200 \
  VAULT_TOKEN="$VAULT_TOKEN" \
  sh -ec '
vault secrets list >/tmp/vsec.txt
grep -q "^secret/" /tmp/vsec.txt || vault secrets enable -path=secret kv-v2
'

kubectl -n vault exec -i sts/vault -- env \
  VAULT_ADDR=http://127.0.0.1:8200 \
  VAULT_TOKEN="$VAULT_TOKEN" \
  sh -ec '
vault policy write external-secrets - <<EOF
path "secret/data/*" {
  capabilities = ["read", "list"]
}
path "secret/metadata/*" {
  capabilities = ["read", "list"]
}
EOF
vault write auth/kubernetes/role/external-secrets \
  bound_service_account_names=external-secrets \
  bound_service_account_namespaces=external-secrets \
  policies=external-secrets \
  ttl=24h
'

echo "Done. Issuer used: $ISSUER"
echo ""
echo "Next (each command on its own line — do not paste # comments after kubectl):"
echo "  kubectl apply -f clusters/noble/bootstrap/external-secrets/examples/vault-cluster-secret-store.yaml"
echo "  kubectl get clustersecretstore vault"
@@ -1,5 +0,0 @@
# HashiCorp Vault — apply before Helm.
apiVersion: v1
kind: Namespace
metadata:
  name: vault
@@ -1,63 +0,0 @@
# Optional lab auto-unseal: applies after Vault is initialized and Secret `vault-unseal-key` exists.
#
# 1) vault operator init -key-shares=1 -key-threshold=1   (lab only — single key)
# 2) kubectl -n vault create secret generic vault-unseal-key --from-literal=key='YOUR_UNSEAL_KEY'
# 3) kubectl apply -f clusters/noble/bootstrap/vault/unseal-cronjob.yaml
#
# OSS Vault has no Kubernetes/KMS seal; this CronJob runs vault operator unseal when the server is sealed.
# Protect the Secret with RBAC; prefer cloud KMS auto-unseal for real environments.
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: vault-auto-unseal
  namespace: vault
spec:
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 1
  failedJobsHistoryLimit: 3
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          securityContext:
            runAsNonRoot: true
            runAsUser: 100
            runAsGroup: 1000
            seccompProfile:
              type: RuntimeDefault
          containers:
            - name: unseal
              image: hashicorp/vault:1.21.2
              imagePullPolicy: IfNotPresent
              securityContext:
                allowPrivilegeEscalation: false
                capabilities:
                  drop:
                    - ALL
              env:
                - name: VAULT_ADDR
                  value: http://vault.vault.svc:8200
              command:
                - /bin/sh
                - -ec
                - |
                  test -f /secrets/key || exit 0
                  status="$(vault status -format=json 2>/dev/null || true)"
                  # -format=json output is indented, so allow spaces after the colon.
                  echo "$status" | grep -q '"initialized": *true' || exit 0
                  echo "$status" | grep -q '"sealed": *false' && exit 0
                  vault operator unseal "$(cat /secrets/key)"
              volumeMounts:
                - name: unseal
                  mountPath: /secrets
                  readOnly: true
          volumes:
            - name: unseal
              secret:
                secretName: vault-unseal-key
                optional: true
                items:
                  - key: key
                    path: key
@@ -1,62 +0,0 @@
# HashiCorp Vault — noble (standalone, file storage on Longhorn; TLS disabled on listener for in-cluster HTTP).
#
# helm repo add hashicorp https://helm.releases.hashicorp.com
# helm repo update
# kubectl apply -f clusters/noble/bootstrap/vault/namespace.yaml
# helm upgrade --install vault hashicorp/vault -n vault \
#   --version 0.32.0 -f clusters/noble/bootstrap/vault/values.yaml --wait --timeout 15m
#
# Post-install: initialize, store unseal key in Secret, apply optional unseal CronJob — see README.md
#
global:
  tlsDisable: true

injector:
  enabled: true

server:
  enabled: true
  dataStorage:
    enabled: true
    size: 10Gi
    storageClass: longhorn
    accessMode: ReadWriteOnce
  ha:
    enabled: false
  standalone:
    enabled: true
    config: |
      ui = true

      listener "tcp" {
        tls_disable = 1
        address = "[::]:8200"
        cluster_address = "[::]:8201"
      }

      storage "file" {
        path = "/vault/data"
      }

  # Allow pod Ready before init/unseal so Helm --wait succeeds (see Vault /v1/sys/health docs).
  readinessProbe:
    enabled: true
    path: "/v1/sys/health?uninitcode=204&sealedcode=204&standbyok=true"
    port: 8200

  # LAN: TLS terminates at Traefik + cert-manager; listener stays HTTP (global.tlsDisable).
  ingress:
    enabled: true
    ingressClassName: traefik
    annotations:
      cert-manager.io/cluster-issuer: letsencrypt-prod
    hosts:
      - host: vault.apps.noble.lab.pcenicni.dev
        paths: []
    tls:
      - secretName: vault-apps-noble-tls
        hosts:
          - vault.apps.noble.lab.pcenicni.dev

ui:
  enabled: true
clusters/noble/bootstrap/velero/README.md
@@ -0,0 +1,118 @@
# Velero (cluster backups)

Ansible-managed core stack — **not** reconciled by Argo CD (`clusters/noble/apps` is optional GitOps only).

## What you get

- **No web UI** — Velero is operated with the **`velero`** CLI and **`kubectl`** (Backup, Schedule, Restore CRDs). Metrics are exposed for Prometheus; there is no first-party dashboard in this chart.
- **vmware-tanzu/velero** Helm chart (**12.0.0** → Velero **1.18.0**) in namespace **`velero`**
- **AWS plugin** init container for **S3-compatible** object storage (`velero/velero-plugin-for-aws:v1.14.0`)
- **CSI snapshots** via Velero’s built-in CSI support (`EnableCSI`) and **VolumeSnapshotLocation** `velero.io/csi` (no separate CSI plugin image for Velero ≥ 1.14)
- **Prometheus** scraping: **ServiceMonitor** labeled for **kube-prometheus** (`release: kube-prometheus`)
- **Schedule** **`velero-daily-noble`**: cron **`0 3 * * *`** (daily at 03:00 in the Velero pod’s timezone, usually **UTC**), **720h** TTL per backup (~30 days). Edit **`values.yaml`** `schedules` to change time or retention.
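
The chart renders that `schedules` entry into a `Schedule` CR roughly like this (a sketch; field values are taken from `values.yaml` in this directory, and the name follows the release-name + key convention stated above):

```yaml
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: velero-daily-noble    # release name "velero" + schedules key "daily-noble"
  namespace: velero
spec:
  schedule: "0 3 * * *"       # daily at 03:00 in the Velero pod's timezone (usually UTC)
  template:
    ttl: 720h                 # each backup kept ~30 days
    storageLocation: default
```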

## Prerequisites

1. **Volume Snapshot APIs** installed cluster-wide — **`clusters/noble/bootstrap/csi-snapshot-controller/`** (Ansible **`noble_csi_snapshot_controller`**, after **Cilium**). Without **`snapshot.storage.k8s.io`** CRDs and **`kube-system/snapshot-controller`**, Velero logs errors like `no matches for kind "VolumeSnapshot"`.
2. **Longhorn** (or another CSI driver) with a **VolumeSnapshotClass** for that driver.
3. For **Longhorn**, this repo applies **`velero/longhorn-volumesnapshotclass.yaml`** (`VolumeSnapshotClass` **`longhorn-velero`**, driver **`driver.longhorn.io`**, Velero label). It is included in **`clusters/noble/bootstrap/kustomization.yaml`** (same apply as other bootstrap YAML). For non-Longhorn drivers, add a class with **`velero.io/csi-volumesnapshot-class: "true"`** (see [Velero CSI](https://velero.io/docs/main/csi/)).
4. **S3-compatible** endpoint (MinIO, VersityGW, AWS, etc.) and a **bucket**.

## Credentials Secret

Velero expects **`velero/velero-cloud-credentials`**, key **`cloud`**, in **INI** form for the AWS plugin:

```ini
[default]
aws_access_key_id=<key>
aws_secret_access_key=<secret>
```

Create manually:

```bash
kubectl -n velero create secret generic velero-cloud-credentials \
  --from-literal=cloud="$(printf '[default]\naws_access_key_id=%s\naws_secret_access_key=%s\n' "$KEY" "$SECRET")"
```

Or let **Ansible** create it from **`.env`** (`NOBLE_VELERO_AWS_ACCESS_KEY_ID`, `NOBLE_VELERO_AWS_SECRET_ACCESS_KEY`) or from extra-vars **`noble_velero_aws_access_key_id`** / **`noble_velero_aws_secret_access_key`**.
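
Whichever path you choose, the `cloud` key must contain exactly the INI profile shown above. A quick local sanity check of the rendered profile (placeholder key values here are assumptions, not real credentials):

```shell
# Render the INI profile to a file and verify its shape before handing it to kubectl.
KEY='AKIAEXAMPLE'        # placeholder access key (assumption)
SECRET='example-secret'  # placeholder secret key (assumption)
printf '[default]\naws_access_key_id=%s\naws_secret_access_key=%s\n' "$KEY" "$SECRET" > /tmp/cloud
grep -c '^aws_' /tmp/cloud   # expect 2 credential lines under [default]
```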

## Apply (Ansible)

1. Copy **`.env.sample`** → **`.env`** at the **repository root** and set at least:
   - **`NOBLE_VELERO_S3_BUCKET`** — object bucket name
   - **`NOBLE_VELERO_S3_URL`** — S3 API base URL (e.g. `https://minio.lan:9000` or your VersityGW/MinIO endpoint)
   - **`NOBLE_VELERO_AWS_ACCESS_KEY_ID`** / **`NOBLE_VELERO_AWS_SECRET_ACCESS_KEY`** — credentials the AWS plugin uses (S3-compatible access key style)

2. Enable the role: set **`noble_velero_install: true`** in **`ansible/group_vars/all.yml`**, **or** pass **`-e noble_velero_install=true`** on the command line.

3. Run from **`ansible/`** (adjust **`KUBECONFIG`** to your cluster admin kubeconfig):

```bash
cd ansible
export KUBECONFIG=/absolute/path/to/home-server/talos/kubeconfig

# Velero only (after helm repos; skips other roles unless their tags match — use full playbook if unsure)
ansible-playbook playbooks/noble.yml --tags repos,velero -e noble_velero_install=true
```

If **`NOBLE_VELERO_S3_BUCKET`** / **`NOBLE_VELERO_S3_URL`** are not in **`.env`**, pass them explicitly:

```bash
ansible-playbook playbooks/noble.yml --tags repos,velero -e noble_velero_install=true \
  -e noble_velero_s3_bucket=my-bucket \
  -e noble_velero_s3_url=https://s3.example.com:9000
```

Full platform run (includes Velero when **`noble_velero_install`** is true in **`group_vars`**):

```bash
ansible-playbook playbooks/noble.yml
```

## Install (Ansible) — details

1. Set **`noble_velero_install: true`** in **`ansible/group_vars/all.yml`** (or pass **`-e noble_velero_install=true`**).
2. Set **`noble_velero_s3_bucket`** and **`noble_velero_s3_url`** via **`.env`** (**`NOBLE_VELERO_S3_*`**), **`group_vars`**, or **`-e`**. Extra-vars override **`.env`**. Optional: **`noble_velero_s3_region`**, **`noble_velero_s3_prefix`**, **`noble_velero_s3_force_path_style`** (defaults match `values.yaml`).
3. Run **`ansible/playbooks/noble.yml`** (Velero runs after **`noble_platform`**).

Example without **`.env`** (all on the CLI):

```bash
cd ansible
ansible-playbook playbooks/noble.yml --tags velero \
  -e noble_velero_install=true \
  -e noble_velero_s3_bucket=noble-velero \
  -e noble_velero_s3_url=https://minio.lan:9000 \
  -e noble_velero_aws_access_key_id="$KEY" \
  -e noble_velero_aws_secret_access_key="$SECRET"
```

The **`clusters/noble/bootstrap/kustomization.yaml`** applies **`velero/namespace.yaml`** with the rest of the bootstrap namespaces (so **`velero`** exists before Helm).

## Install (Helm only)

From repo root:

```bash
kubectl apply -f clusters/noble/bootstrap/velero/namespace.yaml
# Create velero-cloud-credentials (see above), then:
helm repo add vmware-tanzu https://vmware-tanzu.github.io/helm-charts && helm repo update
helm upgrade --install velero vmware-tanzu/velero -n velero --version 12.0.0 \
  -f clusters/noble/bootstrap/velero/values.yaml \
  --set-string configuration.backupStorageLocation[0].bucket=YOUR_BUCKET \
  --set-string configuration.backupStorageLocation[0].config.s3Url=https://YOUR-S3-ENDPOINT \
  --wait
```

Edit **`values.yaml`** defaults (bucket placeholder, `s3Url`) or override with **`--set-string`** as above.

## Quick checks

```bash
kubectl -n velero get pods,backupstoragelocation,volumesnapshotlocation
velero backup create test --wait
```

(`velero` CLI: install from [Velero releases](https://github.com/vmware-tanzu/velero/releases).)
@@ -0,0 +1,11 @@
# Default Longhorn VolumeSnapshotClass for Velero CSI — one class per driver may carry
# **velero.io/csi-volumesnapshot-class: "true"** (see velero/README.md).
# Apply after **Longhorn** CSI is running (`driver.longhorn.io`).
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: longhorn-velero
  labels:
    velero.io/csi-volumesnapshot-class: "true"
driver: driver.longhorn.io
deletionPolicy: Delete
clusters/noble/bootstrap/velero/namespace.yaml
@@ -0,0 +1,5 @@
# Velero — apply before Helm (Ansible **noble_velero**).
apiVersion: v1
kind: Namespace
metadata:
  name: velero
clusters/noble/bootstrap/velero/values.yaml
@@ -0,0 +1,65 @@
# Velero Helm values — vmware-tanzu/velero chart (see CLUSTER-BUILD.md Phase F).
# Install: **ansible/playbooks/noble.yml** role **noble_velero** (override S3 settings via **noble_velero_*** vars).
# Requires Secret **velero/velero-cloud-credentials** key **cloud** (INI for AWS plugin — see README).
#
# Chart: vmware-tanzu/velero — pin version on install (e.g. 12.0.0 / Velero 1.18.0).
#   helm repo add vmware-tanzu https://vmware-tanzu.github.io/helm-charts && helm repo update
#   kubectl apply -f clusters/noble/bootstrap/velero/namespace.yaml
#   helm upgrade --install velero vmware-tanzu/velero -n velero --version 12.0.0 -f clusters/noble/bootstrap/velero/values.yaml

initContainers:
  - name: velero-plugin-for-aws
    image: velero/velero-plugin-for-aws:v1.14.0
    imagePullPolicy: IfNotPresent
    volumeMounts:
      - mountPath: /target
        name: plugins

configuration:
  features: EnableCSI
  defaultBackupStorageLocation: default
  defaultVolumeSnapshotLocations: velero.io/csi:default

  backupStorageLocation:
    - name: default
      provider: aws
      bucket: noble-velero
      default: true
      accessMode: ReadWrite
      credential:
        name: velero-cloud-credentials
        key: cloud
      config:
        region: us-east-1
        s3ForcePathStyle: "true"
        s3Url: https://s3.CHANGE-ME.invalid

  volumeSnapshotLocation:
    - name: default
      provider: velero.io/csi
      config: {}

credentials:
  useSecret: true
  existingSecret: velero-cloud-credentials

snapshotsEnabled: true
deployNodeAgent: false

metrics:
  enabled: true
  serviceMonitor:
    enabled: true
    autodetect: true
    additionalLabels:
      release: kube-prometheus

# Daily full-cluster backup at 03:00 — cron is evaluated in the Velero pod (typically **UTC**; set TZ on the
# Deployment if you need local wall clock). See `helm upgrade --install` to apply.
schedules:
  daily-noble:
    disabled: false
    schedule: "0 3 * * *"
    template:
      ttl: 720h
      storageLocation: default
clusters/noble/secrets/README.md
@@ -0,0 +1,38 @@
# SOPS-encrypted cluster secrets (noble)

Secrets that belong in git are stored here as **Mozilla SOPS** files encrypted with [age](https://github.com/FiloSottile/age). The matching **private** key lives in **`age-key.txt`** at the repository root (gitignored — create with `age-keygen -o age-key.txt` and add the public key to **`.sops.yaml`** if you rotate keys).

**Migrating from an older cluster** that ran **Vault**, **Sealed Secrets**, or **External Secrets Operator:** uninstall those Helm releases (`helm uninstall vault -n vault`, etc.), delete their namespaces if empty, and export any secrets you still need into plain **`Secret`** YAML here, then encrypt with **`sops`** before committing.

## Prerequisites

- [sops](https://github.com/getsops/sops) and **age** on the machine that encrypts or applies secrets.

## Edit or create a Secret

```bash
export SOPS_AGE_KEY_FILE=/absolute/path/to/home-server/age-key.txt

# Create a new file from a template, then encrypt:
sops clusters/noble/secrets/example.secret.yaml

# Or edit an existing encrypted file (opens decrypted in $EDITOR):
sops clusters/noble/secrets/newt-pangolin-auth.secret.yaml
```
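
For a brand-new file, a plaintext template along these lines can be saved in the `sops` editor, which encrypts the values on write. All field values below are hypothetical placeholders, and the `newt` namespace is an assumption (match the namespace your Newt deployment uses):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: newt-pangolin-auth
  namespace: newt                 # assumption — use the namespace Newt runs in
type: Opaque
stringData:
  PANGOLIN_ENDPOINT: https://pangolin.example.com   # placeholder
  NEWT_ID: changeme                                 # placeholder
  NEWT_SECRET: changeme                             # placeholder
```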

## Apply to the cluster

```bash
export KUBECONFIG=/absolute/path/to/home-server/talos/kubeconfig
export SOPS_AGE_KEY_FILE=/absolute/path/to/home-server/age-key.txt

sops -d clusters/noble/secrets/newt-pangolin-auth.secret.yaml | kubectl apply -f -
```

**Ansible** (`noble.yml`) runs the same decrypt-and-apply step for every `*.yaml` in this directory when **`age-key.txt`** exists and **`noble_apply_sops_secrets`** is true (see `ansible/group_vars/all.yml`).

## Files

| File | Purpose |
|------|---------|
| `newt-pangolin-auth.secret.yaml` | Pangolin tunnel credentials for [Newt](../../bootstrap/newt/README.md) (`PANGOLIN_ENDPOINT`, `NEWT_ID`, `NEWT_SECRET`). Replace placeholders and re-encrypt before relying on them. |
clusters/noble/secrets/newt-pangolin-auth.secret.yaml
@@ -0,0 +1,30 @@
apiVersion: ENC[AES256_GCM,data:FaA=,iv:EsqIdZmNS4hfzwCZ0gL7Q5Czaz8Bii3jWFu60lKmgVo=,tag:tfr4yUuTiH4s+ufYW/dpCA==,type:str]
kind: ENC[AES256_GCM,data:ozpTcG9F,iv:Q1EZ896Plhyz2qM4JJRnBf940kbVLSwyIIPUcDGBZFA=,tag:1bWEgI+I4Ni5J70MlohYdA==,type:str]
metadata:
  name: ENC[AES256_GCM,data:moXbGuT6ZOGhgVUBNcpHeLZQ,iv:1WDtxT/Et/6lxx1Mj93CQME8o0lhzxnBMkdSqP/n3R0=,tag:v+iqfE8tzCx8ZOMUW7OyEA==,type:str]
  namespace: ENC[AES256_GCM,data:33/AMg==,iv:M0GvB/70nHh4MVR1saZy1pGY8IFFzkzGdJl4szHJbCI=,tag:0+1LX/EnkAP0FZ6ARKZNAA==,type:str]
type: ENC[AES256_GCM,data:3io5utU1,iv:QqMDNL/R8SR7TC9mwDdDd3V6VOo+csgeiZCr2AdOZjw=,tag:/KSMy+vNz7Qj/I463eG0LQ==,type:str]
stringData:
  PANGOLIN_ENDPOINT: ENC[AES256_GCM,data:a/2QTnGYnNXGxNm8QSVTKC6I+r88J1m1CdMmTA==,iv:L2LvLD7IRX8wdAzALAWQ2ojB2OjWDIcVKrdi/lSvZFY=,tag:ALffRF9bncxA8CExSaRmHA==,type:str]
  NEWT_ID: ENC[AES256_GCM,data:Xfe8QvBdX62CciYXYwMfJAzIE/0=,iv:tA+FJ93tsjJ29L3bSxNAEooiKPMc+5pa00EpQ2cJkho=,tag:auiR/zQjnsmyllXbSJf3KA==,type:str]
  NEWT_SECRET: ENC[AES256_GCM,data:XY8XZOkZ+GpnjljbvtaH2oGJpDoZ47fN,iv:+J5sb7saqbVwHEyemx3CUSsdKArubRdPCLGbT09sFLM=,tag:zUowv8I1CaWZH+KLYOwKYw==,type:str]
sops:
  kms: []
  gcp_kms: []
  azure_kv: []
  hc_vault: []
  age:
    - recipient: age1juym5p3ez3dkt0dxlznydgfgqvaujfnyk9hpdsssf50hsxeh3p4sjpf3gn
      enc: |
        -----BEGIN AGE ENCRYPTED FILE-----
        YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSB0RWppdWxZUEYzc2I2TURi
        dm1pUzVaNDA4YldsWkFJODl1MWZ6MXFxWnhjCnVtU1VEQnJqbTI5M0hWM2FCaVlS
        aXprTm42bTlldUVHMmxpUUJiWEVhcXcKLS0tIGNLVnNtNDdMQ0VVeDV1N29nOW9F
        clhLa2tPdWtRMWYzc2YrR0hSQXczTlUK6hYj4HxQvu6Kqn/Ki+cYv9x5nvolyGqQ
        N4g9z+t6orT6MYseWPf0uyovC/5iOOC6z/2exVe7/0rYo7ZOFm6dYQ==
        -----END AGE ENCRYPTED FILE-----
  lastmodified: "2026-03-29T23:37:33Z"
  mac: ENC[AES256_GCM,data:uKtdqJhwE4HLCenHH+RG8O2yfVIcGbiXznL9ouAXhDLnQh/ksgeczr2fyyn9hs/JhCozAqRrF8vnYZsIdfG1DQfHjXn6Ro6gzYC0YR+gvFU8Mz9uPdVX3HYjUrzKJ5GhhBami0USZtLdGKOGgFDYmFoDsD/PmMXLUol8qJdW8Uk=,iv:rIfQI17+3vNBB1n//D7Wnl/SLWFjV0pgZDteumlS2f8=,tag:xibCfJdZQS+aB75drmY1VA==,type:str]
  pgp: []
  unencrypted_suffix: _unencrypted
  version: 3.9.3
clusters/noble/wip/eclipse-che/README.md
@@ -0,0 +1,29 @@
# Eclipse Che (optional — Argo CD)

Three **Application** resources (sync waves **0 → 1 → 2**):

| Wave | Application | Purpose |
|------|-------------|---------|
| 0 | `eclipse-che-devworkspace` | [DevWorkspace operator](https://github.com/devfile/devworkspace-operator) **v0.33.0** (`devworkspace/kustomization.yaml` → remote `combined.yaml`) |
| 1 | `eclipse-che-operator` | [Eclipse Che Helm chart](https://artifacthub.io/packages/helm/eclipse-che/eclipse-che) **7.116.0** (operator in **`eclipse-che`**) |
| 2 | `eclipse-che-cluster` | **`CheCluster`** (`checluster.yaml`) — Traefik + **cert-manager** TLS |

**Prerequisites (cluster):** **cert-manager** + **Traefik** (noble bootstrap). **DNS:** `che.apps.noble.lab.pcenicni.dev` → Traefik LB (edit **`checluster.yaml`** if your domain differs).

**First sync:** Wave ordering applies to **Application** CRs under **noble-root**; if the operator starts before DevWorkspace is ready, **Refresh**/**Sync** the child apps once. See [Eclipse Che on Kubernetes](https://eclipse.dev/che/docs/stable/administration-guide/installing-che-on-kubernetes/).

**URL:** `kubectl get checluster eclipse-che -n eclipse-che -o jsonpath='{.status.cheURL}{"\n"}'` after **Phase** is **`Active`**.

## Troubleshooting — “no available server” (or similar)

**1. Eclipse Che / dashboard**

- **DevWorkspace routing:** On Kubernetes you **must** set **`routing.clusterHostSuffix`** in **`DevWorkspaceOperatorConfig`** `devworkspace-operator-config` (`devworkspace/dwoc.yaml`). If it was missing, sync **`eclipse-che-devworkspace`** again, then **`eclipse-che-operator`** / **`eclipse-che-cluster`**.
- **Status:** `kubectl get checluster eclipse-che -n eclipse-che -o jsonpath='{.status.chePhase}{"\n"}'` → expect **`Active`**.
- **Pods:** `kubectl get pods -n eclipse-che` — wait for **Running** (Keycloak / gateway / server can take many minutes).
- **Ingress + DNS:** `kubectl get ingress -n eclipse-che` — host **`che.apps.noble.lab.pcenicni.dev`** must resolve to your Traefik LB (same as Grafana/Homepage).
- **TLS:** `kubectl describe certificate -n eclipse-che` (if present) — Let’s Encrypt must succeed before the browser trusts the URL.

**2. Argo CD UI / repo**

If the message appears in **Argo CD** (not Che), check in-cluster components: `kubectl get pods -n argocd`, `kubectl logs -n argocd deploy/argocd-repo-server --tail=80`, and that **Applications** use `destination.server: https://kubernetes.default.svc` (in-cluster), not a missing external API.
clusters/noble/wip/eclipse-che/application-checluster.yaml
@@ -0,0 +1,27 @@
# CheCluster CR — sync wave 2 (operator must be Ready to reconcile).
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: eclipse-che-cluster
  namespace: argocd
  annotations:
    argocd.argoproj.io/sync-wave: "2"
  finalizers:
    - resources-finalizer.argocd.argoproj.io/background
spec:
  project: default
  source:
    repoURL: https://gitea.pcenicni.ca/gsdavidp/home-server.git
    targetRevision: HEAD
    path: clusters/noble/apps/eclipse-che
    directory:
      include: checluster.yaml
  destination:
    server: https://kubernetes.default.svc
    namespace: eclipse-che
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - ServerSideApply=true
26
clusters/noble/wip/eclipse-che/application-devworkspace.yaml
Normal file
26
clusters/noble/wip/eclipse-che/application-devworkspace.yaml
Normal file
@@ -0,0 +1,26 @@
# DevWorkspace operator — must sync before Eclipse Che Helm (sync wave 0).
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: eclipse-che-devworkspace
  namespace: argocd
  annotations:
    argocd.argoproj.io/sync-wave: "0"
  finalizers:
    - resources-finalizer.argocd.argoproj.io/background
spec:
  project: default
  source:
    repoURL: https://gitea.pcenicni.ca/gsdavidp/home-server.git
    targetRevision: HEAD
    path: clusters/noble/apps/eclipse-che/devworkspace
  destination:
    server: https://kubernetes.default.svc
    namespace: devworkspace-controller
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
      - ServerSideApply=true
28 clusters/noble/wip/eclipse-che/application-operator.yaml Normal file
@@ -0,0 +1,28 @@
# Eclipse Che operator (Helm) — sync wave 1 (after DevWorkspace CRDs/controller).
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: eclipse-che-operator
  namespace: argocd
  annotations:
    argocd.argoproj.io/sync-wave: "1"
  finalizers:
    - resources-finalizer.argocd.argoproj.io/background
spec:
  project: default
  source:
    repoURL: https://eclipse-che.github.io/che-operator/charts
    chart: eclipse-che
    targetRevision: 7.116.0
    helm:
      releaseName: eclipse-che
  destination:
    server: https://kubernetes.default.svc
    namespace: eclipse-che
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
      - ServerSideApply=true
24 clusters/noble/wip/eclipse-che/checluster.yaml Normal file
@@ -0,0 +1,24 @@
# Eclipse Che instance — applied after **che-operator** is running (sync wave 2).
# Edit **hostname** / **domain** if your ingress DNS differs from the noble lab pattern.
#
# **devEnvironments.networking.externalTLSConfig** — required with cert-manager for **workspace** subdomains.
# Without it, Che creates secure workspace Ingresses with TLS hosts but **no secretName**, so cert-manager
# never issues certs and the dashboard often shows **no available server** when opening a workspace.
apiVersion: org.eclipse.che/v2
kind: CheCluster
metadata:
  name: eclipse-che
  namespace: eclipse-che
spec:
  devEnvironments:
    networking:
      externalTLSConfig:
        enabled: true
        annotations:
          cert-manager.io/cluster-issuer: letsencrypt-prod
  networking:
    domain: apps.noble.lab.pcenicni.dev
    hostname: che.apps.noble.lab.pcenicni.dev
    ingressClassName: traefik
    annotations:
      cert-manager.io/cluster-issuer: letsencrypt-prod
14 clusters/noble/wip/eclipse-che/devworkspace/dwoc.yaml Normal file
@@ -0,0 +1,14 @@
# Required on **Kubernetes** (OpenShift discovers this automatically). See DevWorkspaceOperatorConfig CRD:
# **routing.clusterHostSuffix** — hostname suffix for DevWorkspace routes. Without this, Che / workspaces
# often fail with errors like **no available server** or broken routing.
# Must be named **devworkspace-operator-config** in **devworkspace-controller**.
# v1alpha1 uses a root-level **config** key (not spec.config); see combined.yaml CRD for devworkspaceoperatorconfigs.
# Edit if your ingress base domain differs from the noble lab pattern.
apiVersion: controller.devfile.io/v1alpha1
kind: DevWorkspaceOperatorConfig
metadata:
  name: devworkspace-operator-config
  namespace: devworkspace-controller
config:
  routing:
    clusterHostSuffix: apps.noble.lab.pcenicni.dev
@@ -0,0 +1,7 @@
# DevWorkspace operator — prerequisite for Eclipse Che (pinned tag).
# https://github.com/devfile/devworkspace-operator
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - https://raw.githubusercontent.com/devfile/devworkspace-operator/v0.33.0/deploy/deployment/kubernetes/combined.yaml
  - dwoc.yaml
169 docs/Racks.md Normal file
@@ -0,0 +1,169 @@
# Physical racks — Noble lab (10")

This page is a **logical rack layout** for the **noble** Talos lab: **three 10" (half-width) racks**, how **rack units (U)** are used, and **Ethernet** paths on **`192.168.50.0/24`**. Node names and IPs match [`talos/CLUSTER-BUILD.md`](../talos/CLUSTER-BUILD.md) and [`docs/architecture.md`](architecture.md).

## Legend

| Symbol | Meaning |
|--------|---------|
| `█` / filled cell | Equipment occupying that **1U** |
| `░` | Reserved / future use |
| `·` | Empty |
| `━━` | Copper to LAN switch |

**Rack unit numbering:** **U increases upward** (U1 = bottom of rack, like ANSI/EIA). **Slot** in the diagrams is **top → bottom** reading order for a quick visual scan.

### Three racks at a glance

Read **top → bottom** (first row = top of rack).

| Primary (10") | Storage B (10") | Rack C (10") |
|-----------------|-----------------|--------------|
| Fiber ONT | Mac Mini | *empty* |
| UniFi Fiber Gateway | NAS | *empty* |
| Patch panel | JBOD | *empty* |
| 2.5 GbE ×8 PoE switch | *empty* | *empty* |
| Raspberry Pi cluster | *empty* | *empty* |
| **helium** (Talos) | *empty* | *empty* |
| **neon** (Talos) | *empty* | *empty* |
| **argon** (Talos) | *empty* | *empty* |
| **krypton** (Talos) | *empty* | *empty* |

**Connectivity:** Primary rack gear shares **one L2** (`192.168.50.0/24`). Storage B and Rack C link the same way when cabled (e.g. **Ethernet** to the PoE switch, **VPN** or flat LAN per your design).

---

## Rack A — LAN aggregation (10" × 12U)

Dedicated to **Layer-2 access** and cable home runs. All cluster nodes plug into this switch (or into a downstream switch that uplinks here).

```
TOP OF RACK
┌────────────────────────────────────────┐
│ Slot 1 ········· empty ·············· │ 12U
│ Slot 2 ········· empty ·············· │ 11U
│ Slot 3 ········· empty ·············· │ 10U
│ Slot 4 ········· empty ·············· │ 9U
│ Slot 5 ········· empty ·············· │ 8U
│ Slot 6 ········· empty ·············· │ 7U
│ Slot 7 ░░░░░░░ optional PDU ░░░░░░░░ │ 6U
│ Slot 8 █████ 1U cable manager ██████ │ 5U
│ Slot 9 █████ 1U patch panel █████████ │ 4U
│ Slot10 ███ 8-port managed switch ████ │ 3U ← LAN L2 spine
│ Slot11 ········· empty ·············· │ 2U
│ Slot12 ········· empty ·············· │ 1U
└────────────────────────────────────────┘
BOTTOM
```

**Network role:** Every node NIC → **switch access port** → same **VLAN / flat LAN** as documented; **kube-vip** VIP **`192.168.50.230`**, **MetalLB** **`192.168.50.210`**–**`229`**, **Traefik** **`192.168.50.211`** are **logical** on node IPs (no extra hardware).

---

## Rack B — Control planes (10" × 12U)

Three **Talos control-plane** nodes (**scheduling allowed** on CPs per `talconfig.yaml`).

```
TOP OF RACK
┌────────────────────────────────────────┐
│ Slot 1 ········· empty ·············· │ 12U
│ Slot 2 ········· empty ·············· │ 11U
│ Slot 3 ········· empty ·············· │ 10U
│ Slot 4 ········· empty ·············· │ 9U
│ Slot 5 ········· empty ·············· │ 8U
│ Slot 6 ········· empty ·············· │ 7U
│ Slot 7 ········· empty ·············· │ 6U
│ Slot 8 █ neon control-plane .20 ████ │ 5U
│ Slot 9 █ argon control-plane .30 ███ │ 4U
│ Slot10 █ krypton control-plane .40 ██ │ 3U (kube-vip VIP .230)
│ Slot11 ········· empty ·············· │ 2U
│ Slot12 ········· empty ·············· │ 1U
└────────────────────────────────────────┘
BOTTOM
```

---

## Rack C — Worker (10" × 12U)

Single **worker** node; **Longhorn** data disk is **local** to each node (see `talconfig.yaml`); no separate NAS in this diagram.

```
TOP OF RACK
┌────────────────────────────────────────┐
│ Slot 1 ········· empty ·············· │ 12U
│ Slot 2 ········· empty ·············· │ 11U
│ Slot 3 ········· empty ·············· │ 10U
│ Slot 4 ········· empty ·············· │ 9U
│ Slot 5 ········· empty ·············· │ 8U
│ Slot 6 ········· empty ·············· │ 7U
│ Slot 7 ░░░░░░░ spare / future ░░░░░░░░ │ 6U
│ Slot 8 ········· empty ·············· │ 5U
│ Slot 9 ········· empty ·············· │ 4U
│ Slot10 ███ helium worker .10 █████ │ 3U
│ Slot11 ········· empty ·············· │ 2U
│ Slot12 ········· empty ·············· │ 1U
└────────────────────────────────────────┘
BOTTOM
```

---

## Space summary

| System | Rack | Approx. U | IP | Role |
|--------|------|-----------|-----|------|
| LAN switch | A | 1U | — | All nodes on `192.168.50.0/24` |
| Patch / cable mgmt | A | 2× 1U | — | Physical plant |
| **neon** | B | 1U | `192.168.50.20` | control-plane + schedulable |
| **argon** | B | 1U | `192.168.50.30` | control-plane + schedulable |
| **krypton** | B | 1U | `192.168.50.40` | control-plane + schedulable |
| **helium** | C | 1U | `192.168.50.10` | worker |

Adjust **empty vs. future** rows if your chassis are **2U** or on **shelves** — scale the `█` blocks accordingly.

---

## Network connections

All cluster nodes are on **one flat LAN**. **kube-vip** floats **`192.168.50.230:6443`** across the three control-plane hosts on **`ens18`** (see cluster bootstrap docs).

```mermaid
flowchart TB
  subgraph RACK_A["Rack A — 10\""]
    SW["Managed switch<br/>192.168.50.0/24 L2"]
    PP["Patch / cable mgmt"]
    SW --- PP
  end
  subgraph RACK_B["Rack B — 10\""]
    N["neon :20"]
    A["argon :30"]
    K["krypton :40"]
  end
  subgraph RACK_C["Rack C — 10\""]
    H["helium :10"]
  end
  subgraph LOGICAL["Logical (any node holding VIP)"]
    VIP["API VIP 192.168.50.230<br/>kube-vip → apiserver :6443"]
  end
  WAN["Internet / other LANs"] -.->|"router (out of scope)"| SW
  SW <-->|"Ethernet"| N
  SW <-->|"Ethernet"| A
  SW <-->|"Ethernet"| K
  SW <-->|"Ethernet"| H
  N --- VIP
  A --- VIP
  K --- VIP
  WK["Workstation / CI<br/>kubectl, browser"] -->|"HTTPS :6443"| VIP
  WK -->|"L2 (MetalLB .210–.211, any node)"| SW
```

**Ingress path (same LAN):** clients → **`192.168.50.211`** (Traefik) or **`192.168.50.210`** (Argo CD) via **MetalLB** — still **through the same switch** to whichever node advertises the service.

---

## Related docs

- Cluster topology and services: [`architecture.md`](architecture.md)
- Build state and versions: [`../talos/CLUSTER-BUILD.md`](../talos/CLUSTER-BUILD.md)
@@ -8,8 +8,8 @@ This document describes the **noble** Talos lab cluster: node topology, networki
 |---------------|---------|
 | **Subgraph “Cluster”** | Kubernetes cluster boundary (`noble`) |
 | **External / DNS / cloud** | Services outside the data plane (internet, registrar, Pangolin) |
-| **Data store** | Durable data (etcd, Longhorn, Loki, Vault storage) |
-| **Secrets / policy** | Secret material, Vault, admission policy |
+| **Data store** | Durable data (etcd, Longhorn, Loki) |
+| **Secrets / policy** | Secret material (SOPS in git), admission policy |
 | **LB / VIP** | Load balancer, MetalLB assignment, or API VIP |

 ---
@@ -74,7 +74,7 @@ flowchart TB

 ## Platform stack (bootstrap → workloads)

-Order: **Talos** → **Cilium** (cluster uses `cni: none` until CNI is installed) → **metrics-server**, **Longhorn**, **MetalLB** + pool manifests, **kube-vip** → **Traefik**, **cert-manager** → **Argo CD** (Helm only; optional empty app-of-apps). **Automated install:** `ansible/playbooks/noble.yml` (see `ansible/README.md`). Platform namespaces include `cert-manager`, `traefik`, `metallb-system`, `longhorn-system`, `monitoring`, `loki`, `logging`, `argocd`, `vault`, `external-secrets`, `sealed-secrets`, `kyverno`, `newt`, and others as deployed.
+Order: **Talos** → **Cilium** (cluster uses `cni: none` until CNI is installed) → **metrics-server**, **Longhorn**, **MetalLB** + pool manifests, **kube-vip** → **Traefik**, **cert-manager** → **Argo CD** (Helm only; optional empty app-of-apps). **Automated install:** `ansible/playbooks/noble.yml` (see `ansible/README.md`). Platform namespaces include `cert-manager`, `traefik`, `metallb-system`, `longhorn-system`, `monitoring`, `loki`, `logging`, `argocd`, `kyverno`, `newt`, and others as deployed.

 ```mermaid
 flowchart TB
@@ -98,7 +98,7 @@ flowchart TB
     Argo["Argo CD<br/>(optional app-of-apps; platform via Ansible)"]
   end
   subgraph L5["Platform namespaces (examples)"]
-    NS["cert-manager, traefik, metallb-system,<br/>longhorn-system, monitoring, loki, logging,<br/>argocd, vault, external-secrets, sealed-secrets,<br/>kyverno, newt, …"]
+    NS["cert-manager, traefik, metallb-system,<br/>longhorn-system, monitoring, loki, logging,<br/>argocd, kyverno, newt, …"]
   end
   Talos --> Cilium --> MS
   Cilium --> LH
@@ -149,22 +149,20 @@ flowchart LR

 ## Secrets and policy

-**Sealed Secrets** decrypts `SealedSecret` objects in-cluster. **External Secrets Operator** syncs from **Vault** using **`ClusterSecretStore`** (see [`examples/vault-cluster-secret-store.yaml`](../clusters/noble/bootstrap/external-secrets/examples/vault-cluster-secret-store.yaml)). Trust is **cluster → Vault** (ESO calls Vault; Vault does not initiate cluster trust). **Kyverno** with **kyverno-policies** enforces **PSS baseline** in **Audit**.
+**Mozilla SOPS** with **age** encrypts plain Kubernetes **`Secret`** manifests under [`clusters/noble/secrets/`](../clusters/noble/secrets/); operators decrypt at apply time (`ansible/playbooks/noble.yml` or `sops -d … | kubectl apply`). The private key is **`age-key.txt`** at the repo root (gitignored). **Kyverno** with **kyverno-policies** enforces **PSS baseline** in **Audit**.

 ```mermaid
 flowchart LR
   subgraph Git["Git repo"]
-    SSman["SealedSecret manifests<br/>(optional)"]
+    SM["SOPS-encrypted Secret YAML<br/>clusters/noble/secrets/"]
   end
+  subgraph ops["Apply path"]
+    SOPS["sops -d + kubectl apply<br/>(or Ansible noble.yml)"]
+  end
   subgraph cluster["Cluster"]
-    SSC["Sealed Secrets controller<br/>sealed-secrets"]
-    ESO["External Secrets Operator<br/>external-secrets"]
-    V["Vault<br/>vault namespace<br/>HTTP listener"]
     K["Kyverno + kyverno-policies<br/>PSS baseline Audit"]
   end
-  SSman -->|"encrypted"| SSC -->|"decrypt to Secret"| workloads["Workload Secrets"]
-  ESO -->|"ClusterSecretStore →"| V
-  ESO -->|"sync ExternalSecret"| workloads
+  SM --> SOPS -->|"plain Secret"| workloads["Workload Secrets"]
   K -.->|"admission / audit<br/>(PSS baseline)"| workloads
 ```

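The SOPS-in-git flow above can be made concrete with the shape an encrypted manifest takes on disk. Everything below is illustrative: the Secret name, namespace, ciphertext, and metadata values are placeholders (only the age recipient matches this repo's `.sops.yaml`); a real file is produced by `sops --encrypt --in-place <file>` and restored with `sops -d <file> | kubectl apply -f -`.

```yaml
# Hypothetical clusters/noble/secrets/example.yaml AFTER encryption.
# SOPS replaces only the value fields with ENC[…] blobs; keys and structure
# stay readable, so git diffs remain meaningful.
apiVersion: v1
kind: Secret
metadata:
  name: example            # placeholder name
  namespace: default       # placeholder namespace
stringData:
  password: ENC[AES256_GCM,data:...,tag:...,type:str]   # ciphertext placeholder
sops:
  age:
    - recipient: age1juym5p3ez3dkt0dxlznydgfgqvaujfnyk9hpdsssf50hsxeh3p4sjpf3gn
      enc: |
        -----BEGIN AGE ENCRYPTED FILE-----
        ...
        -----END AGE ENCRYPTED FILE-----
  lastmodified: "..."      # filled in by sops
  mac: ENC[AES256_GCM,data:...,type:str]
  version: "..."
```

Decryption requires `age-key.txt` (e.g. via `SOPS_AGE_KEY_FILE`); without it, `sops -d` fails and the apply path above cannot run.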
@@ -172,7 +170,7 @@ flowchart LR

 ## Data and storage

-**StorageClass:** **`longhorn`** (default). Talos mounts **user volume** data at **`/var/mnt/longhorn`** (bind paths for Longhorn). Stateful consumers include **Vault**, **kube-prometheus-stack** PVCs, and **Loki**.
+**StorageClass:** **`longhorn`** (default). Talos mounts **user volume** data at **`/var/mnt/longhorn`** (bind paths for Longhorn). Stateful consumers include **kube-prometheus-stack** PVCs and **Loki**.

 ```mermaid
 flowchart TB
@@ -183,12 +181,10 @@ flowchart TB
     SC["StorageClass: longhorn (default)"]
   end
   subgraph consumers["Stateful / durable consumers"]
-    V["Vault PVC data-vault-0"]
     PGL["kube-prometheus-stack PVCs"]
     L["Loki PVC"]
   end
   UD --> SC
-  SC --> V
   SC --> PGL
   SC --> L
 ```
@@ -210,7 +206,7 @@ See [`talos/CLUSTER-BUILD.md`](../talos/CLUSTER-BUILD.md) for the authoritative
 | Argo CD | 9.4.17 / app v3.3.6 |
 | kube-prometheus-stack | 82.15.1 |
 | Loki / Fluent Bit | 6.55.0 / 0.56.0 |
-| Sealed Secrets / ESO / Vault | 2.18.4 / 2.2.0 / 0.32.0 |
+| SOPS (client tooling) | see `clusters/noble/secrets/README.md` |
 | Kyverno | 3.7.1 / policies 3.7.1 |
 | Newt | 1.2.0 / app 1.10.1 |

@@ -218,7 +214,7 @@ See [`talos/CLUSTER-BUILD.md`](../talos/CLUSTER-BUILD.md) for the authoritative

 ## Narrative

-The **noble** environment is a **Talos** lab cluster on **`192.168.50.0/24`** with **three control plane nodes and one worker**, schedulable workloads on control planes enabled, and the Kubernetes API exposed through **kube-vip** at **`192.168.50.230`**. **Cilium** provides the CNI after Talos bootstrap with **`cni: none`**; **MetalLB** advertises **`192.168.50.210`**–**`192.168.50.229`**, pinning **Argo CD** to **`192.168.50.210`** and **Traefik** to **`192.168.50.211`** for **`*.apps.noble.lab.pcenicni.dev`**. **cert-manager** issues certificates for Traefik Ingresses; **GitOps** is **Ansible-driven Helm** for the platform (**`clusters/noble/bootstrap/`**) plus optional **Argo CD** app-of-apps (**`clusters/noble/apps/`**, **`clusters/noble/bootstrap/argocd/`**). **Observability** uses **kube-prometheus-stack** in **`monitoring`**, **Loki** and **Fluent Bit** with Grafana wired via a **ConfigMap** datasource, with **Longhorn** PVCs for Prometheus, Grafana, Alertmanager, Loki, and **Vault**. **Secrets** combine **Sealed Secrets** for git-encrypted material, **Vault** with **External Secrets** for dynamic sync, and **Kyverno** enforces **Pod Security Standards baseline** in **Audit**. **Public** access uses **Newt** to **Pangolin** with **CNAME** and Integration API steps as documented—not generic in-cluster public DNS.
+The **noble** environment is a **Talos** lab cluster on **`192.168.50.0/24`** with **three control plane nodes and one worker**, schedulable workloads on control planes enabled, and the Kubernetes API exposed through **kube-vip** at **`192.168.50.230`**. **Cilium** provides the CNI after Talos bootstrap with **`cni: none`**; **MetalLB** advertises **`192.168.50.210`**–**`192.168.50.229`**, pinning **Argo CD** to **`192.168.50.210`** and **Traefik** to **`192.168.50.211`** for **`*.apps.noble.lab.pcenicni.dev`**. **cert-manager** issues certificates for Traefik Ingresses; **GitOps** is **Ansible** for the **initial** platform install (**`clusters/noble/bootstrap/`**), then **Argo CD** for the kustomize tree (**`noble-bootstrap-root`** → **`clusters/noble/bootstrap`**) and optional apps (**`noble-root`** → **`clusters/noble/apps/`**) once automated sync is enabled after **`noble.yml`** (see **`clusters/noble/bootstrap/argocd/README.md`** §5). **Observability** uses **kube-prometheus-stack** in **`monitoring`**, **Loki** and **Fluent Bit** with Grafana wired via a **ConfigMap** datasource, with **Longhorn** PVCs for Prometheus, Grafana, Alertmanager, and Loki. **Secrets** in git use **SOPS** + **age** under **`clusters/noble/secrets/`**; **Kyverno** enforces **Pod Security Standards baseline** in **Audit**. **Public** access uses **Newt** to **Pangolin** with **CNAME** and Integration API steps as documented—not generic in-cluster public DNS.

 ---

@@ -233,7 +229,7 @@ The **noble** environment is a **Talos** lab cluster on **`192.168.50.0/24`** wi
 **Open questions**

 - **Split horizon:** Confirm whether only LAN DNS resolves `*.apps.noble.lab.pcenicni.dev` to **`192.168.50.211`** or whether public resolvers also point at that address.
-- **Velero / S3:** **TBD** until an S3-compatible backend is configured.
+- **Velero / S3:** optional **Ansible** install (**`noble_velero_install`**) from **`clusters/noble/bootstrap/velero/`** once an S3-compatible backend and credentials exist (see **`talos/CLUSTER-BUILD.md`** Phase F).
 - **Argo CD:** Confirm **`repoURL`** in `root-application.yaml` and what is actually applied on-cluster.

 ---

100 docs/homelab-network.md Normal file
@@ -0,0 +1,100 @@
# Homelab network inventory

Single place for **VLANs**, **static addressing**, and **hosts** beside the **noble** Talos cluster. **Proxmox** is the **hypervisor** for the VMs below; **all of those VMs are intended to run on `192.168.1.0/24`** (same broadcast domain as Pi-hole and typical home clients). **Noble** (Talos) stays on **`192.168.50.0/24`** per [`architecture.md`](architecture.md) and [`talos/CLUSTER-BUILD.md`](../talos/CLUSTER-BUILD.md) until you change that design.

## VLANs (logical)

| Network | Role |
|---------|------|
| **`192.168.1.0/24`** | **Homelab / Proxmox LAN** — **Proxmox host(s)**, **all Proxmox VMs**, **Pi-hole**, **Mac mini**, and other servers that share this VLAN. |
| **`192.168.50.0/24`** | **Noble Talos** cluster — physical nodes, **kube-vip**, **MetalLB**, Traefik; **not** the Proxmox VM subnet. |
| **`192.168.60.0/24`** | **DMZ / WAN-facing** — **NPM**, **WebDAV**, **other services** that need WAN access. |
| **`192.168.40.0/24`** | **Home Assistant** and IoT devices — isolated; record subnet and HA IP in DHCP/router. |

**Routing / DNS:** Clients and VMs on **`192.168.1.0/24`** reach **noble** services on **`192.168.50.0/24`** via **L3** (router/firewall). **NFS** from OMV (`192.168.1.105`) to **noble** pods uses the **OMV data IP** as the NFS server address from the cluster’s perspective.

Firewall rules between VLANs are **out of scope** here; document them where you keep runbooks.

---

## `192.168.50.0/24` — reservations (noble only)

Do not assign **unrelated** static services on **this** VLAN without checking overlap with MetalLB and kube-vip.

| Use | Addresses |
|-----|-----------|
| Talos nodes | `.10`–`.40` (see [`talos/talconfig.yaml`](../talos/talconfig.yaml)) |
| MetalLB L2 pool | `.210`–`.229` |
| Traefik (ingress) | `.211` (typical) |
| Argo CD | `.210` (typical) |
| Kubernetes API (kube-vip) | **`.230`** — **must not** be a VM |

---

## Proxmox VMs (`192.168.1.0/24`)

All run on **Proxmox**; addresses below use **`192.168.1.0/24`** (same host octet as your earlier `.50.x` / `.60.x` plan, moved into the homelab VLAN). Adjust if your router uses a different numbering scheme.

Most are **Docker hosts** with multiple apps; treat the **IP** as the **host**, not individual containers.

| VM ID | Name | IP | Notes |
|-------|------|-----|--------|
| 666 | nginxproxymanager | `192.168.1.20` | NPM (edge / WAN-facing role — firewall as you design). |
| 777 | nginxproxymanager-Lan | `192.168.1.60` | NPM on **internal** homelab LAN. |
| 100 | Openmediavault | `192.168.1.105` | **NFS** exports for *arr / media paths. |
| 110 | Monitor | `192.168.1.110` | Uptime Kuma, Peekaping, Tracearr → cluster candidates. |
| 120 | arr | `192.168.1.120` | *arr stack; media via **NFS** from OMV — see [migration](#arr-stack-nfs-and-kubernetes). |
| 130 | Automate | `192.168.1.130` | Low use — **candidate to remove** or consolidate. |
| 140 | general-purpose | `192.168.1.140` | IT tools, Mealie, Open WebUI, SparkyFitness, … |
| 150 | Media-server | `192.168.1.150` | Jellyfin (test, **NFS** media), ebook server. |
| 160 | s3 | `192.168.1.170` | Object storage; **merge** into **central S3** on noble per [`shared-data-services.md`](shared-data-services.md) when ready. |
| 190 | Auth | `192.168.1.190` | **Authentik** → **noble (K8s)** for HA. |
| 300 | gitea | `192.168.1.203` | On **`.1`**, no overlap with noble **MetalLB `.210`–`.229`** on **`.50`**. |
| 310 | gitea-nsfw | `192.168.1.204` | |
| 500 | AMP | `192.168.1.47` | |

### Workload detail (what runs where)

**Auth (190)** — **Authentik** is the main service; moving it to **Kubernetes (noble)** gives you **HA**, rolling upgrades, and backups via your cluster patterns (PVCs, Velero, etc.). Plan **OIDC redirect URLs** and **outposts** (if used) when the **ingress hostname** and paths to **`.50`** services change.

**Monitor (110)** — **Uptime Kuma**, **Peekaping**, and **Tracearr** are a good fit for the cluster: small state (SQLite or small DBs), **Ingress** via Traefik, and **Longhorn** or a small DB PVC. Migrate **one app at a time** and keep the old VM until DNS and alerts are verified.

**arr (120)** — **Lidarr, Sonarr, Radarr**, and related *arr* apps; libraries and download paths point at **NFS** from **Openmediavault (100)** at **`192.168.1.105`**. The hard part is **keeping paths, permissions (UID/GID), and download client** wiring while pods move.

**Automate (130)** — Tools are **barely used**; **decommission**, merge into **general-purpose (140)**, or replace with a **CronJob** / one-shot on the cluster only if something still needs scheduling.

**general-purpose (140)** — “Daily driver” stack: **IT tools**, **Mealie**, **Open WebUI**, **SparkyFitness**, and similar. **Candidates for gradual moves** to noble; group by **data sensitivity** and **persistence** (Postgres vs SQLite) when you pick order.

**Media-server (150)** — **Jellyfin** (testing) with libraries on **NFS**; **ebook** server. Treat **Jellyfin** like *arr* for storage: same NFS export and **transcoding** needs (CPU on worker nodes or GPU if you add it). Ebook stack depends on what you run (e.g. Kavita, Audiobookshelf) — note **metadata paths** before moving.

### Arr stack, NFS, and Kubernetes

You do **not** have to move NFS into the cluster: **Openmediavault** on **`192.168.1.105`** can stay the **NFS server** while the *arr* apps run as **Deployments** with **ReadWriteMany** volumes. Noble nodes on **`192.168.50.0/24`** mount NFS using **that IP** (ensure **firewall** allows **NFS** from node IPs to OMV).

1. **Keep OMV as the single source of exports** — same **export path** (e.g. `/export/media`) from the cluster’s perspective as from the current VM.
2. **Mount NFS in Kubernetes** — use a **CSI NFS driver** (e.g. **nfs-subdir-external-provisioner** or **csi-driver-nfs**) so each app gets a **PVC** backed by a **subdirectory** of the export, **or** one shared RWX PVC for a common tree if your layout needs it.
3. **Match POSIX ownership** — set **supplemental groups** or **fsGroup** / **runAsUser** on the pods so Sonarr/Radarr see the same **UID/GID** as today’s Docker setup; fix **squash** settings on OMV if you use `root_squash`.
4. **Config and DB** — back up each app’s **config volume** (or SQLite files), redeploy with the same **environment**; point **download clients** and **NFS media roots** to the **same logical paths** inside the container.
5. **Low-risk path** — run **one** *arr* app on the cluster while the rest stay on **VM 120** until imports and downloads behave; then cut DNS/NPM streams over.

If you prefer **no** NFS from pods, the alternative is **large ReadWriteOnce** disks on Longhorn and **sync** from OMV — usually **more** moving parts than **RWX NFS** for this workload class.
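The steps above can be sketched as a statically provisioned RWX volume against the OMV export. This is a hedged example, not a tested manifest: it assumes **csi-driver-nfs** is installed, and the names (`media-nfs`, namespace `arr`, the `/export/media` share, the 1 Ti size) are placeholders to adapt.

```yaml
# Sketch: one shared RWX volume for the *arr tree, served by OMV at
# 192.168.1.105 (step 1) through csi-driver-nfs (step 2).
apiVersion: v1
kind: PersistentVolume
metadata:
  name: media-nfs                # illustrative name
spec:
  capacity:
    storage: 1Ti                 # informational for static PVs
  accessModes: ["ReadWriteMany"]
  persistentVolumeReclaimPolicy: Retain
  csi:
    driver: nfs.csi.k8s.io
    volumeHandle: omv-media      # any cluster-unique ID
    volumeAttributes:
      server: 192.168.1.105      # OMV data IP
      share: /export/media       # same export path as the VM mounts today
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: media-nfs
  namespace: arr                 # illustrative namespace
spec:
  accessModes: ["ReadWriteMany"]
  storageClassName: ""           # empty class + volumeName pins this PV
  volumeName: media-nfs
  resources:
    requests:
      storage: 1Ti
```

Pair the claim with a pod `securityContext` (e.g. `runAsUser` / `fsGroup` set to whatever UID/GID the OMV export expects) so file ownership matches the current Docker setup, per step 3.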
---

## Other hosts

| Host | IP | VLAN / network | Notes |
|------|-----|----------------|--------|
| **Pi-hole** | `192.168.1.127` | `192.168.1.0/24` | DNS; same VLAN as Proxmox VMs. |
| **Home Assistant** | *TBD* | **IoT VLAN** | Add reservation when fixed. |
| **Mac mini** | `192.168.1.155` | `192.168.1.0/24` | Align with **Storage B** in [`Racks.md`](Racks.md) if the same machine. |

---

## Related docs

- **Shared Postgres + S3 (centralized):** [`shared-data-services.md`](shared-data-services.md)
- **VM → noble migration plan:** [`migration-vm-to-noble.md`](migration-vm-to-noble.md)
- Noble cluster topology and ingress: [`architecture.md`](architecture.md)
- Physical racks (Primary / Storage B / Rack C): [`Racks.md`](Racks.md)
- Cluster checklist: [`../talos/CLUSTER-BUILD.md`](../talos/CLUSTER-BUILD.md)
121 docs/migration-vm-to-noble.md Normal file
@@ -0,0 +1,121 @@
|
||||
# Migration plan: Proxmox VMs → noble (Kubernetes)
|
||||
|
||||
This document is the **default playbook** for moving workloads from **Proxmox VMs** on **`192.168.1.0/24`** into the **noble** Talos cluster on **`192.168.50.0/24`**. Source inventory and per-VM notes: [`homelab-network.md`](homelab-network.md). Cluster facts: [`architecture.md`](architecture.md), [`talos/CLUSTER-BUILD.md`](../talos/CLUSTER-BUILD.md).
|
||||
|
||||
---
|
||||
|
||||
## 1. Scope and principles
|
||||
|
||||
| Principle | Detail |
|
||||
|-----------|--------|
|
||||
| **One service at a time** | Run the new workload on **noble** while the **VM** stays up; cut over **DNS / NPM** only after checks pass. |
|
||||
| **Same container image** | Prefer the **same** upstream image and major version as Docker on the VM to reduce surprises. |
|
||||
| **Data moves with a plan** | **Backup** VM volumes or export DB dumps **before** the first deploy to the cluster. |
|
||||
| **Ingress on noble** | Internal apps use **Traefik** + **`*.apps.noble.lab.pcenicni.dev`** (or your chosen hostnames) and **MetalLB** (e.g. **`192.168.50.211`**) per [`architecture.md`](architecture.md). |
|
||||
| **Cross-VLAN** | Clients on **`.1`** reach services on **`.50`** via **routing**; **firewall** must allow **NFS** from **Talos node IPs** to **OMV `192.168.1.105`** when pods mount NFS. |
|
||||
|
||||
**Not everything must move.** Keep **Openmediavault** (and optionally **NPM**) on VMs if you prefer; the cluster consumes **NFS** and **HTTP** from them.

---

## 2. Prerequisites (before wave 1)

1. **Cluster healthy** — `kubectl get nodes`; [`talos/CLUSTER-BUILD.md`](../talos/CLUSTER-BUILD.md) checklist through ingress and cert-manager as needed.
2. **Ingress + TLS** — **Traefik** + **cert-manager** working; you can hit a **test Ingress** on the MetalLB IP.
3. **GitOps / deploy path** — Decide per app: **Helm** under `clusters/noble/apps/`, **Argo CD**, or **Ansible**-applied manifests (match how you manage the rest of noble).
4. **Secrets** — Plan **Kubernetes Secrets**; for git-stored material, align with **SOPS** (`clusters/noble/secrets/`, `.sops.yaml`).
5. **Storage** — **Longhorn** default for **ReadWriteOnce** state; for **NFS** (*arr*, Jellyfin), install a **CSI NFS** driver and test a **small RWX PVC** before migrating data-heavy apps.
6. **Shared data tier (recommended)** — Deploy **centralized PostgreSQL** and **S3-compatible storage** on noble so apps do not each ship their own DB/object store; see [`shared-data-services.md`](shared-data-services.md).
7. **Firewall** — Rules: **workstation → `192.168.50.230:6443`**; **nodes → OMV NFS ports**; **clients → `192.168.50.211`** (or split-horizon DNS) as you design.
8. **DNS** — Split-horizon or Pi-hole records for **`*.apps.noble.lab.pcenicni.dev`** → **Traefik** IP **`192.168.50.211`** for LAN clients.
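
The RWX smoke test in step 5 can be a one-off claim like this sketch — the `nfs-csi` StorageClass name is an assumption; use whatever class your CSI NFS driver installs:

```yaml
# Hypothetical smoke test for the NFS CSI driver (prerequisite 5).
# Assumes a StorageClass named "nfs-csi" backed by the OMV export.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-rwx-test
  namespace: default
spec:
  accessModes:
    - ReadWriteMany          # RWX is the point of the test
  storageClassName: nfs-csi  # assumption — match your driver's class name
  resources:
    requests:
      storage: 1Gi
```

Apply it, confirm the PVC goes `Bound`, mount it from two pods at once, then delete it before moving real data.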

---

## 3. Standard migration procedure (repeat per app)

Use this checklist for **each** application (or small group, e.g. one Helm release).

| Step | Action |
|------|--------|
| **A. Discover** | Document **image:tag**, **ports**, **volumes** (host paths), **env vars**, **depends_on** (DB, Redis, NFS path). Export **docker inspect** / **compose** from the VM. |
| **B. Backup** | Snapshot the **Proxmox VM** or back up the **volume** / **SQLite** / **DB dump** to offline storage. |
| **C. Namespace** | Create a **dedicated namespace** (e.g. `monitoring-tools`, `authentik`) or use your house standard. |
| **D. Deploy** | Add **Deployment** (or **StatefulSet**), **Service**, **Ingress** (class **traefik**), **PVCs**; wire credentials from **Secrets** (not literals in git). |
| **E. Storage** | **Longhorn** PVC for local state; **NFS CSI** PVC for shared media/config paths that must match the VM (see [`homelab-network.md`](homelab-network.md) *arr* section). Prefer **shared Postgres** / **shared S3** per [`shared-data-services.md`](shared-data-services.md) instead of new embedded databases. Match **UID/GID** with `securityContext`. |
| **F. Smoke test** | `kubectl port-forward` or a temporary **Ingress** hostname; log in, run one critical workflow (login, playback, sync). |
| **G. DNS cutover** | Point **internal DNS** or the **NPM** upstream from the **VM IP** to the **new hostname** (Traefik) or **MetalLB IP** + Host header. |
| **H. Observe** | 24–72 hours: logs, alerts, **Uptime Kuma** (once migrated), backups. |
| **I. Decommission** | Stop the **container** on the VM (not the whole VM until the **whole** VM is empty). |
| **J. VM off** | When **no** services remain on that VM, **power off** and archive or delete the VM. |

**Rollback:** Re-enable the VM service, revert **DNS/NPM** to the old IP, delete or scale the cluster deployment to zero.
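
For step E, matching the VM's file ownership usually comes down to a pod `securityContext` along these lines — the UID/GID `1000` is a placeholder; use the IDs that actually own the data on the VM or NFS export:

```yaml
# Fragment of a Deployment pod spec — UID/GID values are placeholders.
spec:
  securityContext:
    runAsUser: 1000    # UID that owns the app's data on the VM / NFS export
    runAsGroup: 1000
    fsGroup: 1000      # makes mounted volumes group-accessible to the pod
```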

---

## 4. Recommended migration order (phases)

The order balances **risk**, **dependencies**, and **learning curve**.

| Phase | Target | Rationale |
|-------|--------|-----------|
| **0 — Optional** | **Automate (130)** | Low use: **retire** or replace with **CronJobs**; skip if nothing valuable runs. |
| **0b — Platform** | **Shared Postgres + S3** on noble | Run **before** or alongside early waves so new deploys use **one DSN** and **one object endpoint**; retire **VM 160** when empty. See [`shared-data-services.md`](shared-data-services.md). |
| **1 — Observability** | **Monitor (110)** — Uptime Kuma, Peekaping, Tracearr | Small state; validates **Ingress**, **PVCs**, and **alert paths** before auth and media. |
| **2 — Git** | **gitea (300)**, **gitea-nsfw (310)** | Point at **shared Postgres** + **S3** for attachments; move **repos** with **PVC** + backup restore if needed. |
| **3 — Object / misc** | **s3 (160)**, **AMP (500)** | **Migrate data** into the **central** S3 on the cluster, then **decommission** the duplicate MinIO on VM **160** if applicable. |
| **4 — Auth** | **Auth (190)** — **Authentik** | Use **shared Postgres**; update **all OIDC clients** (Gitea, apps, NPM) with **new issuer URLs**; schedule a **maintenance window**. |
| **5 — Daily apps** | **general-purpose (140)** | Move **one app per release** (Mealie, Open WebUI, …); each app gets its **own database** (and bucket if needed) on the **shared** tiers — not a new Postgres pod per app. |
| **6 — Media / *arr*** | **arr (120)**, **Media-server (150)** | **NFS** from **OMV**, download clients, **transcoding** — migrate **one *arr*** app, then Jellyfin/ebook; see the NFS bullets in [`homelab-network.md`](homelab-network.md). |
| **7 — Edge** | **NPM (666/777)** | Often **last**: either keep on Proxmox or replace with **Traefik** + **IngressRoutes** / **Gateway API**; many people keep a **dedicated** reverse-proxy VM until parity is proven. |

**Openmediavault (100)** — Typically **stays** as the **NFS** server (and maybe backup target) for the cluster; no need to “migrate” the whole NAS into Kubernetes.

---

## 5. Ingress and reverse proxy

| Approach | When to use |
|----------|-------------|
| **Traefik Ingress** on noble | Default for **internal** HTTPS apps; **cert-manager** for public names you control. |
| **NPM (VM)** as front door | Point a **proxy host** → the **Traefik MetalLB IP** (or a service name if you add internal DNS); reduces double-proxying if you **terminate TLS** in one place only. |
| **Newt / Pangolin** | Public reachability per [`clusters/noble/bootstrap/newt/README.md`](../clusters/noble/bootstrap/newt/README.md); public DNS is not automated via ExternalDNS. |

Avoid **two** TLS terminations for the same hostname unless you intend **SSL passthrough** end-to-end.
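
A typical internal Ingress on noble, using the ingress class and issuer named elsewhere in this repo — the app name, service, and port are placeholders:

```yaml
# Hedged sketch: internal HTTPS app behind Traefik on noble.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp                  # placeholder
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: traefik
  rules:
    - host: myapp.apps.noble.lab.pcenicni.dev   # resolves to 192.168.50.211 on the LAN
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: myapp    # placeholder Service
                port:
                  number: 80
  tls:
    - hosts:
        - myapp.apps.noble.lab.pcenicni.dev
      secretName: myapp-tls    # cert-manager fills this Secret
```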

---

## 6. Authentik-specific (Auth VM → cluster)

1. **Backup** the Authentik **PostgreSQL** (or embedded DB) and the **media** volume from the VM.
2. Deploy via **Helm** (official chart) with the **same** Authentik version if possible.
3. **Restore** the DB into the **shared cluster Postgres** (recommended) or a chart-managed DB — see [`shared-data-services.md`](shared-data-services.md).
4. Update the **issuer URL** in every **OIDC/OAuth** client (Gitea, Grafana, etc.).
5. Re-test **outposts** (if any) and **redirect URIs** from both **`.1`** and **`.50`** client perspectives.
6. **Cut over DNS**; then **decommission** VM **190**.
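
Step 3's restore can run as a one-off in-cluster Job roughly like this — the Secret name, key, PVC, and dump path are assumptions for illustration:

```yaml
# Hedged sketch: restore a pg_dump custom-format archive into the shared Postgres.
apiVersion: batch/v1
kind: Job
metadata:
  name: authentik-db-restore
  namespace: platform              # assumption — wherever shared Postgres lives
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: restore
          image: postgres:16       # match the shared server's major version
          command: ["sh", "-c"]
          args:
            - pg_restore --no-owner -d "$DATABASE_URL" /restore/authentik.dump
          env:
            - name: DATABASE_URL   # DSN for the shared Postgres — from a Secret
              valueFrom:
                secretKeyRef:
                  name: authentik-db   # placeholder Secret
                  key: url
          volumeMounts:
            - name: dump
              mountPath: /restore
      volumes:
        - name: dump
          persistentVolumeClaim:
            claimName: restore-scratch   # placeholder PVC holding the uploaded dump
```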

---

## 7. *arr* and Jellyfin-specific

Follow the **numbered list** under **“Arr stack, NFS, and Kubernetes”** in [`homelab-network.md`](homelab-network.md). In short: **OMV stays**; **CSI NFS** + **RWX**; **match permissions**; migrate **one app** first; verify the **download client** can reach the new pod **IP/DNS** from your download host.
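
With the common `csi-driver-nfs` chart, the OMV export maps to a StorageClass roughly like the sketch below — the export path is a placeholder; the server IP comes from [`homelab-network.md`](homelab-network.md):

```yaml
# Hedged sketch for kubernetes-csi/csi-driver-nfs; export path is a placeholder.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-omv
provisioner: nfs.csi.k8s.io
parameters:
  server: 192.168.1.105      # OMV — firewall must allow the Talos node IPs
  share: /export/media       # placeholder — use the real OMV export path
reclaimPolicy: Retain        # keep media data if a PVC is deleted
mountOptions:
  - nfsvers=4.1
```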

---

## 8. Validation checklist (per wave)

- Pods **Ready**; **Ingress** returns **200** / the login page.
- **TLS** valid for the chosen hostname.
- **Persistent data** present (new uploads and DB writes survive a pod restart).
- **Backups** (Velero or app-level) defined for the new location.
- **Monitoring** / alerts updated (new targets, not the old VM IP).
- **Documentation** in [`homelab-network.md`](homelab-network.md) updated (VM retired or marked migrated).

---

## Related docs

- **Shared Postgres + S3:** [`shared-data-services.md`](shared-data-services.md)
- VM inventory and NFS notes: [`homelab-network.md`](homelab-network.md)
- Noble topology, MetalLB, Traefik: [`architecture.md`](architecture.md)
- Bootstrap and versions: [`talos/CLUSTER-BUILD.md`](../talos/CLUSTER-BUILD.md)
- Apps layout: [`clusters/noble/apps/README.md`](../clusters/noble/apps/README.md)
90 docs/shared-data-services.md Normal file
@@ -0,0 +1,90 @@

# Centralized PostgreSQL and S3-compatible storage

Goal: **one shared PostgreSQL** and **one S3-compatible object store** on **noble**, instead of every app bundling its own database or MinIO. Apps keep **logical isolation** via **per-app databases** / **users** and **per-app buckets** (or prefixes), not separate clusters.

See also: [`migration-vm-to-noble.md`](migration-vm-to-noble.md), [`homelab-network.md`](homelab-network.md) (VM **160** `s3` today), [`talos/CLUSTER-BUILD.md`](../talos/CLUSTER-BUILD.md) (Velero + S3).

---

## 1. Why centralize

| Benefit | Detail |
|---------|--------|
| **Operations** | One backup/restore story, one upgrade cadence, one place to tune **IOPS** and **retention**. |
| **Security** | **Least privilege**: each app gets its own **DB user** and **S3 credentials** scoped to one database or bucket. |
| **Resources** | Fewer duplicate **Postgres** or **MinIO** sidecars; better use of **Longhorn** or dedicated PVCs for the shared tiers. |

**Tradeoff:** Shared tiers are **blast-radius** targets — use **backups**, **PITR** where you care, and **NetworkPolicies** so only expected namespaces talk to Postgres/S3.
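
That containment can be sketched as a plain NetworkPolicy — the namespace, pod label, and opt-in label are assumptions:

```yaml
# Hedged sketch: only namespaces that opt in may reach the shared Postgres.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: postgres-ingress
  namespace: platform            # assumption — wherever shared Postgres lives
spec:
  podSelector:
    matchLabels:
      app: postgres              # assumption — match your Postgres pods
  policyTypes: ["Ingress"]
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              platform-data-access: "true"   # opt-in label per consumer namespace
      ports:
        - protocol: TCP
          port: 5432
```

A CiliumNetworkPolicy equivalent works the same way if you prefer Cilium-native policy, as used elsewhere in this repo.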

---

## 2. PostgreSQL — recommended pattern

1. **Run Postgres on noble** — Operators such as **CloudNativePG**, the **Zalando Postgres operator**, or a well-maintained **Helm** chart with **replicas** + **persistent volumes** (Longhorn).
2. **One cluster instance, many databases** — For each app: `CREATE DATABASE appname;` and a **dedicated role** with `CONNECT` on that database only (not superuser).
3. **Connection from apps** — Use a **Kubernetes Service** (e.g. `postgres-platform.platform.svc.cluster.local:5432`) and pass **credentials** via **Secrets** (ideally **SOPS**-encrypted in git).
4. **Migrations** — Run the app's **migration** jobs or init containers against the **same** DSN once the database exists.
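
Wiring an app to the shared DSN (step 3) then looks roughly like this — the app and Secret names are placeholders, and the Secret itself would live SOPS-encrypted under `clusters/noble/secrets/`:

```yaml
# Hedged sketch: per-app credentials for the shared Postgres.
apiVersion: v1
kind: Secret
metadata:
  name: mealie-db              # placeholder app
  namespace: apps              # placeholder namespace
stringData:
  url: postgres://mealie:CHANGEME@postgres-platform.platform.svc.cluster.local:5432/mealie
---
# In the app's Deployment container spec, reference the key:
# env:
#   - name: DATABASE_URL
#     valueFrom:
#       secretKeyRef:
#         name: mealie-db
#         key: url
```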

**Migrating off SQLite / embedded Postgres**

- **SQLite → Postgres:** export/import per app (native tools, or **pgloader** where appropriate).
- **Docker Postgres volume:** `pg_dumpall` or per-DB `pg_dump` → restore into a **new** database on the shared server; **freeze writes** during cutover.

---

## 3. S3-compatible object storage — recommended pattern

1. **Run one S3 API on noble** — **MinIO** (common), **Garage**, or the **SeaweedFS** S3 layer — with **PVC(s)** or a host path for data; **erasure coding** / replicas if the chart supports it and you want durability across nodes.
2. **Buckets per concern** — e.g. `gitea-attachments`, `velero`, `loki-archive` — not one global bucket unless you enforce **prefix** IAM policies.
3. **Credentials** — **IAM-style** users limited to **one bucket** (or prefix); **Secrets** reference the **access key** / **secret**; never commit keys in plain text.
4. **Endpoint for pods** — In-cluster: `http://minio.platform.svc.cluster.local:9000` (or TLS inside the mesh). Apps use **virtual-hosted** or **path-style** addressing per SDK defaults.
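
Per app, the S3 side then reduces to an endpoint, a bucket, and scoped credentials — a generic container-env fragment (exact variable names vary by app; the Secret name is a placeholder):

```yaml
# Hedged sketch: generic S3 settings for one app consuming the central store.
env:
  - name: S3_ENDPOINT
    value: http://minio.platform.svc.cluster.local:9000   # in-cluster endpoint
  - name: S3_BUCKET
    value: gitea-attachments
  - name: S3_FORCE_PATH_STYLE        # MinIO-style deployments usually want path-style
    value: "true"
  - name: AWS_ACCESS_KEY_ID
    valueFrom:
      secretKeyRef: { name: gitea-s3, key: access-key }   # placeholder Secret
  - name: AWS_SECRET_ACCESS_KEY
    valueFrom:
      secretKeyRef: { name: gitea-s3, key: secret-key }
```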

### NFS as backing store for S3 on noble

**Yes.** You can run MinIO (or another S3-compatible server) with its **data directory** on a **ReadWriteMany** volume that is **NFS** — for example the same **Openmediavault** export you already use, mounted via your **NFS CSI** driver (see [`homelab-network.md`](homelab-network.md)).

| Consideration | Detail |
|---------------|--------|
| **Works for a homelab** | MinIO stores objects as files under a path; **POSIX** on NFS is enough for many setups. |
| **Performance** | NFS adds **latency** and shared bandwidth; fine for moderate use, less ideal for heavy multi-tenant throughput. |
| **Availability** | The **NFS server** (OMV) becomes part of the availability story for object data — plan **backups** and **OMV** health like any dependency. |
| **Locking / semantics** | Prefer **NFSv4.x**; avoid mixing **NFS** with expectations of **local SSD** behavior (e.g. very chatty small writes). If you see odd behavior, **Longhorn** (block) on a node is the usual next step. |
| **Layering** | You are stacking **S3 API → file layout → NFS → disk**; that is normal for a lab — just **monitor** space and exports on OMV. |

**Summary:** An NFS-backed PVC for MinIO is **valid** on noble; use **Longhorn** (or local disk) when you need **better IOPS** or want object data **inside** the cluster’s storage domain without depending on OMV for that tier.

**Migrating off VM 160 (`s3`) or per-app MinIO**

- **MinIO → MinIO:** `mc mirror` between aliases, or **replication** if you configure it.
- **Same API:** Any tool speaking **S3** can **sync** buckets before you point apps at the new endpoint.

**Velero** — Point the **backup location** at the **central** bucket (see the cluster Velero docs); avoid a second ad-hoc object store for backups if one cluster bucket is enough.
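
Pointing Velero at the central bucket is a single `BackupStorageLocation` — values mirror the `NOBLE_VELERO_*` variables in the repo's `.env.sample`; the hostname here is a placeholder:

```yaml
# Hedged sketch for the velero-plugin-for-aws against any S3-compatible server.
apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
  name: default
  namespace: velero
spec:
  provider: aws                    # the AWS plugin speaks generic S3
  objectStorage:
    bucket: velero                 # NOBLE_VELERO_S3_BUCKET
  config:
    region: us-east-1              # arbitrary for MinIO-style servers
    s3Url: https://s3.example.com  # NOBLE_VELERO_S3_URL — placeholder host
    s3ForcePathStyle: "true"
```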

---

## 4. Ordering relative to app migrations

| When | What |
|------|------|
| **Early** | Stand up **Postgres** + **S3** with **empty** DBs/buckets; test with **one** non-critical app (e.g. a throwaway deployment). |
| **Before auth / Git** | **Gitea** and **Authentik** benefit from **managed Postgres** early — plan the **DSN** and **bucket** for attachments **before** cutover. |
| **Ongoing** | New apps **must not** ship embedded **Postgres/MinIO** unless the workload truly requires it (e.g. a vendor appliance). |

---

## 5. Checklist (platform team)

- [ ] Postgres **Service** DNS name and **TLS** (optional in-cluster) documented.
- [ ] S3 **endpoint**, **region** string (can be `us-east-1` for MinIO), **TLS** for Ingress if clients are outside the cluster.
- [ ] **Backup:** scheduled **logical dumps** (Postgres) and **bucket replication** or **object versioning** where needed.
- [ ] **SOPS** / **External Secrets** pattern for **rotation** without editing app manifests by hand.
- [ ] **homelab-network.md** updated when **VM 160** is retired or repurposed.

---

## Related docs

- VM → cluster migration: [`migration-vm-to-noble.md`](migration-vm-to-noble.md)
- Inventory (s3 VM): [`homelab-network.md`](homelab-network.md)
- Longhorn / storage runbook: [`../talos/runbooks/longhorn.md`](../talos/runbooks/longhorn.md)
- Velero (S3 backup target): [`../clusters/noble/bootstrap/velero/`](../clusters/noble/bootstrap/velero/) (if present)

@@ -7,16 +7,17 @@

services:
  tracearr:
    image: ghcr.io/connorgallopo/tracearr:supervised-nightly
    image: ghcr.io/connorgallopo/tracearr:supervised
    shm_size: 256mb # Required for PostgreSQL shared memory
    ports:
      - "${PORT:-3000}:3000"
    environment:
      - NODE_ENV=production
      - PORT=3000
      - HOST=0.0.0.0
      - TZ=${TZ:-UTC}
      - CORS_ORIGIN=${CORS_ORIGIN:-*}
      - LOG_LEVEL=${LOG_LEVEL:-info}
      # Optional: Override auto-generated secrets
      # - JWT_SECRET=${JWT_SECRET}
      # - COOKIE_SECRET=${COOKIE_SECRET}
    volumes:
      - tracearr_postgres:/data/postgres
      - tracearr_redis:/data/redis

37 komodo/s3/versitygw/.env.sample Normal file
@@ -0,0 +1,37 @@

# Versity S3 Gateway — root credentials for the flat-file IAM backend.
# https://github.com/versity/versitygw/wiki/Quickstart
#
# Local: copy to `.env` next to compose.yaml (or set `run_directory` to this folder
# in Komodo) so `docker compose` can interpolate `${ROOT_ACCESS_KEY}` etc.
#
# Komodo: Stack Environment is written to `<run_directory>/.env` and passed as
# `--env-file` — that drives `${VAR}` in compose.yaml. Set **one** pair using exact
# names (leave the other pair unset / empty):
#   ROOT_ACCESS_KEY + ROOT_SECRET_KEY
#   ROOT_ACCESS_KEY_ID + ROOT_SECRET_ACCESS_KEY (Helm-style)

ROOT_ACCESS_KEY=
ROOT_SECRET_KEY=
# ROOT_ACCESS_KEY_ID=
# ROOT_SECRET_ACCESS_KEY=

# Host port mapped to the gateway (container listens on 10000).
VERSITYGW_PORT=10000

# WebUI (container listens on 8080). In Pangolin, create a *second* HTTP resource for this
# port — do not point the UI hostname at :10000 (that is S3 API only; `/` is not the SPA).
VERSITYGW_WEBUI_PORT=8080
# HTTPS URL of the *S3 API* (Pangolin resource → host :10000). **Not** the WebUI URL.
# No trailing slash. Wrong value → WebUI calls the wrong host and bucket create can 404.
# VGW_WEBUI_GATEWAYS=https://s3.example.com
VGW_WEBUI_GATEWAYS=

# Public origin of the **WebUI** page (Pangolin → :8080), e.g. https://s3-ui.example.com
# Required when UI and API are on different hosts so the browser can call the API (CORS).
# VGW_CORS_ALLOW_ORIGIN=https://s3-ui.example.com
VGW_CORS_ALLOW_ORIGIN=

# NFS: object metadata defaults to xattrs; most NFS mounts need sidecar mode
# (compose.yaml uses --sidecar /data/sidecar). Create the host path, e.g.
#   mkdir -p /mnt/nfs/versity/sidecar
# Or use NFSv4.2 with xattr support and remove --sidecar from compose if you prefer.
64 komodo/s3/versitygw/compose.yaml Normal file
@@ -0,0 +1,64 @@

# Versity S3 Gateway — POSIX backend over Docker volumes.
# https://github.com/versity/versitygw
#
# POSIX default metadata uses xattrs; NFS often lacks xattr support unless NFSv4.2
# + client/server support. `--sidecar` stores metadata in files instead (see
# `posix` flags / VGW_META_SIDECAR in cmd/versitygw/posix.go).
services:
  versitygw:
    image: versity/versitygw:v1.3.1
    container_name: versitygw
    restart: unless-stopped
    # Credentials: use `${VAR}` so values come from the same env Komodo passes with
    # `docker compose --env-file <run_directory>/.env` (see Komodo Stack docs).
    # Do NOT use `env_file: .env` here: that path is resolved next to *this* compose
    # file, while Komodo writes `.env` under `run_directory` — they often differ
    # (e.g. run_directory = repo root, compose in komodo/s3/versitygw/).
    environment:
      ROOT_ACCESS_KEY: ${ROOT_ACCESS_KEY}
      ROOT_SECRET_KEY: ${ROOT_SECRET_KEY}
      ROOT_ACCESS_KEY_ID: ${ROOT_ACCESS_KEY_ID}
      ROOT_SECRET_ACCESS_KEY: ${ROOT_SECRET_ACCESS_KEY}
      # Matches the Helm chart default; enables `/_/health` for probes.
      VGW_HEALTH: /_/health
      # WebUI (browser): separate listener; TLS terminates at Pangolin — serve HTTP in-container.
      VGW_WEBUI_NO_TLS: "true"
      # Public base URL of the *S3 API* only (Pangolin → :10000). Not the WebUI hostname.
      # No trailing slash. If this points at the UI URL, bucket ops return 404/wrong host.
      VGW_WEBUI_GATEWAYS: ${VGW_WEBUI_GATEWAYS}
      # Browser Origin when WebUI and API use different HTTPS hostnames (see wiki / WebGUI CORS).
      VGW_CORS_ALLOW_ORIGIN: ${VGW_CORS_ALLOW_ORIGIN}
    ports:
      - "${VERSITYGW_PORT:-10000}:10000"
      - "${VERSITYGW_WEBUI_PORT:-8080}:8080"
    volumes:
      - /mnt/nfs/versity/s3:/data/s3
      - /mnt/nfs/versity/iam:/data/iam
      - /mnt/nfs/versity/versions:/data/versions
      - /mnt/nfs/versity/sidecar:/data/sidecar
    command:
      - "--port"
      - ":10000"
      # Optional WebUI — without this, only the S3 API is served (browsers often see 404 on `/`).
      - "--webui"
      - ":8080"
      - "--iam-dir"
      - "/data/iam"
      - "posix"
      - "--sidecar"
      - "/data/sidecar"
      - "--versioning-dir"
      - "/data/versions"
      - "/data/s3"
    healthcheck:
      test:
        [
          "CMD",
          "wget",
          "-qO-",
          "http://127.0.0.1:10000/_/health",
        ]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 10s

@@ -4,24 +4,24 @@ This document is the **exported TODO** for the **noble** Talos cluster (4 nodes)

## Current state (2026-03-28)

Lab stack is **up** on-cluster through **Phase D**–**F** and **Phase G** (Vault **CiliumNetworkPolicy**, **`talos/runbooks/`**). **Next focus:** optional **Alertmanager** receivers (Slack/PagerDuty); tighten **RBAC** (Headlamp / cluster-admin); **Cilium** policies for other namespaces as needed; enable **Mend Renovate** for PRs; Pangolin/sample Ingress; **Velero** when S3 exists.
Lab stack is **up** on-cluster through **Phase D**–**F** and **Phase G** (**`talos/runbooks/`**, **SOPS**-encrypted secrets in **`clusters/noble/secrets/`**). **Next focus:** optional **Alertmanager** receivers (Slack/PagerDuty); tighten **RBAC** (Headlamp / cluster-admin); **Cilium** policies for other namespaces as needed; enable **Mend Renovate** for PRs; Pangolin/sample Ingress; **Velero** backup/restore drill after S3 credentials are set (**`noble_velero_install`**).

- **Talos** v1.12.6 (target) / **Kubernetes** as bundled — four nodes **Ready** unless upgrading; **`talosctl health`**; **`talos/kubeconfig`** is **local only** (gitignored — never commit; regenerate with `talosctl kubeconfig` per `talos/README.md`). **Image Factory (nocloud installer):** `factory.talos.dev/nocloud-installer/249d9135de54962744e917cfe654117000cba369f9152fbab9d055a00aa3664f:v1.12.6`
- **Cilium** Helm **1.16.6** / app **1.16.6** (`clusters/noble/bootstrap/cilium/`, phase 1 values).
- **CSI Volume Snapshot** — **external-snapshotter** **v8.5.0** CRDs + **`registry.k8s.io/sig-storage/snapshot-controller`** (`clusters/noble/bootstrap/csi-snapshot-controller/`, Ansible **`noble_csi_snapshot_controller`**).
- **MetalLB** Helm **0.15.3** / app **v0.15.3**; **IPAddressPool** `noble-l2` + **L2Advertisement** — pool **`192.168.50.210`–`192.168.50.229`**.
- **kube-vip** DaemonSet **3/3** on control planes; VIP **`192.168.50.230`** on **`ens18`** (`vip_subnet` **`/32`** required — bare **`32`** breaks parsing). **Verified from workstation:** `kubectl config set-cluster noble --server=https://192.168.50.230:6443` then **`kubectl get --raw /healthz`** → **`ok`** (`talos/kubeconfig`; see `talos/README.md`).
- **metrics-server** Helm **3.13.0** / app **v0.8.0** — `clusters/noble/bootstrap/metrics-server/values.yaml` (`--kubelet-insecure-tls` for Talos); **`kubectl top nodes`** works.
- **Longhorn** Helm **1.11.1** / app **v1.11.1** — `clusters/noble/bootstrap/longhorn/` (PSA **privileged** namespace, `defaultDataPath` `/var/mnt/longhorn`, `preUpgradeChecker` enabled); **StorageClass** `longhorn` (default); **`nodes.longhorn.io`** all **Ready**; test **PVC** `Bound` on `longhorn`.
- **Traefik** Helm **39.0.6** / app **v3.6.11** — `clusters/noble/bootstrap/traefik/`; **`Service`** **`LoadBalancer`** **`EXTERNAL-IP` `192.168.50.211`**; **`IngressClass`** **`traefik`** (default). Point **`*.apps.noble.lab.pcenicni.dev`** at **`192.168.50.211`**. MetalLB pool verification was done before replacing the temporary nginx test with Traefik.
- **cert-manager** Helm **v1.20.0** / app **v1.20.0** — `clusters/noble/bootstrap/cert-manager/`; **`ClusterIssuer`** **`letsencrypt-staging`** and **`letsencrypt-prod`** (**DNS-01** via **Cloudflare** for **`pcenicni.dev`**, Secret **`cloudflare-dns-api-token`** in **`cert-manager`**); ACME email **`certificates@noble.lab.pcenicni.dev`** (edit in manifests if you want a different mailbox).
- **Newt** Helm **1.2.0** / app **1.10.1** — `clusters/noble/bootstrap/newt/` (**fossorial/newt**); Pangolin site tunnel — **`newt-pangolin-auth`** Secret (**`PANGOLIN_ENDPOINT`**, **`NEWT_ID`**, **`NEWT_SECRET`**). Prefer a **SealedSecret** in git (`kubeseal` — see `clusters/noble/bootstrap/sealed-secrets/examples/`) after rotating credentials if they were exposed. **Public DNS** is **not** automated with ExternalDNS: **CNAME** records at your DNS host per Pangolin’s domain instructions, plus **Integration API** for HTTP resources/targets — see **`clusters/noble/bootstrap/newt/README.md`**. LAN access to Traefik can still use **`*.apps.noble.lab.pcenicni.dev`** → **`192.168.50.211`** (split horizon / local resolver).
- **Argo CD** Helm **9.4.17** / app **v3.3.6** — `clusters/noble/bootstrap/argocd/`; **`argocd-server`** **`LoadBalancer`** **`192.168.50.210`**; app-of-apps root syncs **`clusters/noble/apps/`** (edit **`root-application.yaml`** `repoURL` before applying).
- **Newt** Helm **1.2.0** / app **1.10.1** — `clusters/noble/bootstrap/newt/` (**fossorial/newt**); Pangolin site tunnel — **`newt-pangolin-auth`** Secret (**`PANGOLIN_ENDPOINT`**, **`NEWT_ID`**, **`NEWT_SECRET`**). Store credentials in git with **SOPS** (`clusters/noble/secrets/newt-pangolin-auth.secret.yaml`, **`age-key.txt`**, **`.sops.yaml`**) — see **`clusters/noble/secrets/README.md`**. **Public DNS** is **not** automated with ExternalDNS: **CNAME** records at your DNS host per Pangolin’s domain instructions, plus **Integration API** for HTTP resources/targets — see **`clusters/noble/bootstrap/newt/README.md`**. LAN access to Traefik can still use **`*.apps.noble.lab.pcenicni.dev`** → **`192.168.50.211`** (split horizon / local resolver).
- **Argo CD** Helm **9.4.17** / app **v3.3.6** — `clusters/noble/bootstrap/argocd/`; **`argocd-server`** **`LoadBalancer`** **`192.168.50.210`**; **`noble-root`** → **`clusters/noble/apps/`**; **`noble-bootstrap-root`** → **`clusters/noble/bootstrap`** (manual sync until **`argocd/README.md`** §5 after **`noble.yml`**). Edit **`repoURL`** in both root **`Application`** files before applying.
- **kube-prometheus-stack** — Helm chart **82.15.1** — `clusters/noble/bootstrap/kube-prometheus-stack/` (**namespace** `monitoring`, PSA **privileged** — **node-exporter** needs host mounts); **Longhorn** PVCs for Prometheus, Grafana, Alertmanager; **node-exporter** DaemonSet **4/4**. **Grafana Ingress:** **`https://grafana.apps.noble.lab.pcenicni.dev`** (Traefik **`ingressClassName: traefik`**, **`cert-manager.io/cluster-issuer: letsencrypt-prod`**). **Loki** datasource in Grafana: ConfigMap **`clusters/noble/bootstrap/grafana-loki-datasource/loki-datasource.yaml`** (sidecar label **`grafana_datasource: "1"`**) — not via **`grafana.additionalDataSources`** in the chart. **`helm upgrade --install` with `--wait` is silent until done** — use **`--timeout 30m`**; Grafana admin: Secret **`kube-prometheus-grafana`**, keys **`admin-user`** / **`admin-password`**.
- **Loki** + **Fluent Bit** — **`grafana/loki` 6.55.0** SingleBinary + **filesystem** on **Longhorn** (`clusters/noble/bootstrap/loki/`); **`loki.auth_enabled: false`**; **`chunksCache.enabled: false`** (no memcached chunk cache). **`fluent/fluent-bit` 0.56.0** → **`loki-gateway.loki.svc:80`** (`clusters/noble/bootstrap/fluent-bit/`); **`logging`** PSA **privileged**. **Grafana Explore:** **`kubectl apply -f clusters/noble/bootstrap/grafana-loki-datasource/loki-datasource.yaml`** then **Explore → Loki** (e.g. `{job="fluent-bit"}`).
- **Sealed Secrets** Helm **2.18.4** / app **0.36.1** — `clusters/noble/bootstrap/sealed-secrets/` (namespace **`sealed-secrets`**); **`kubeseal`** on the client should match the controller minor (**README**); back up **`sealed-secrets-key`** (see README).
- **External Secrets Operator** Helm **2.2.0** / app **v2.2.0** — `clusters/noble/bootstrap/external-secrets/`; Vault **`ClusterSecretStore`** in **`examples/vault-cluster-secret-store.yaml`** (**`http://`** to match the Vault listener — apply after Vault **Kubernetes auth**).
- **Vault** Helm **0.32.0** / app **1.21.2** — `clusters/noble/bootstrap/vault/` — standalone **file** storage, **Longhorn** PVC; **HTTP** listener (`global.tlsDisable`); optional **CronJob** lab unseal **`unseal-cronjob.yaml`**; **not** initialized in git — run **`vault operator init`** per **`README.md`**.
- **Still open:** **Renovate** — install **[Mend Renovate](https://github.com/apps/renovate)** (or self-host) so PRs run; optional **Alertmanager** notification channels; optional **sample Ingress + cert + Pangolin** end-to-end; **Velero** when S3 is ready; **Argo CD SSO**.
- **SOPS** — cluster **`Secret`** manifests under **`clusters/noble/secrets/`** encrypted with **age** (see **`.sops.yaml`**, **`age-key.txt`** gitignored); **`noble.yml`** decrypt-applies when the private key is present.
- **Velero** Helm **12.0.0** / app **v1.18.0** — `clusters/noble/bootstrap/velero/` (**Ansible** **`noble_velero`**, not Argo); **S3-compatible** backup location + **CSI** snapshots (**`EnableCSI`**); enable with **`noble_velero_install`** per **`velero/README.md`**.
- **Still open:** **Renovate** — install **[Mend Renovate](https://github.com/apps/renovate)** (or self-host) so PRs run; optional **Alertmanager** notification channels; optional **sample Ingress + cert + Pangolin** end-to-end; **Argo CD SSO**.

## Inventory

@@ -44,7 +44,7 @@ Lab stack is **up** on-cluster through **Phase D**–**F** and **Phase G** (Vaul
| Grafana (Ingress + TLS) | **`grafana.apps.noble.lab.pcenicni.dev`** — `grafana.ingress` in `clusters/noble/bootstrap/kube-prometheus-stack/values.yaml` (**`letsencrypt-prod`**) |
| Headlamp (Ingress + TLS) | **`headlamp.apps.noble.lab.pcenicni.dev`** — chart `ingress` in `clusters/noble/bootstrap/headlamp/` (**`letsencrypt-prod`**, **`ingressClassName: traefik`**) |
| Public DNS (Pangolin) | **Newt** tunnel + **CNAME** at registrar + **Integration API** — `clusters/noble/bootstrap/newt/` |
| Velero | S3-compatible URL — configure later |
| Velero | S3-compatible endpoint + bucket — **`clusters/noble/bootstrap/velero/`**, **`ansible/playbooks/noble.yml`** (**`noble_velero_install`**) |

## Versions

@@ -62,11 +62,9 @@ Lab stack is **up** on-cluster through **Phase D**–**F** and **Phase G** (Vaul
- kube-prometheus-stack: **82.15.1** (Helm chart `prometheus-community/kube-prometheus-stack`; app **v0.89.x** bundle)
- Loki: **6.55.0** (Helm chart `grafana/loki`; app **3.6.7**)
- Fluent Bit: **0.56.0** (Helm chart `fluent/fluent-bit`; app **4.2.3**)
- Sealed Secrets: **2.18.4** (Helm chart `sealed-secrets/sealed-secrets`; app **0.36.1**)
- External Secrets Operator: **2.2.0** (Helm chart `external-secrets/external-secrets`; app **v2.2.0**)
- Vault: **0.32.0** (Helm chart `hashicorp/vault`; app **1.21.2**)
- Kyverno: **3.7.1** (Helm chart `kyverno/kyverno`; app **v1.17.1**); **kyverno-policies** **3.7.1** — **baseline** PSS, **Audit** (`clusters/noble/bootstrap/kyverno/`)
- Headlamp: **0.40.1** (Helm chart `headlamp/headlamp`; app matches chart — see [Artifact Hub](https://artifacthub.io/packages/helm/headlamp/headlamp))
- Velero: **12.0.0** (Helm chart `vmware-tanzu/velero`; app **v1.18.0**) — **`clusters/noble/bootstrap/velero/`**; AWS plugin **v1.14.0**; Ansible **`noble_velero`**
- Renovate: **hosted** (Mend **Renovate** GitHub/GitLab app — no cluster chart) **or** **self-hosted** — pin chart when added ([Helm charts](https://docs.renovatebot.com/helm-charts/), OCI `ghcr.io/renovatebot/charts/renovate`); pair **`renovate.json`** with this repo’s Helm paths under **`clusters/noble/`**

## Repo paths (this workspace)
|
||||
@@ -74,30 +72,30 @@ Lab stack is **up** on-cluster through **Phase D**–**F** and **Phase G** (Vaul
|
||||
| Artifact | Path |
|
||||
|----------|------|
|
||||
| This checklist | `talos/CLUSTER-BUILD.md` |
|
||||
| Operational runbooks (API VIP, etcd, Longhorn, Vault) | `talos/runbooks/` |
|
||||
| Operational runbooks (API VIP, etcd, Longhorn, SOPS) | `talos/runbooks/` |
|
||||
| Talos quick start + networking + kubeconfig | `talos/README.md` |
| talhelper source (active) | `talos/talconfig.yaml` — may be **wipe-phase** (no Longhorn volume) during disk recovery |
| Longhorn volume restore | `talos/talconfig.with-longhorn.yaml` — copy to `talconfig.yaml` after GPT wipe (see `talos/README.md` §5) |
| Longhorn GPT wipe automation | `talos/scripts/longhorn-gpt-recovery.sh` |
| kube-vip (kustomize) | `clusters/noble/bootstrap/kube-vip/` (`vip_interface` e.g. `ens18`) |
| Cilium (Helm values) | `clusters/noble/bootstrap/cilium/` — `values.yaml` (phase 1), optional `values-kpr.yaml`, `README.md` |
| CSI Volume Snapshot (CRDs + controller) | `clusters/noble/bootstrap/csi-snapshot-controller/` — `crd/`, `controller/` kustomize; **`ansible/roles/noble_csi_snapshot_controller`** |
| MetalLB | `clusters/noble/bootstrap/metallb/` — `namespace.yaml` (PSA **privileged**), `ip-address-pool.yaml`, `kustomization.yaml`, `README.md` |
| Longhorn | `clusters/noble/bootstrap/longhorn/` — `values.yaml`, `namespace.yaml` (PSA **privileged**), `kustomization.yaml` |
| metrics-server (Helm values) | `clusters/noble/bootstrap/metrics-server/values.yaml` |
| Traefik (Helm values) | `clusters/noble/bootstrap/traefik/` — `values.yaml`, `namespace.yaml`, `README.md` |
| cert-manager (Helm + ClusterIssuers) | `clusters/noble/bootstrap/cert-manager/` — `values.yaml`, `namespace.yaml`, `kustomization.yaml`, `README.md` |
| Newt / Pangolin tunnel (Helm) | `clusters/noble/bootstrap/newt/` — `values.yaml`, `namespace.yaml`, `README.md` |
| Argo CD (Helm) + optional app-of-apps | `clusters/noble/bootstrap/argocd/` — `values.yaml`, `root-application.yaml`, `README.md`; optional **`Application`** tree in **`clusters/noble/apps/`** |
| Argo CD (Helm) + app-of-apps | `clusters/noble/bootstrap/argocd/` — `values.yaml`, `root-application.yaml`, `bootstrap-root-application.yaml`, `app-of-apps/`, `README.md`; **`noble-root`** syncs **`clusters/noble/apps/`**; **`noble-bootstrap-root`** syncs **`clusters/noble/bootstrap`** (enable automation after **`noble.yml`**) |
| kube-prometheus-stack (Helm values) | `clusters/noble/bootstrap/kube-prometheus-stack/` — `values.yaml`, `namespace.yaml` |
| Grafana Loki datasource (ConfigMap; no chart change) | `clusters/noble/bootstrap/grafana-loki-datasource/loki-datasource.yaml` |
| Loki (Helm values) | `clusters/noble/bootstrap/loki/` — `values.yaml`, `namespace.yaml` |
| Fluent Bit → Loki (Helm values) | `clusters/noble/bootstrap/fluent-bit/` — `values.yaml`, `namespace.yaml` |
| Sealed Secrets (Helm) | `clusters/noble/bootstrap/sealed-secrets/` — `values.yaml`, `namespace.yaml`, `README.md` |
| External Secrets Operator (Helm + Vault store example) | `clusters/noble/bootstrap/external-secrets/` — `values.yaml`, `namespace.yaml`, `README.md`, `examples/vault-cluster-secret-store.yaml` |
| Vault (Helm + optional unseal CronJob) | `clusters/noble/bootstrap/vault/` — `values.yaml`, `namespace.yaml`, `unseal-cronjob.yaml`, `cilium-network-policy.yaml`, `configure-kubernetes-auth.sh`, `README.md` |
| SOPS-encrypted cluster Secrets | `clusters/noble/secrets/` — `README.md`, `*.secret.yaml`; **`.sops.yaml`**, **`age-key.txt`** (gitignored) at repo root |
| Kyverno + PSS baseline policies | `clusters/noble/bootstrap/kyverno/` — `values.yaml`, `policies-values.yaml`, `namespace.yaml`, `README.md` |
| Headlamp (Helm + Ingress) | `clusters/noble/bootstrap/headlamp/` — `values.yaml`, `namespace.yaml`, `README.md` |
| Renovate (repo config + optional self-hosted Helm) | **`renovate.json`** at repo root; optional self-hosted chart under **`clusters/noble/apps/`** (Argo) + token Secret (**Sealed Secrets** / **ESO** after **Phase E**) |
| Velero (Helm + S3 BSL; CSI snapshots) | `clusters/noble/bootstrap/velero/` — `values.yaml`, `namespace.yaml`, `README.md`; **`ansible/roles/noble_velero`** |
| Renovate (repo config + optional self-hosted Helm) | **`renovate.json`** at repo root; optional self-hosted chart under **`clusters/noble/apps/`** (Argo) + token Secret (SOPS under **`clusters/noble/secrets/`** or imperative **`kubectl create secret`**) |
**Git vs cluster:** manifests and `talconfig` live in git; **`talhelper genconfig -o out`**, bootstrap, Helm, and `kubectl` run on your LAN. See **`talos/README.md`** for workstation reachability (lab LAN/VPN), **`talosctl kubeconfig`** vs Kubernetes `server:` (VIP vs node IP), and **`--insecure`** only in maintenance.
@@ -106,11 +104,12 @@ Lab stack is **up** on-cluster through **Phase D**–**F** and **Phase G** (Vaul
1. **Talos** installed; **Cilium** (or chosen CNI) **before** most workloads — with `cni: none`, nodes stay **NotReady** / **network-unavailable** taint until CNI is up.
2. **MetalLB Helm chart** (CRDs + controller) **before** `kubectl apply -k` on the pool manifests.
3. **`clusters/noble/bootstrap/metallb/namespace.yaml`** — apply before (or merge its labels onto) `metallb-system` so Pod Security does not block the speaker (see `bootstrap/metallb/README.md`).
4. **Longhorn:** Talos user volume + extensions in `talconfig.with-longhorn.yaml` (when restored); Helm **`defaultDataPath`** in `clusters/noble/bootstrap/longhorn/values.yaml`.
5. **Loki → Fluent Bit → Grafana datasource:** deploy **Loki** (`loki-gateway` Service) before **Fluent Bit**; apply **`clusters/noble/bootstrap/grafana-loki-datasource/loki-datasource.yaml`** after **Loki** (sidecar picks up the ConfigMap — no kube-prometheus values change for Loki).
6. **Vault:** **Longhorn** default **StorageClass** before **`clusters/noble/bootstrap/vault/`** Helm (PVC **`data-vault-0`**); **External Secrets** **`ClusterSecretStore`** after Vault is initialized, unsealed, and **Kubernetes auth** is configured.
4. **CSI Volume snapshots:** **`kubernetes-csi/external-snapshotter`** CRDs + **`snapshot-controller`** (`clusters/noble/bootstrap/csi-snapshot-controller/`) before relying on **Longhorn** / **Velero** volume snapshots.
5. **Longhorn:** Talos user volume + extensions in `talconfig.with-longhorn.yaml` (when restored); Helm **`defaultDataPath`** in `clusters/noble/bootstrap/longhorn/values.yaml`.
6. **Loki → Fluent Bit → Grafana datasource:** deploy **Loki** (`loki-gateway` Service) before **Fluent Bit**; apply **`clusters/noble/bootstrap/grafana-loki-datasource/loki-datasource.yaml`** after **Loki** (sidecar picks up the ConfigMap — no kube-prometheus values change for Loki).
7. **Headlamp:** **Traefik** + **cert-manager** (**`letsencrypt-prod`**) before exposing **`headlamp.apps.noble.lab.pcenicni.dev`**; treat as **cluster-admin** UI — protect with network policy / SSO when hardening (**Phase G**).
8. **Renovate:** **Git remote** + platform access (**hosted app** needs org/repo install; **self-hosted** needs **`RENOVATE_TOKEN`** and chart **`renovate.config`**). If the bot runs **in-cluster**, add the token **after** **Sealed Secrets** / **Vault** (**Phase E**) — no ingress required for the bot itself.
8. **Renovate:** **Git remote** + platform access (**hosted app** needs org/repo install; **self-hosted** needs **`RENOVATE_TOKEN`** and chart **`renovate.config`**). If the bot runs **in-cluster**, store the token with **SOPS** or an imperative Secret — no ingress required for the bot itself.
9. **Velero:** **S3-compatible** endpoint + bucket + **`velero/velero-cloud-credentials`** before **`ansible/playbooks/noble.yml`** with **`noble_velero_install: true`**; for **CSI** volume snapshots, label a **VolumeSnapshotClass** per **`clusters/noble/bootstrap/velero/README.md`** (e.g. Longhorn).
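The ordering above can be spot-checked from a workstation. A minimal sketch, assuming your `kubectl` context points at the noble cluster and the names this repo uses (`velero` namespace, `velero-cloud-credentials` Secret); adjust names to your setup:

```shell
# Snapshot CRDs + controller present before Longhorn/Velero rely on CSI snapshots
kubectl api-resources | grep -i volumesnapshot
kubectl -n kube-system get deploy snapshot-controller

# Velero credentials Secret exists before running noble.yml with noble_velero_install=true
kubectl -n velero get secret velero-cloud-credentials

# After Velero is up, the BackupStorageLocation should report Available
kubectl -n velero get backupstoragelocation
```

These are read-only checks; none of them mutate the cluster.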
## Prerequisites (before phases)
@@ -136,9 +135,10 @@ Lab stack is **up** on-cluster through **Phase D**–**F** and **Phase G** (Vaul
## Phase B — Core platform
**Install order:** **Cilium** → **metrics-server** → **Longhorn** (Talos disk + Helm) → **MetalLB** (Helm → pool manifests) → ingress / certs / DNS as planned.
**Install order:** **Cilium** → **Volume Snapshot CRDs + snapshot-controller** (`clusters/noble/bootstrap/csi-snapshot-controller/`, Ansible **`noble_csi_snapshot_controller`**) → **metrics-server** → **Longhorn** (Talos disk + Helm) → **MetalLB** (Helm → pool manifests) → ingress / certs / DNS as planned.
- [x] **Cilium** (Helm **1.16.6**) — **required** before MetalLB if `cni: none` (`clusters/noble/bootstrap/cilium/`)
- [x] **CSI Volume Snapshot** — CRDs + **`snapshot-controller`** in **`kube-system`** (`clusters/noble/bootstrap/csi-snapshot-controller/`); Ansible **`noble_csi_snapshot_controller`**; verify `kubectl api-resources | grep VolumeSnapshot`
- [x] **metrics-server** — Helm **3.13.0**; values in `clusters/noble/bootstrap/metrics-server/values.yaml`; verify `kubectl top nodes`
- [x] **Longhorn** — Talos: user volume + kubelet mounts + extensions (`talos/README.md` §5); Helm **1.11.1**; `kubectl apply -k clusters/noble/bootstrap/longhorn`; verify **`nodes.longhorn.io`** and test PVC **`Bound`**
- [x] **MetalLB** — chart installed; **pool + L2** from `clusters/noble/bootstrap/metallb/` applied (`192.168.50.210`–`229`)
@@ -152,7 +152,7 @@ Lab stack is **up** on-cluster through **Phase D**–**F** and **Phase G** (Vaul
- [x] **Argo CD** bootstrap — `clusters/noble/bootstrap/argocd/` (`helm upgrade --install argocd …`) — also covered by **`ansible/playbooks/noble.yml`** (role **`noble_argocd`**)
- [x] Argo CD server **LoadBalancer** — **`192.168.50.210`** (see `values.yaml`)
- [x] **App-of-apps** — optional; **`clusters/noble/apps/kustomization.yaml`** is **empty** (core stack is **Ansible**-managed from **`clusters/noble/bootstrap/`**, not Argo). Set **`repoURL`** in **`root-application.yaml`** and add **`Application`** manifests only for optional GitOps workloads — see **`clusters/noble/apps/README.md`**
- [x] **Renovate** — **`renovate.json`** at repo root ([Renovate](https://docs.renovatebot.com/) — **Kubernetes** manager for **`clusters/noble/**/*.yaml`** image pins; grouped minor/patch PRs). **Activate PRs:** install **[Mend Renovate](https://github.com/apps/renovate)** on the Git repo (**Option A**), or **Option B:** self-hosted chart per [Helm charts](https://docs.renovatebot.com/helm-charts/) + token from **Sealed Secrets** / **ESO**. Helm **chart** versions pinned only in comments still need manual bumps or extra **regex** `customManagers` — extend **`renovate.json`** as needed.
- [x] **Renovate** — **`renovate.json`** at repo root ([Renovate](https://docs.renovatebot.com/) — **Kubernetes** manager for **`clusters/noble/**/*.yaml`** image pins; grouped minor/patch PRs). **Activate PRs:** install **[Mend Renovate](https://github.com/apps/renovate)** on the Git repo (**Option A**), or **Option B:** self-hosted chart per [Helm charts](https://docs.renovatebot.com/helm-charts/) + token from **SOPS** or a one-off Secret. Helm **chart** versions pinned only in comments still need manual bumps or extra **regex** `customManagers` — extend **`renovate.json`** as needed.
- [ ] SSO — later
## Phase D — Observability
@@ -163,19 +163,16 @@ Lab stack is **up** on-cluster through **Phase D**–**F** and **Phase G** (Vaul
## Phase E — Secrets
- [x] **Sealed Secrets** (optional Git workflow) — `clusters/noble/bootstrap/sealed-secrets/` (Helm **2.18.4**); **`kubeseal`** + key backup per **`README.md`**
- [x] **Vault** in-cluster on Longhorn + **auto-unseal** — `clusters/noble/bootstrap/vault/` (Helm **0.32.0**); **Longhorn** PVC; **OSS** “auto-unseal” = optional **`unseal-cronjob.yaml`** + Secret (**README**); **`configure-kubernetes-auth.sh`** for ESO (**Kubernetes auth** + KV + role)
- [x] **External Secrets Operator** + Vault `ClusterSecretStore` — operator **`clusters/noble/bootstrap/external-secrets/`** (Helm **2.2.0**); apply **`examples/vault-cluster-secret-store.yaml`** after Vault (**`README.md`**)
- [x] **SOPS** — encrypt **`Secret`** YAML under **`clusters/noble/secrets/`** with **age** (see **`.sops.yaml`**, **`clusters/noble/secrets/README.md`**); keep **`age-key.txt`** private (gitignored). **`ansible/playbooks/noble.yml`** decrypt-applies **`*.yaml`** when **`age-key.txt`** exists.
## Phase F — Policy + backups
- [x] **Kyverno** baseline policies — `clusters/noble/bootstrap/kyverno/` (Helm **kyverno** **3.7.1** + **kyverno-policies** **3.7.1**, **baseline** / **Audit** — see **`README.md`**)
- [ ] **Velero** when S3 is ready; backup/restore drill
- [ ] **Velero** — manifests + Ansible **`noble_velero`** (`clusters/noble/bootstrap/velero/`); enable with **`noble_velero_install: true`** + S3 bucket/URL + **`velero/velero-cloud-credentials`** (see **`velero/README.md`**); optional backup/restore drill
## Phase G — Hardening
- [x] **Cilium** — Vault **`CiliumNetworkPolicy`** (`clusters/noble/bootstrap/vault/cilium-network-policy.yaml`) — HTTP **8200** from **`external-secrets`** + **`vault`**; extend for other clients as needed
- [x] **Runbooks** — **`talos/runbooks/`** (API VIP / kube-vip, etcd–Talos, Longhorn, Vault)
- [x] **Runbooks** — **`talos/runbooks/`** (API VIP / kube-vip, etcd–Talos, Longhorn, SOPS)
- [x] **RBAC** — **Headlamp** **`ClusterRoleBinding`** uses built-in **`edit`** (not **`cluster-admin`**); **Argo CD** **`policy.default: role:readonly`** with **`g, admin, role:admin`** — see **`clusters/noble/bootstrap/headlamp/values.yaml`**, **`clusters/noble/bootstrap/argocd/values.yaml`**, **`talos/runbooks/rbac.md`**
- [ ] **Alertmanager** — add **`slack_configs`**, **`pagerduty_configs`**, or other receivers under **`kube-prometheus-stack`** `alertmanager.config` (chart defaults use **`null`** receiver)
@@ -193,11 +190,10 @@ Lab stack is **up** on-cluster through **Phase D**–**F** and **Phase G** (Vaul
- [x] **`logging`** — **Fluent Bit** DaemonSet **Running** on all nodes (logs → **Loki**)
- [x] **Grafana** — **Loki** datasource from **`grafana-loki-datasource`** ConfigMap (**Explore** works after apply + sidecar sync)
- [x] **Headlamp** — Deployment **Running** in **`headlamp`**; UI at **`https://headlamp.apps.noble.lab.pcenicni.dev`** (TLS via **`letsencrypt-prod`**)
- [x] **`sealed-secrets`** — controller **Deployment** **Running** in **`sealed-secrets`** (install + **`kubeseal`** per **`apps/sealed-secrets/README.md`**)
- [x] **`external-secrets`** — controller + webhook + cert-controller **Running** in **`external-secrets`**; apply **`ClusterSecretStore`** after Vault **Kubernetes auth**
- [x] **`vault`** — **StatefulSet** **Running**, **`data-vault-0`** PVC **Bound** on **longhorn**; **`vault operator init`** + unseal per **`apps/vault/README.md`**
- [x] **SOPS secrets** — **`clusters/noble/secrets/*.yaml`** encrypted in git; **`noble.yml`** applies decrypted manifests when **`age-key.txt`** is present
- [x] **`kyverno`** — admission / background / cleanup / reports controllers **Running** in **`kyverno`**; **ClusterPolicies** for **PSS baseline** **Ready** (**Audit**)
- [x] **Phase G (partial)** — Vault **`CiliumNetworkPolicy`**; **`talos/runbooks/`** (incl. **RBAC**); **Headlamp**/**Argo CD** RBAC tightened — **Alertmanager** receivers still optional
- [ ] **`velero`** — when enabled: Deployment **Running** in **`velero`**; **`BackupStorageLocation`** / **`VolumeSnapshotLocation`** **Available**; test backup per **`velero/README.md`**
- [x] **Phase G (partial)** — **`talos/runbooks/`** (incl. **RBAC**); **Headlamp**/**Argo CD** RBAC tightened — **Alertmanager** receivers still optional
---
@@ -1,7 +1,7 @@
# Talos — noble lab
- **Cluster build checklist (exported TODO):** [CLUSTER-BUILD.md](./CLUSTER-BUILD.md)
- **Operational runbooks (API VIP, etcd, Longhorn, Vault):** [runbooks/README.md](./runbooks/README.md)
- **Operational runbooks (API VIP, etcd, Longhorn, SOPS):** [runbooks/README.md](./runbooks/README.md)
## Versions
@@ -7,5 +7,5 @@ Short recovery / triage notes for the **noble** Talos cluster. Deep procedures l
| Kubernetes API VIP (kube-vip) | [`api-vip-kube-vip.md`](./api-vip-kube-vip.md) |
| etcd / Talos control plane | [`etcd-talos.md`](./etcd-talos.md) |
| Longhorn storage | [`longhorn.md`](./longhorn.md) |
| Vault (unseal, auth, ESO) | [`vault.md`](./vault.md) |
| SOPS (secrets in git) | [`sops.md`](./sops.md) |
| RBAC (Headlamp, Argo CD) | [`rbac.md`](./rbac.md) |
13
talos/runbooks/sops.md
Normal file
@@ -0,0 +1,13 @@
# Runbook: SOPS secrets (git-encrypted)
**Symptoms:** `sops -d` fails; `kubectl apply` after Ansible shows no secret; `noble.yml` skips apply.
**Checklist**
1. **Private key:** `age-key.txt` at the repository root (gitignored). Create with `age-keygen -o age-key.txt` and add the **public** key to `.sops.yaml` (see `clusters/noble/secrets/README.md`).
2. **Environment:** `export SOPS_AGE_KEY_FILE=/absolute/path/to/home-server/age-key.txt` when editing or applying by hand.
3. **Edit encrypted file:** `sops clusters/noble/secrets/<name>.secret.yaml`
4. **Apply one file:** `sops -d clusters/noble/secrets/<name>.secret.yaml | kubectl apply -f -`
5. **Ansible:** `noble_apply_sops_secrets` is true by default; the platform role applies all `*.yaml` when `age-key.txt` exists.
**References:** [`clusters/noble/secrets/README.md`](../../clusters/noble/secrets/README.md), [Mozilla SOPS](https://github.com/getsops/sops).
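The checklist above condenses to a short round trip. A sketch assuming the repo root as working directory and a hypothetical `example.secret.yaml` (any file matching the `creation_rules` path in `.sops.yaml`):

```shell
# Point sops at the private key (gitignored at repo root)
export SOPS_AGE_KEY_FILE="$PWD/age-key.txt"

# One-time: generate the key pair and add the printed public key to .sops.yaml
age-keygen -o age-key.txt

# Encrypt in place (matched by the clusters/noble/secrets/ creation rule)
sops -e -i clusters/noble/secrets/example.secret.yaml

# Decrypt and apply a single manifest
sops -d clusters/noble/secrets/example.secret.yaml | kubectl apply -f -
```

Everything else (applying all `*.yaml` under `clusters/noble/secrets/`) is handled by the Ansible platform role when `age-key.txt` exists.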
@@ -1,15 +0,0 @@
# Runbook: Vault (in-cluster)
**Symptoms:** External Secrets **not syncing**, `ClusterSecretStore` **InvalidProviderConfig**, Vault UI/API **503 sealed**, pods **CrashLoop** on auth.
**Checks**
1. `kubectl -n vault exec -i sts/vault -- vault status` — **Sealed** / **Initialized**.
2. Unseal key Secret + optional CronJob: [`clusters/noble/bootstrap/vault/README.md`](../../clusters/noble/bootstrap/vault/README.md), `unseal-cronjob.yaml`.
3. Kubernetes auth for ESO: [`clusters/noble/bootstrap/vault/configure-kubernetes-auth.sh`](../../clusters/noble/bootstrap/vault/configure-kubernetes-auth.sh) and `kubectl describe clustersecretstore vault`.
4. **Cilium** policy: if Vault is unreachable from `external-secrets`, check [`clusters/noble/bootstrap/vault/cilium-network-policy.yaml`](../../clusters/noble/bootstrap/vault/cilium-network-policy.yaml) and extend `ingress` for new client namespaces.
**Common fixes**
- Sealed: `vault operator unseal` or fix auto-unseal CronJob + `vault-unseal-key` Secret.
- **403/invalid role** on ESO: re-run Kubernetes auth setup (issuer/CA/reviewer JWT) per README.