Compare commits

27 Commits

Author SHA1 Message Date
Nikholas Pcenicni
aeffc7d6dd Remove Argo CD application configurations for Fluent Bit, Headlamp, Loki, kube-prometheus, and associated kustomization files from the noble bootstrap directory. This cleanup streamlines the project by eliminating unused resources and simplifies the deployment structure. 2026-04-01 02:14:49 -04:00
Nikholas Pcenicni
0f88a33216 Remove deprecated Argo CD application configurations for various components including cert-manager, Cilium, CSI snapshot controllers, kube-vip, and others. Update README.md to reflect the current state of leaf applications and clarify optional components. Adjust kustomization files to streamline resource management for bootstrap workloads. 2026-04-01 02:13:15 -04:00
Nikholas Pcenicni
bfb72cb519 Update Argo CD documentation and kustomization files to include additional applications and namespace resources. Enhance README.md with current leaf applications and clarify optional components. This improves deployment clarity and organization for bootstrap workloads. 2026-04-01 02:11:19 -04:00
Nikholas Pcenicni
51eb64dd9d Add applications to Argo CD kustomization.yaml for enhanced deployment 2026-04-01 02:05:10 -04:00
Nikholas Pcenicni
f259285f6e Enhance Argo CD integration by adding support for a bootstrap root application. Update group_vars/all.yml and role defaults to include noble_argocd_apply_bootstrap_root_application. Modify tasks to apply the bootstrap application conditionally. Revise documentation to clarify the GitOps workflow and the relationship between the core platform and optional applications. Remove outdated references and streamline the README for better user guidance. 2026-04-01 01:55:41 -04:00
Nikholas Pcenicni
c312ceeb56 Remove Eclipse Che application configurations and related documentation from the repository. This includes the deletion of application-checluster.yaml, application-devworkspace.yaml, application-operator.yaml, checluster.yaml, dwoc.yaml, kustomization.yaml, and README.md, streamlining the project by eliminating outdated resources. 2026-04-01 01:21:32 -04:00
Nikholas Pcenicni
c15bf4d708 Enhance Ansible playbooks and documentation for Debian and Proxmox management. Add new playbooks for Debian hardening, maintenance, SSH key rotation, and Proxmox cluster setup. Update README.md with quick start instructions for Debian and Proxmox operations. Modify group_vars to include Argo CD application settings, improving deployment flexibility and clarity. 2026-04-01 01:19:50 -04:00
Nikholas Pcenicni
89be30884e Update compose.yaml for Tracearr service to change the image tag from 'latest' to 'supervised' and remove unnecessary environment variables for DATABASE_URL and REDIS_URL. This streamlines the configuration and focuses on essential settings for deployment. 2026-03-30 22:53:47 -04:00
Nikholas Pcenicni
16948c62f9 Update compose.yaml for Tracearr service to include production environment variables and database configurations. This enhances deployment settings by specifying NODE_ENV, PORT, HOST, DATABASE_URL, REDIS_URL, JWT_SECRET, COOKIE_SECRET, and CORS_ORIGIN, improving overall service configuration and security. 2026-03-30 22:49:01 -04:00
Nikholas Pcenicni
3a6e5dff5b Update Ansible configuration to integrate SOPS for managing secrets. Enhance README.md with SOPS usage instructions and prerequisites. Remove External Secrets Operator references and related configurations from the bootstrap process, streamlining the deployment. Adjust playbooks and roles to apply SOPS-encrypted secrets automatically, improving security and clarity in secret management. 2026-03-30 22:42:52 -04:00
Nikholas Pcenicni
023ebfee5d Enhance Eclipse Che configuration in checluster.yaml by adding externalTLSConfig for secure workspace subdomains. This change ensures cert-manager can issue TLS certificates, preventing issues with unavailable servers when opening workspaces. 2026-03-29 02:03:57 -04:00
Nikholas Pcenicni
27fb4113eb Refactor DevWorkspaceOperatorConfig in dwoc.yaml to simplify configuration structure. This change removes the unnecessary spec.config nesting, aligning with the v1alpha1 API requirements and improving clarity for users configuring development workspaces. 2026-03-28 19:58:18 -04:00
Nikholas Pcenicni
4026591f0b Update README.md with troubleshooting steps for Eclipse Che and enhance kustomization.yaml to include DevWorkspaceOperatorConfig. This improves guidance for users facing deployment issues and ensures proper configuration for development workspace management. 2026-03-28 19:56:07 -04:00
Nikholas Pcenicni
8a740019ad Add Eclipse Che applications to kustomization.yaml for improved development workspace management. This update includes application-devworkspace, application-operator, and application-checluster resources, enhancing the deployment capabilities for the Noble cluster. 2026-03-28 19:53:01 -04:00
Nikholas Pcenicni
544f75b0ee Enhance documentation and configuration for Velero integration. Update README.md to clarify Velero's lack of web UI and usage instructions for CLI. Add CSI Volume Snapshot support in playbooks and roles, and include Velero service details in noble_landing_urls. Adjust kustomization.yaml to include VolumeSnapshotClass configuration, ensuring proper setup for backups. Improve overall clarity in related documentation. 2026-03-28 19:34:43 -04:00
Nikholas Pcenicni
33a10dc7e9 Add Velero configuration to .env.sample, README.md, and Ansible playbooks. Update group_vars to include noble_velero_install variable. Enhance documentation for optional Velero installation and S3 integration, improving clarity for backup and restore processes. 2026-03-28 18:39:22 -04:00
Nikholas Pcenicni
a4b9913b7e Update .env.sample and compose.yaml for Versity S3 Gateway to enhance WebUI and CORS configuration. Add comments clarifying the purpose of VGW_CORS_ALLOW_ORIGIN and correct usage of VGW_WEBUI_GATEWAYS, improving deployment instructions and user understanding. 2026-03-28 18:28:52 -04:00
Nikholas Pcenicni
11c62009a4 Update README.md, .env.sample, and compose.yaml for Versity S3 Gateway to clarify WebUI configuration. Enhance README with details on separate API and WebUI ports, and update .env.sample and compose.yaml to include WebUI settings for improved deployment instructions and usability. 2026-03-28 18:20:55 -04:00
Nikholas Pcenicni
03ed4e70a2 Enhance .env.sample and compose.yaml for Versity S3 Gateway by adding detailed comments on NFS metadata handling and sidecar mode. This improves documentation clarity for users configuring NFS mounts and metadata storage options. 2026-03-28 18:17:54 -04:00
Nikholas Pcenicni
7855b10982 Update compose.yaml to change volume paths for Versity S3 Gateway from named volumes to NFS mounts. This adjustment improves data persistence and accessibility by linking directly to the NFS directory structure. 2026-03-28 18:13:52 -04:00
Nikholas Pcenicni
079c11b20c Refactor Versity S3 Gateway configuration in README.md, .env.sample, and compose.yaml. Update README to clarify environment variable usage and adjust .env.sample for local setup instructions. Modify compose.yaml to utilize environment variable interpolation, ensuring proper credential handling and enhancing deployment security. 2026-03-28 17:56:24 -04:00
Nikholas Pcenicni
bf108a37e2 Update compose.yaml to include .env file for environment variable injection, enhancing security and usability for the Versity S3 Gateway deployment. This change ensures that necessary environment variables are accessible within the container, improving the overall configuration process. 2026-03-28 17:49:43 -04:00
Nikholas Pcenicni
97b56581ed Update README.md and .env.sample for Versity S3 Gateway configuration. Change path in README to reflect new directory structure and clarify environment variable usage for credentials. Modify .env.sample to include additional credential options and improve documentation for setting up the environment. Adjust compose.yaml to utilize pass-through environment variables, enhancing security and usability for deployment. 2026-03-28 17:46:08 -04:00
Nikholas Pcenicni
f154658d79 Add Versity S3 Gateway documentation to README.md, detailing configuration requirements and usage for shared object storage. This addition enhances clarity for users integrating S3-compatible APIs with POSIX directories. 2026-03-28 17:25:44 -04:00
Nikholas Pcenicni
90509bacc5 Update homepage values.yaml to replace external siteMonitor URLs with in-cluster service URLs for improved reliability. Enhance comments for clarity on service monitoring and Prometheus widget configurations. Adjust description for better accuracy regarding uptime checks and resource monitoring. 2026-03-28 17:13:57 -04:00
Nikholas Pcenicni
e4741ecd15 Enhance homepage values.yaml by adding support for RBAC, service account creation, and site monitoring for various services. Update widget configurations for Prometheus and introduce new widgets for datetime and Kubernetes resource monitoring. Adjust layout and styling settings for improved UI presentation. 2026-03-28 17:11:01 -04:00
Nikholas Pcenicni
f6647056be Add homepage entry to noble_landing_urls and update kustomization.yaml to include homepage resource 2026-03-28 17:07:06 -04:00
100 changed files with 2302 additions and 833 deletions

View File

@@ -11,3 +11,9 @@ CLOUDFLARE_DNS_API_TOKEN=
PANGOLIN_ENDPOINT=
NEWT_ID=
NEWT_SECRET=
# Velero — when **noble_velero_install=true**, set bucket + S3 API URL and credentials (see clusters/noble/bootstrap/velero/README.md).
NOBLE_VELERO_S3_BUCKET=
NOBLE_VELERO_S3_URL=
NOBLE_VELERO_AWS_ACCESS_KEY_ID=
NOBLE_VELERO_AWS_SECRET_ACCESS_KEY=

7
.sops.yaml Normal file
View File

@@ -0,0 +1,7 @@
# Mozilla SOPS — encrypt/decrypt Kubernetes Secret manifests under clusters/noble/secrets/
# Generate a key: age-keygen -o age-key.txt (age-key.txt is gitignored)
# Add the printed public key below (one recipient per line is supported).
creation_rules:
- path_regex: clusters/noble/secrets/.*\.yaml$
age: >-
age1juym5p3ez3dkt0dxlznydgfgqvaujfnyk9hpdsssf50hsxeh3p4sjpf3gn
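
The creation rule above only says which files SOPS will encrypt and for which recipient. A minimal sketch of the surrounding workflow, assuming `age`, `sops`, and `kubectl` are on the control host; the function names are hypothetical helpers and are not part of this change set:

```bash
#!/usr/bin/env bash
# Sketch only: helper functions for the age + SOPS flow described in .sops.yaml.
# Source this file, then call the functions; nothing runs at load time.

# SOPS reads this variable to locate the private age key when decrypting.
export SOPS_AGE_KEY_FILE="${SOPS_AGE_KEY_FILE:-age-key.txt}"

# Create the key pair once, then print the public key to paste into .sops.yaml.
noble_age_init() {
  if [ ! -f "$SOPS_AGE_KEY_FILE" ]; then
    age-keygen -o "$SOPS_AGE_KEY_FILE"
  fi
  age-keygen -y "$SOPS_AGE_KEY_FILE"
}

# Return success when a path matches the path_regex of the creation rule.
noble_sops_path_matches() {
  printf '%s\n' "$1" | grep -Eq 'clusters/noble/secrets/.*\.yaml$'
}

# Encrypt a Secret manifest in place; refuse paths the rule would not cover.
noble_sops_encrypt() {
  noble_sops_path_matches "$1" || { echo "no creation rule matches: $1" >&2; return 1; }
  sops --encrypt --in-place "$1"
}

# Decrypt to stdout and apply, so plaintext is never written to disk.
noble_sops_apply() {
  sops --decrypt "$1" | kubectl apply -f -
}
```

The path check is a convenience only: SOPS itself evaluates `path_regex`, so the helper just fails earlier with a clearer message.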

View File

@@ -180,6 +180,12 @@ Shared services used across multiple applications.
**Configuration:** Requires Pangolin endpoint URL, Newt ID, and Newt secret.
### versitygw/ (`komodo/s3/versitygw/`)
- **[Versity S3 Gateway](https://github.com/versity/versitygw)** — S3 API on port **10000** by default; optional **WebUI** on **8080** (not the same listener—enable `VERSITYGW_WEBUI_PORT` / `VGW_WEBUI_GATEWAYS` per `.env.sample`). Behind **Pangolin**, expose the API and WebUI separately (or you will see **404** browsing the API URL).
**Configuration:** Set either `ROOT_ACCESS_KEY` / `ROOT_SECRET_KEY` or `ROOT_ACCESS_KEY_ID` / `ROOT_SECRET_ACCESS_KEY`. Optional `VERSITYGW_PORT`. Compose uses `${VAR}` interpolation so credentials work with Komodo's `docker compose --env-file <run_directory>/.env` (avoid `env_file:` in the service when `run_directory` is not the same folder as `compose.yaml`, or the written `.env` will not be found).
---
## 📊 Monitoring (`komodo/monitor/`)

View File

@@ -24,6 +24,7 @@ Copy **`.env.sample`** to **`.env`** at the repository root (`.env` is gitignore
## Prerequisites
- `talosctl` (matches node Talos version), `talhelper`, `helm`, `kubectl`.
- **SOPS secrets:** `sops` and `age` on the control host if you use **`clusters/noble/secrets/`** with **`age-key.txt`** (see **`clusters/noble/secrets/README.md`**).
- **Phase A:** same LAN/VPN as nodes so **Talos :50000** and **Kubernetes :6443** are reachable (see [`talos/README.md`](../talos/README.md) §3).
- **noble.yml:** bootstrapped cluster and **`talos/kubeconfig`** (or `KUBECONFIG`).
@@ -34,8 +35,16 @@ Copy **`.env.sample`** to **`.env`** at the repository root (`.env` is gitignore
| [`playbooks/deploy.yml`](playbooks/deploy.yml) | **Talos Phase A** then **`noble.yml`** (full automation). |
| [`playbooks/talos_phase_a.yml`](playbooks/talos_phase_a.yml) | `genconfig` → `apply-config` → `bootstrap` → `kubeconfig` only. |
| [`playbooks/noble.yml`](playbooks/noble.yml) | Helm + `kubectl` platform (after Phase A). |
| [`playbooks/post_deploy.yml`](playbooks/post_deploy.yml) | Vault / ESO reminders (`noble_apply_vault_cluster_secret_store`). |
| [`playbooks/post_deploy.yml`](playbooks/post_deploy.yml) | SOPS reminders and optional Argo root Application note. |
| [`playbooks/talos_bootstrap.yml`](playbooks/talos_bootstrap.yml) | **`talhelper genconfig` only** (legacy shortcut; prefer **`talos_phase_a.yml`**). |
| [`playbooks/debian_harden.yml`](playbooks/debian_harden.yml) | Baseline hardening for Debian servers (SSH/sysctl/fail2ban/unattended-upgrades). |
| [`playbooks/debian_maintenance.yml`](playbooks/debian_maintenance.yml) | Debian maintenance run (apt upgrades, autoremove/autoclean, reboot when required). |
| [`playbooks/debian_rotate_ssh_keys.yml`](playbooks/debian_rotate_ssh_keys.yml) | Rotate managed users' `authorized_keys`. |
| [`playbooks/debian_ops.yml`](playbooks/debian_ops.yml) | Convenience pipeline: harden then maintenance for Debian servers. |
| [`playbooks/proxmox_prepare.yml`](playbooks/proxmox_prepare.yml) | Configure Proxmox community repos and disable no-subscription UI warning. |
| [`playbooks/proxmox_upgrade.yml`](playbooks/proxmox_upgrade.yml) | Proxmox maintenance run (apt dist-upgrade, cleanup, reboot when required). |
| [`playbooks/proxmox_cluster.yml`](playbooks/proxmox_cluster.yml) | Create a Proxmox cluster on the master and join additional hosts. |
| [`playbooks/proxmox_ops.yml`](playbooks/proxmox_ops.yml) | Convenience pipeline: prepare, upgrade, then cluster Proxmox hosts. |
```bash
cd ansible
@@ -65,11 +74,13 @@ Override with `-e` when needed, e.g. **`-e noble_talos_skip_bootstrap=true`** if
```bash
ansible-playbook playbooks/noble.yml --tags cilium,metallb
ansible-playbook playbooks/noble.yml --skip-tags newt
ansible-playbook playbooks/noble.yml --tags velero -e noble_velero_install=true -e noble_velero_s3_bucket=... -e noble_velero_s3_url=...
```
### Variables — `group_vars/all.yml`
### Variables — `group_vars/all.yml` and role defaults
- **`noble_newt_install`**, **`noble_cert_manager_require_cloudflare_secret`**, **`noble_apply_vault_cluster_secret_store`**, **`noble_k8s_api_server_override`**, **`noble_k8s_api_server_auto_fallback`**, **`noble_k8s_api_server_fallback`**, **`noble_skip_k8s_health_check`**.
- **`group_vars/all.yml`:** **`noble_newt_install`**, **`noble_velero_install`**, **`noble_cert_manager_require_cloudflare_secret`**, **`noble_argocd_apply_root_application`**, **`noble_argocd_apply_bootstrap_root_application`**, **`noble_k8s_api_server_override`**, **`noble_k8s_api_server_auto_fallback`**, **`noble_k8s_api_server_fallback`**, **`noble_skip_k8s_health_check`**
- **`roles/noble_platform/defaults/main.yml`:** **`noble_apply_sops_secrets`**, **`noble_sops_age_key_file`** (SOPS secrets under **`clusters/noble/secrets/`**)
## Roles
@@ -77,10 +88,67 @@ ansible-playbook playbooks/noble.yml --skip-tags newt
|------|----------|
| `talos_phase_a` | Talos genconfig, apply-config, bootstrap, kubeconfig |
| `helm_repos` | `helm repo add` / `update` |
| `noble_*` | Cilium, metrics-server, Longhorn, MetalLB (20m Helm wait), kube-vip, Traefik, cert-manager, Newt, Argo CD, Kyverno, platform stack |
| `noble_*` | Cilium, CSI Volume Snapshot CRDs + controller, metrics-server, Longhorn, MetalLB (20m Helm wait), kube-vip, Traefik, cert-manager, Newt, Argo CD, Kyverno, platform stack, Velero (optional) |
| `noble_landing_urls` | Writes **`ansible/output/noble-lab-ui-urls.md`** — URLs, service names, and (optional) Argo/Grafana passwords from Secrets |
| `noble_post_deploy` | Post-install reminders |
| `talos_bootstrap` | Genconfig-only (used by older playbook) |
| `debian_baseline_hardening` | Baseline Debian hardening (SSH policy, sysctl profile, fail2ban, unattended upgrades) |
| `debian_maintenance` | Routine Debian maintenance tasks (updates, cleanup, reboot-on-required) |
| `debian_ssh_key_rotation` | Declarative `authorized_keys` rotation for server users |
| `proxmox_baseline` | Proxmox repo prep (community repos) and no-subscription warning suppression |
| `proxmox_maintenance` | Proxmox package maintenance (dist-upgrade, cleanup, reboot-on-required) |
| `proxmox_cluster` | Proxmox cluster bootstrap/join automation using `pvecm` |
## Debian server ops quick start
These playbooks are separate from the Talos/noble flow and target hosts in `debian_servers`.
1. Copy `inventory/debian.example.yml` to `inventory/debian.yml` and update hosts/users.
2. Update `group_vars/debian_servers.yml` with your allowed SSH users and real public keys.
3. Run with the Debian inventory:
```bash
cd ansible
ansible-playbook -i inventory/debian.yml playbooks/debian_harden.yml
ansible-playbook -i inventory/debian.yml playbooks/debian_rotate_ssh_keys.yml
ansible-playbook -i inventory/debian.yml playbooks/debian_maintenance.yml
```
Or run the combined pipeline (harden, then maintenance):
```bash
cd ansible
ansible-playbook -i inventory/debian.yml playbooks/debian_ops.yml
```
## Proxmox host + cluster quick start
These playbooks are separate from the Talos/noble flow and target hosts in `proxmox_hosts`.
1. Copy `inventory/proxmox.example.yml` to `inventory/proxmox.yml` and update hosts/users.
2. Update `group_vars/proxmox_hosts.yml` with your cluster name (`proxmox_cluster_name`), chosen cluster master, and root public key file paths to install.
3. First run (no SSH keys yet): use `--ask-pass` **or** set `ansible_password` (prefer Ansible Vault). Keep `ansible_ssh_common_args: "-o StrictHostKeyChecking=accept-new"` in inventory for first-contact hosts.
4. Run prepare first to install your public keys on each host, then continue:
```bash
cd ansible
ansible-playbook -i inventory/proxmox.yml playbooks/proxmox_prepare.yml --ask-pass
ansible-playbook -i inventory/proxmox.yml playbooks/proxmox_upgrade.yml
ansible-playbook -i inventory/proxmox.yml playbooks/proxmox_cluster.yml
```
After `proxmox_prepare.yml` finishes, SSH key auth should work for root (keys from `proxmox_root_authorized_key_files`), so `--ask-pass` is usually no longer needed.
If `pvecm add` still prompts for the master root password during join, set `proxmox_cluster_master_root_password` (prefer Vault) to run join non-interactively.
Changing `proxmox_cluster_name` only affects new cluster creation; it does not rename an already-created cluster.
Or run the full Proxmox pipeline:
```bash
cd ansible
ansible-playbook -i inventory/proxmox.yml playbooks/proxmox_ops.yml
```
## Migrating from Argo-managed `noble-platform`

View File

@@ -13,11 +13,16 @@ noble_k8s_api_server_fallback: "https://192.168.50.20:6443"
# Only if you must skip the kubectl /healthz preflight (not recommended).
noble_skip_k8s_health_check: false
# Pangolin / Newt — set true only after creating newt-pangolin-auth Secret (see clusters/noble/bootstrap/newt/README.md)
# Pangolin / Newt — set true only after newt-pangolin-auth Secret exists (SOPS: clusters/noble/secrets/ or imperative — see clusters/noble/bootstrap/newt/README.md)
noble_newt_install: false
# cert-manager needs Secret cloudflare-dns-api-token in cert-manager namespace before ClusterIssuers work
noble_cert_manager_require_cloudflare_secret: true
# post_deploy.yml — apply Vault ClusterSecretStore only after Vault is initialized and K8s auth is configured
noble_apply_vault_cluster_secret_store: false
# Velero — set **noble_velero_install: true** plus S3 bucket/URL (and credentials — see clusters/noble/bootstrap/velero/README.md)
noble_velero_install: false
# Argo CD — apply app-of-apps root Application (clusters/noble/bootstrap/argocd/root-application.yaml). Set false to skip.
noble_argocd_apply_root_application: true
# Bootstrap kustomize in Argo (**noble-bootstrap-root** → **clusters/noble/bootstrap**). Applied with manual sync; enable automation after **noble.yml** (see **clusters/noble/bootstrap/argocd/README.md** §5).
noble_argocd_apply_bootstrap_root_application: true

View File

@@ -0,0 +1,12 @@
---
# Hardened SSH settings
debian_baseline_ssh_allow_users:
- admin
# Example key rotation entries. Replace with your real users and keys.
debian_ssh_rotation_users:
- name: admin
home: /home/admin
state: present
keys:
- "ssh-ed25519 AAAAEXAMPLE_REPLACE_ME admin@workstation"

View File

@@ -0,0 +1,37 @@
---
# Proxmox repositories
proxmox_repo_debian_codename: trixie
proxmox_repo_disable_enterprise: true
proxmox_repo_disable_ceph_enterprise: true
proxmox_repo_enable_pve_no_subscription: true
proxmox_repo_enable_ceph_no_subscription: true
# Suppress "No valid subscription" warning in UI
proxmox_no_subscription_notice_disable: true
# Public keys to install for root on each Proxmox host.
proxmox_root_authorized_key_files:
- "{{ lookup('env', 'HOME') }}/.ssh/id_ed25519.pub"
- "{{ lookup('env', 'HOME') }}/.ssh/ansible.pub"
# Package upgrade/reboot policy
proxmox_upgrade_apt_cache_valid_time: 3600
proxmox_upgrade_autoremove: true
proxmox_upgrade_autoclean: true
proxmox_upgrade_reboot_if_required: true
proxmox_upgrade_reboot_timeout: 1800
# Cluster settings
proxmox_cluster_enabled: true
proxmox_cluster_name: atomic-hub
# Bootstrap host name from inventory (first host by default if empty)
proxmox_cluster_master: ""
# Optional explicit IP/FQDN for joining; leave empty to use ansible_host of master
proxmox_cluster_master_ip: ""
proxmox_cluster_force: false
# Optional: use only for first cluster joins when inter-node SSH trust is not established.
# Prefer storing with Ansible Vault if you set this.
proxmox_cluster_master_root_password: ""
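
The comment above recommends Ansible Vault for this value. A minimal sketch of how that could look, assuming `ansible-vault` is installed and a vault password source is already configured; both function names are hypothetical, and the scanner is a rough heuristic rather than a complete secret detector:

```bash
#!/usr/bin/env bash
# Sketch only: keep the join password out of plaintext group_vars.

# Read the secret from stdin and print a "!vault |" block for the given variable.
# Paste the output into group_vars/proxmox_hosts.yml in place of the literal value.
proxmox_vault_encrypt_password() {
  ansible-vault encrypt_string --stdin-name "${1:-proxmox_cluster_master_root_password}"
}

# Print "line:content" for every *password key that holds a non-empty literal,
# skipping empty strings, Jinja references, and !vault values.
find_plaintext_passwords() {
  local file="$1"
  grep -nE '^[[:space:]]*[A-Za-z0-9_]*password:[[:space:]]*[^[:space:]#]' "$file" \
    | grep -vE 'password:[[:space:]]*(""|!vault|"?\{\{)' \
    | grep -vE "password:[[:space:]]*''" \
    || true
}
```

Running `find_plaintext_passwords group_vars/proxmox_hosts.yml` before committing is one way to catch a literal password; an empty result means no candidate lines were found.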

View File

@@ -0,0 +1,11 @@
---
all:
children:
debian_servers:
hosts:
debian-01:
ansible_host: 192.168.50.101
ansible_user: admin
debian-02:
ansible_host: 192.168.50.102
ansible_user: admin

View File

@@ -0,0 +1,24 @@
---
all:
children:
proxmox_hosts:
vars:
ansible_ssh_common_args: "-o StrictHostKeyChecking=accept-new"
hosts:
helium:
ansible_host: 192.168.1.100
ansible_user: root
# First run without SSH keys:
# ansible_password: "{{ vault_proxmox_root_password }}"
neon:
ansible_host: 192.168.1.90
ansible_user: root
# ansible_password: "{{ vault_proxmox_root_password }}"
argon:
ansible_host: 192.168.1.80
ansible_user: root
# ansible_password: "{{ vault_proxmox_root_password }}"
krypton:
ansible_host: 192.168.1.70
ansible_user: root
# ansible_password: "{{ vault_proxmox_root_password }}"

View File

@@ -0,0 +1,24 @@
---
all:
children:
proxmox_hosts:
vars:
ansible_ssh_common_args: "-o StrictHostKeyChecking=accept-new"
hosts:
helium:
ansible_host: 192.168.1.100
ansible_user: root
# First run without SSH keys:
# ansible_password: "{{ vault_proxmox_root_password }}"
neon:
ansible_host: 192.168.1.90
ansible_user: root
# ansible_password: "{{ vault_proxmox_root_password }}"
argon:
ansible_host: 192.168.1.80
ansible_user: root
# ansible_password: "{{ vault_proxmox_root_password }}"
krypton:
ansible_host: 192.168.1.70
ansible_user: root
# ansible_password: "{{ vault_proxmox_root_password }}"

View File

@@ -0,0 +1,8 @@
---
- name: Debian server baseline hardening
hosts: debian_servers
become: true
gather_facts: true
roles:
- role: debian_baseline_hardening
tags: [hardening, baseline]

View File

@@ -0,0 +1,8 @@
---
- name: Debian maintenance (updates + reboot handling)
hosts: debian_servers
become: true
gather_facts: true
roles:
- role: debian_maintenance
tags: [maintenance, updates]

View File

@@ -0,0 +1,3 @@
---
- import_playbook: debian_harden.yml
- import_playbook: debian_maintenance.yml

View File

@@ -0,0 +1,8 @@
---
- name: Debian SSH key rotation
hosts: debian_servers
become: true
gather_facts: false
roles:
- role: debian_ssh_key_rotation
tags: [ssh, ssh_keys, rotation]

View File

@@ -3,8 +3,8 @@
# Do not run until `kubectl get --raw /healthz` returns ok (see talos/README.md §3, CLUSTER-BUILD Phase A).
# Run from repo **ansible/** directory: ansible-playbook playbooks/noble.yml
#
# Tags: repos, cilium, metrics, longhorn, metallb, kube_vip, traefik, cert_manager, newt,
# argocd, kyverno, kyverno_policies, platform, all (default)
# Tags: repos, cilium, csi_snapshot, metrics, longhorn, metallb, kube_vip, traefik, cert_manager, newt,
# argocd, kyverno, kyverno_policies, platform, velero, all (default)
- name: Noble cluster — platform stack (Ansible-managed)
hosts: localhost
connection: local
@@ -113,6 +113,7 @@
tags: [always]
# talosctl kubeconfig often sets server to the VIP; off-LAN you can reach a control-plane IP but not 192.168.50.230.
# kubectl stderr is often "The connection to the server ... was refused" (no substring "connection refused").
- name: Auto-fallback API server when VIP is unreachable (temp kubeconfig)
tags: [always]
when:
@@ -120,8 +121,7 @@
- noble_k8s_api_server_override | default('') | length == 0
- not (noble_skip_k8s_health_check | default(false) | bool)
- (noble_k8s_health_first.rc | default(1)) != 0 or (noble_k8s_health_first.stdout | default('') | trim) != 'ok'
- ('network is unreachable' in (noble_k8s_health_first.stderr | default('') | lower)) or
('no route to host' in (noble_k8s_health_first.stderr | default('') | lower))
- (((noble_k8s_health_first.stderr | default('')) ~ (noble_k8s_health_first.stdout | default(''))) | lower) is search('network is unreachable|no route to host|connection refused|was refused', multiline=False)
block:
- name: Ensure temp dir for kubeconfig auto-fallback
ansible.builtin.file:
@@ -202,6 +202,8 @@
tags: [repos, helm]
- role: noble_cilium
tags: [cilium, cni]
- role: noble_csi_snapshot_controller
tags: [csi_snapshot, snapshot, storage]
- role: noble_metrics_server
tags: [metrics, metrics_server]
- role: noble_longhorn
@@ -224,5 +226,7 @@
tags: [kyverno_policies, policy]
- role: noble_platform
tags: [platform, observability, apps]
- role: noble_velero
tags: [velero, backups]
- role: noble_landing_urls
tags: [landing, platform, observability, apps]
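
The widened auto-fallback condition in this playbook concatenates stderr and stdout, lowercases the result, and searches it for four substrings. A rough shell equivalent, useful for checking a captured `kubectl` message by hand; the function name is hypothetical and the sample message in the usage note is illustrative, not a recorded log:

```bash
#!/usr/bin/env bash
# Sketch only: mirror the Jinja test used by the auto-fallback task.

# $1 = stderr and $2 = stdout of the failed health probe (either may be empty).
# Returns 0 when the combined text looks like an unreachable or refusing endpoint.
noble_api_unreachable() {
  printf '%s%s' "${1:-}" "${2:-}" \
    | tr '[:upper:]' '[:lower:]' \
    | grep -Eq 'network is unreachable|no route to host|connection refused|was refused'
}
```

For example, `noble_api_unreachable "The connection to the server 192.168.50.230:6443 was refused" ""` succeeds through the `was refused` alternative, which is the case the old two-substring check missed.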

View File

@@ -1,12 +1,7 @@
---
# Manual follow-ups after **noble.yml**: Vault init/unseal, Kubernetes auth for Vault, ESO ClusterSecretStore.
# Run: ansible-playbook playbooks/post_deploy.yml
- name: Noble cluster — post-install reminders
hosts: localhost
# Manual follow-ups after **noble.yml**: SOPS key backup, optional Argo root Application.
- hosts: localhost
connection: local
gather_facts: false
vars:
noble_repo_root: "{{ playbook_dir | dirname | dirname }}"
noble_kubeconfig: "{{ lookup('env', 'KUBECONFIG') | default(noble_repo_root + '/talos/kubeconfig', true) }}"
roles:
- role: noble_post_deploy
- noble_post_deploy

View File

@@ -0,0 +1,9 @@
---
- name: Proxmox cluster bootstrap/join
hosts: proxmox_hosts
become: true
gather_facts: false
serial: 1
roles:
- role: proxmox_cluster
tags: [proxmox, cluster]

View File

@@ -0,0 +1,4 @@
---
- import_playbook: proxmox_prepare.yml
- import_playbook: proxmox_upgrade.yml
- import_playbook: proxmox_cluster.yml

View File

@@ -0,0 +1,8 @@
---
- name: Proxmox host preparation (community repos + no-subscription notice)
hosts: proxmox_hosts
become: true
gather_facts: true
roles:
- role: proxmox_baseline
tags: [proxmox, prepare, repos, ui]

View File

@@ -0,0 +1,9 @@
---
- name: Proxmox host maintenance (upgrade to latest)
hosts: proxmox_hosts
become: true
gather_facts: true
serial: 1
roles:
- role: proxmox_maintenance
tags: [proxmox, maintenance, updates]

View File

@@ -0,0 +1,39 @@
---
# Update apt metadata only when stale (seconds)
debian_baseline_apt_cache_valid_time: 3600
# Core host hardening packages
debian_baseline_packages:
- unattended-upgrades
- apt-listchanges
- fail2ban
- needrestart
- sudo
- ca-certificates
# SSH hardening controls
debian_baseline_ssh_permit_root_login: "no"
debian_baseline_ssh_password_authentication: "no"
debian_baseline_ssh_pubkey_authentication: "yes"
debian_baseline_ssh_x11_forwarding: "no"
debian_baseline_ssh_max_auth_tries: 3
debian_baseline_ssh_client_alive_interval: 300
debian_baseline_ssh_client_alive_count_max: 2
debian_baseline_ssh_allow_users: []
# unattended-upgrades controls
debian_baseline_enable_unattended_upgrades: true
debian_baseline_unattended_auto_upgrade: "1"
debian_baseline_unattended_update_lists: "1"
# Kernel and network hardening sysctls
debian_baseline_sysctl_settings:
net.ipv4.conf.all.accept_redirects: "0"
net.ipv4.conf.default.accept_redirects: "0"
net.ipv4.conf.all.send_redirects: "0"
net.ipv4.conf.default.send_redirects: "0"
net.ipv4.conf.all.log_martians: "1"
net.ipv4.conf.default.log_martians: "1"
net.ipv4.tcp_syncookies: "1"
net.ipv6.conf.all.accept_redirects: "0"
net.ipv6.conf.default.accept_redirects: "0"

View File

@@ -0,0 +1,12 @@
---
- name: Restart ssh
ansible.builtin.service:
name: ssh
state: restarted
- name: Reload sysctl
ansible.builtin.command:
argv:
- sysctl
- --system
changed_when: true

View File

@@ -0,0 +1,52 @@
---
- name: Refresh apt cache
ansible.builtin.apt:
update_cache: true
cache_valid_time: "{{ debian_baseline_apt_cache_valid_time }}"
- name: Install baseline hardening packages
ansible.builtin.apt:
name: "{{ debian_baseline_packages }}"
state: present
- name: Configure unattended-upgrades auto settings
ansible.builtin.copy:
dest: /etc/apt/apt.conf.d/20auto-upgrades
mode: "0644"
content: |
APT::Periodic::Update-Package-Lists "{{ debian_baseline_unattended_update_lists }}";
APT::Periodic::Unattended-Upgrade "{{ debian_baseline_unattended_auto_upgrade }}";
when: debian_baseline_enable_unattended_upgrades | bool
- name: Configure SSH hardening options
ansible.builtin.copy:
dest: /etc/ssh/sshd_config.d/99-hardening.conf
mode: "0644"
content: |
PermitRootLogin {{ debian_baseline_ssh_permit_root_login }}
PasswordAuthentication {{ debian_baseline_ssh_password_authentication }}
PubkeyAuthentication {{ debian_baseline_ssh_pubkey_authentication }}
X11Forwarding {{ debian_baseline_ssh_x11_forwarding }}
MaxAuthTries {{ debian_baseline_ssh_max_auth_tries }}
ClientAliveInterval {{ debian_baseline_ssh_client_alive_interval }}
ClientAliveCountMax {{ debian_baseline_ssh_client_alive_count_max }}
{% if debian_baseline_ssh_allow_users | length > 0 %}
AllowUsers {{ debian_baseline_ssh_allow_users | join(' ') }}
{% endif %}
notify: Restart ssh
- name: Configure baseline sysctls
ansible.builtin.copy:
dest: /etc/sysctl.d/99-hardening.conf
mode: "0644"
content: |
{% for key, value in debian_baseline_sysctl_settings.items() %}
{{ key }} = {{ value }}
{% endfor %}
notify: Reload sysctl
- name: Ensure fail2ban service is enabled
ansible.builtin.service:
name: fail2ban
enabled: true
state: started
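
After the hardening drop-in is written, it can help to confirm what `sshd` actually resolves, since `sshd -T` prints the effective configuration with lowercase keywords. A minimal sketch, assuming it runs as root on the target host; the function name is hypothetical:

```bash
#!/usr/bin/env bash
# Sketch only: compare effective sshd settings with the values the role intends.

# Reads `sshd -T` output on stdin; each argument is an expected "keyword value" line.
# Usage: sshd -T | check_sshd_settings "permitrootlogin no" "passwordauthentication no"
check_sshd_settings() {
  local effective expected wanted rc=0
  effective="$(tr '[:upper:]' '[:lower:]')"
  for expected in "$@"; do
    wanted="$(printf '%s' "$expected" | tr '[:upper:]' '[:lower:]')"
    if ! printf '%s\n' "$effective" | grep -qxF "$wanted"; then
      echo "mismatch: expected '$expected'" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

A mismatch can mean that an earlier file in `/etc/ssh/sshd_config.d/` already set the same keyword, because `sshd` keeps the first value it reads for most options.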

View File

@@ -0,0 +1,7 @@
---
debian_maintenance_apt_cache_valid_time: 3600
debian_maintenance_upgrade_type: dist
debian_maintenance_autoremove: true
debian_maintenance_autoclean: true
debian_maintenance_reboot_if_required: true
debian_maintenance_reboot_timeout: 1800

View File

@@ -0,0 +1,30 @@
---
- name: Refresh apt cache
ansible.builtin.apt:
update_cache: true
cache_valid_time: "{{ debian_maintenance_apt_cache_valid_time }}"
- name: Upgrade Debian packages
ansible.builtin.apt:
upgrade: "{{ debian_maintenance_upgrade_type }}"
- name: Remove orphaned packages
ansible.builtin.apt:
autoremove: "{{ debian_maintenance_autoremove }}"
- name: Clean apt package cache
ansible.builtin.apt:
autoclean: "{{ debian_maintenance_autoclean }}"
- name: Check if reboot is required
ansible.builtin.stat:
path: /var/run/reboot-required
register: debian_maintenance_reboot_required_file
- name: Reboot when required by package updates
ansible.builtin.reboot:
reboot_timeout: "{{ debian_maintenance_reboot_timeout }}"
msg: "Reboot initiated by Ansible maintenance playbook"
when:
- debian_maintenance_reboot_if_required | bool
- debian_maintenance_reboot_required_file.stat.exists | default(false)
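
The reboot decision in these tasks depends only on whether the marker file exists. A small shell sketch of the same check for manual use on a host; the function names are hypothetical, and the `.pkgs` companion file is an assumption that may not be present on every system:

```bash
#!/usr/bin/env bash
# Sketch only: the marker-file test that the stat + reboot tasks rely on.

# Return 0 when the marker exists; the path is overridable for testing.
reboot_required() {
  [ -e "${1:-/var/run/reboot-required}" ]
}

# Print the packages that requested the reboot, if the companion file is readable.
reboot_required_pkgs() {
  local marker="${1:-/var/run/reboot-required}"
  if [ -r "${marker}.pkgs" ]; then
    sort -u "${marker}.pkgs"
  fi
}
```

`reboot_required && reboot_required_pkgs` gives a quick answer to why a maintenance run rebooted a host.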

View File

@@ -0,0 +1,10 @@
---
# List of users to manage keys for.
# Example:
# debian_ssh_rotation_users:
# - name: deploy
# home: /home/deploy
# state: present
# keys:
# - "ssh-ed25519 AAAA... deploy@laptop"
debian_ssh_rotation_users: []

View File

@@ -0,0 +1,50 @@
---
- name: Validate SSH key rotation inputs
ansible.builtin.assert:
that:
- item.name is defined
- item.home is defined
- (item.state | default('present')) in ['present', 'absent']
- (item.state | default('present')) == 'absent' or (item['keys'] is defined and item['keys'] | length > 0)
fail_msg: >-
Each entry in debian_ssh_rotation_users must include name, home, and either:
state=absent, or keys with at least one SSH public key.
loop: "{{ debian_ssh_rotation_users }}"
loop_control:
label: "{{ item.name | default('unknown') }}"
- name: Ensure ~/.ssh exists for managed users
ansible.builtin.file:
path: "{{ item.home }}/.ssh"
state: directory
owner: "{{ item.name }}"
group: "{{ item.name }}"
mode: "0700"
loop: "{{ debian_ssh_rotation_users }}"
loop_control:
label: "{{ item.name }}"
when: (item.state | default('present')) == 'present'
- name: Rotate authorized_keys for managed users
ansible.builtin.copy:
dest: "{{ item.home }}/.ssh/authorized_keys"
owner: "{{ item.name }}"
group: "{{ item.name }}"
mode: "0600"
content: |
{% for key in item['keys'] %}
{{ key }}
{% endfor %}
loop: "{{ debian_ssh_rotation_users }}"
loop_control:
label: "{{ item.name }}"
when: (item.state | default('present')) == 'present'
- name: Remove authorized_keys for users marked absent
ansible.builtin.file:
path: "{{ item.home }}/.ssh/authorized_keys"
state: absent
loop: "{{ debian_ssh_rotation_users }}"
loop_control:
label: "{{ item.name }}"
when: (item.state | default('present')) == 'absent'
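The rotation tasks above replace `authorized_keys` wholesale, so any key not listed for a user is dropped. A rough shell sketch of the same idea follows. The function name `rotate_authorized_keys` is hypothetical, and ownership changes are left out because they would need root:

```shell
#!/bin/sh
# Hypothetical helper (not part of the role): replace <home>/.ssh/authorized_keys
# with exactly the keys passed as arguments. Ownership is not changed here.
rotate_authorized_keys() {
    home_dir="$1"; shift
    if [ "$#" -eq 0 ]; then
        # Same rule as the assert task: "present" users need at least one key.
        echo "at least one public key is required" >&2
        return 1
    fi
    mkdir -p "$home_dir/.ssh" && chmod 700 "$home_dir/.ssh" || return 1
    tmp_file="$home_dir/.ssh/authorized_keys.tmp.$$"
    # One key per line; write to a temporary file first, then move it into place.
    (umask 077 && printf '%s\n' "$@" > "$tmp_file") || return 1
    chmod 600 "$tmp_file" && mv "$tmp_file" "$home_dir/.ssh/authorized_keys"
}
```

Writing to a temporary file and renaming it is a design choice of this sketch, not something the role does: it avoids a window in which the file exists but is only partly written.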

@@ -8,11 +8,9 @@ noble_helm_repos:
- { name: fossorial, url: "https://charts.fossorial.io" }
- { name: argo, url: "https://argoproj.github.io/argo-helm" }
- { name: metrics-server, url: "https://kubernetes-sigs.github.io/metrics-server/" }
- { name: sealed-secrets, url: "https://bitnami-labs.github.io/sealed-secrets" }
- { name: external-secrets, url: "https://charts.external-secrets.io" }
- { name: hashicorp, url: "https://helm.releases.hashicorp.com" }
- { name: prometheus-community, url: "https://prometheus-community.github.io/helm-charts" }
- { name: grafana, url: "https://grafana.github.io/helm-charts" }
- { name: fluent, url: "https://fluent.github.io/helm-charts" }
- { name: headlamp, url: "https://kubernetes-sigs.github.io/headlamp/" }
- { name: kyverno, url: "https://kyverno.github.io/kyverno/" }
- { name: vmware-tanzu, url: "https://vmware-tanzu.github.io/helm-charts" }

@@ -0,0 +1,6 @@
---
# When true, applies clusters/noble/bootstrap/argocd/root-application.yaml (app-of-apps).
# Edit spec.source.repoURL in that file if your Git remote differs.
noble_argocd_apply_root_application: false
# When true, applies clusters/noble/bootstrap/argocd/bootstrap-root-application.yaml (noble-bootstrap-root; manual sync until README §5).
noble_argocd_apply_bootstrap_root_application: true

@@ -15,6 +15,32 @@
- -f
- "{{ noble_repo_root }}/clusters/noble/bootstrap/argocd/values.yaml"
- --wait
- --timeout
- 15m
environment:
KUBECONFIG: "{{ noble_kubeconfig }}"
changed_when: true
- name: Apply Argo CD root Application (app-of-apps)
ansible.builtin.command:
argv:
- kubectl
- apply
- -f
- "{{ noble_repo_root }}/clusters/noble/bootstrap/argocd/root-application.yaml"
environment:
KUBECONFIG: "{{ noble_kubeconfig }}"
when: noble_argocd_apply_root_application | default(false) | bool
changed_when: true
- name: Apply Argo CD bootstrap app-of-apps Application
ansible.builtin.command:
argv:
- kubectl
- apply
- -f
- "{{ noble_repo_root }}/clusters/noble/bootstrap/argocd/bootstrap-root-application.yaml"
environment:
KUBECONFIG: "{{ noble_kubeconfig }}"
when: noble_argocd_apply_bootstrap_root_application | default(false) | bool
changed_when: true

@@ -0,0 +1,2 @@
---
noble_csi_snapshot_kubectl_timeout: 120s

@@ -0,0 +1,39 @@
---
# Volume Snapshot CRDs + snapshot-controller (Velero CSI / Longhorn snapshots).
- name: Apply Volume Snapshot CRDs (snapshot.storage.k8s.io)
ansible.builtin.command:
argv:
- kubectl
- apply
- "--request-timeout={{ noble_csi_snapshot_kubectl_timeout | default('120s') }}"
- -k
- "{{ noble_repo_root }}/clusters/noble/bootstrap/csi-snapshot-controller/crd"
environment:
KUBECONFIG: "{{ noble_kubeconfig }}"
changed_when: true
- name: Apply snapshot-controller in kube-system
ansible.builtin.command:
argv:
- kubectl
- apply
- "--request-timeout={{ noble_csi_snapshot_kubectl_timeout | default('120s') }}"
- -k
- "{{ noble_repo_root }}/clusters/noble/bootstrap/csi-snapshot-controller/controller"
environment:
KUBECONFIG: "{{ noble_kubeconfig }}"
changed_when: true
- name: Wait for snapshot-controller Deployment
ansible.builtin.command:
argv:
- kubectl
- -n
- kube-system
- rollout
- status
- deploy/snapshot-controller
- --timeout=120s
environment:
KUBECONFIG: "{{ noble_kubeconfig }}"
changed_when: false

@@ -39,8 +39,13 @@ noble_lab_ui_entries:
namespace: longhorn-system
service: longhorn-frontend
url: https://longhorn.apps.noble.lab.pcenicni.dev
- name: Vault
description: Secrets engine UI (after init/unseal)
namespace: vault
service: vault
url: https://vault.apps.noble.lab.pcenicni.dev
- name: Velero
description: Cluster backups — no web UI (velero CLI / kubectl CRDs)
namespace: velero
service: velero
url: ""
- name: Homepage
description: App dashboard (links to lab UIs)
namespace: homepage
service: homepage
url: https://homepage.apps.noble.lab.pcenicni.dev

@@ -11,7 +11,7 @@ This file is **generated** by Ansible (`noble_landing_urls` role). Use it as a t
| UI | What | Kubernetes service | Namespace | URL |
|----|------|----------------------|-----------|-----|
{% for e in noble_lab_ui_entries %}
| {{ e.name }} | {{ e.description }} | `{{ e.service }}` | `{{ e.namespace }}` | [{{ e.url }}]({{ e.url }}) |
| {{ e.name }} | {{ e.description }} | `{{ e.service }}` | `{{ e.namespace }}` | {% if e.url | default('') | length > 0 %}[{{ e.url }}]({{ e.url }}){% else %}—{% endif %} |
{% endfor %}
## Initial access (logins)
@@ -24,7 +24,6 @@ This file is **generated** by Ansible (`noble_landing_urls` role). Use it as a t
| **Prometheus** | — | No auth in default install (lab). |
| **Alertmanager** | — | No auth in default install (lab). |
| **Longhorn** | — | No default login unless you enable access control in the UI settings. |
| **Vault** | Token | Root token is only from **`vault operator init`** (not stored in git). See `clusters/noble/bootstrap/vault/README.md`. |
### Commands to retrieve passwords (if not filled above)
@@ -46,6 +45,7 @@ To generate this file **without** calling kubectl, run Ansible with **`-e noble_
- **Argo CD** `argocd-initial-admin-secret` disappears after you change the admin password.
- **Grafana** password is random unless you set `grafana.adminPassword` in chart values.
- **Vault** UI needs **unsealed** Vault; tokens come from your chosen auth method.
- **Prometheus / Alertmanager** UIs are unauthenticated by default — restrict when hardening (`talos/CLUSTER-BUILD.md` Phase G).
- **SOPS:** cluster secrets in git under **`clusters/noble/secrets/`** are encrypted; decrypt with **`age-key.txt`** (not in git). See **`clusters/noble/secrets/README.md`**.
- **Headlamp** token above expires after the configured duration; re-run Ansible or `kubectl create token` to refresh.
- **Velero** has **no web UI** — use **`velero`** CLI or **`kubectl -n velero get backup,schedule,backupstoragelocation`**. Metrics: **`velero`** Service in **`velero`** (Prometheus scrape). See `clusters/noble/bootstrap/velero/README.md`.

@@ -4,5 +4,6 @@ noble_platform_kubectl_request_timeout: 120s
noble_platform_kustomize_retries: 5
noble_platform_kustomize_delay: 20
# Vault: injector (vault-k8s) owns MutatingWebhookConfiguration.caBundle; Helm upgrade can SSA-conflict. Delete webhook so Helm can recreate it.
noble_vault_delete_injector_webhook_before_helm: true
# Decrypt **clusters/noble/secrets/*.yaml** with SOPS and kubectl apply (requires **sops**, **age**, and **age-key.txt**).
noble_apply_sops_secrets: true
noble_sops_age_key_file: "{{ noble_repo_root }}/age-key.txt"

@@ -1,6 +1,6 @@
---
# Mirrors former **noble-platform** Argo Application: Helm releases + plain manifests under clusters/noble/bootstrap.
- name: Apply clusters/noble/bootstrap kustomize (namespaces, Grafana Loki datasource, Vault extras)
- name: Apply clusters/noble/bootstrap kustomize (namespaces, Grafana Loki datasource)
ansible.builtin.command:
argv:
- kubectl
@@ -16,77 +16,26 @@
until: noble_platform_kustomize.rc == 0
changed_when: true
- name: Install Sealed Secrets
ansible.builtin.command:
argv:
- helm
- upgrade
- --install
- sealed-secrets
- sealed-secrets/sealed-secrets
- --namespace
- sealed-secrets
- --version
- "2.18.4"
- -f
- "{{ noble_repo_root }}/clusters/noble/bootstrap/sealed-secrets/values.yaml"
- --wait
environment:
KUBECONFIG: "{{ noble_kubeconfig }}"
changed_when: true
- name: Stat SOPS age private key (age-key.txt)
ansible.builtin.stat:
path: "{{ noble_sops_age_key_file }}"
register: noble_sops_age_key_stat
- name: Install External Secrets Operator
ansible.builtin.command:
argv:
- helm
- upgrade
- --install
- external-secrets
- external-secrets/external-secrets
- --namespace
- external-secrets
- --version
- "2.2.0"
- -f
- "{{ noble_repo_root }}/clusters/noble/bootstrap/external-secrets/values.yaml"
- --wait
- name: Apply SOPS-encrypted cluster secrets (clusters/noble/secrets/*.yaml)
ansible.builtin.shell: |
set -euo pipefail
shopt -s nullglob
for f in "{{ noble_repo_root }}/clusters/noble/secrets"/*.yaml; do
sops -d "$f" | kubectl apply -f -
done
args:
executable: /bin/bash
environment:
KUBECONFIG: "{{ noble_kubeconfig }}"
changed_when: true
# vault-k8s patches webhook CA after install; Helm 3/4 SSA then conflicts on upgrade. Removing the MWC lets Helm re-apply cleanly; injector repopulates caBundle.
- name: Delete Vault agent injector MutatingWebhookConfiguration before Helm (avoids caBundle field conflict)
ansible.builtin.command:
argv:
- kubectl
- delete
- mutatingwebhookconfiguration
- vault-agent-injector-cfg
- --ignore-not-found
environment:
KUBECONFIG: "{{ noble_kubeconfig }}"
register: noble_vault_mwc_delete
when: noble_vault_delete_injector_webhook_before_helm | default(true) | bool
changed_when: "'deleted' in (noble_vault_mwc_delete.stdout | default(''))"
- name: Install Vault
ansible.builtin.command:
argv:
- helm
- upgrade
- --install
- vault
- hashicorp/vault
- --namespace
- vault
- --version
- "0.32.0"
- -f
- "{{ noble_repo_root }}/clusters/noble/bootstrap/vault/values.yaml"
- --wait
environment:
KUBECONFIG: "{{ noble_kubeconfig }}"
HELM_SERVER_SIDE_APPLY: "false"
SOPS_AGE_KEY_FILE: "{{ noble_sops_age_key_file }}"
when:
- noble_apply_sops_secrets | default(true) | bool
- noble_sops_age_key_stat.stat.exists
changed_when: true
- name: Install kube-prometheus-stack
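The SOPS task in this hunk relies on bash's `nullglob` so that an empty `clusters/noble/secrets/` directory produces zero loop iterations. Where bash is not available, the same effect can be approximated in POSIX sh. The sketch below assumes a hypothetical helper name (`list_secret_files`) and only lists files; it does not decrypt or apply anything:

```shell
#!/bin/sh
# Hypothetical helper: list *.yaml files in a directory, printing nothing when
# none match. Without nullglob an unmatched pattern stays literal, so each
# candidate is checked before it is used.
list_secret_files() {
    dir="$1"
    for f in "$dir"/*.yaml; do
        [ -f "$f" ] || continue
        printf '%s\n' "$f"
    done
}
```

Each printed path could then be fed to the pipeline the task already uses (`sops -d "$f" | kubectl apply -f -`).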

@@ -1,27 +1,15 @@
---
- name: Vault — manual steps (not automated)
- name: SOPS secrets (workstation)
ansible.builtin.debug:
msg: |
1. kubectl -n vault get pods (wait for Running)
2. kubectl -n vault exec -it vault-0 -- vault operator init (once; save keys)
3. Unseal per clusters/noble/bootstrap/vault/README.md
4. ./clusters/noble/bootstrap/vault/configure-kubernetes-auth.sh
5. kubectl apply -f clusters/noble/bootstrap/external-secrets/examples/vault-cluster-secret-store.yaml
- name: Optional — apply Vault ClusterSecretStore for External Secrets
ansible.builtin.command:
argv:
- kubectl
- apply
- -f
- "{{ noble_repo_root }}/clusters/noble/bootstrap/external-secrets/examples/vault-cluster-secret-store.yaml"
environment:
KUBECONFIG: "{{ noble_kubeconfig }}"
when: noble_apply_vault_cluster_secret_store | default(false) | bool
changed_when: true
Encrypted Kubernetes Secrets live under clusters/noble/secrets/ (Mozilla SOPS + age).
Private key: age-key.txt at repo root (gitignored). See clusters/noble/secrets/README.md
and .sops.yaml. noble.yml decrypt-applies these when age-key.txt exists.
- name: Argo CD optional root Application (empty app-of-apps)
ansible.builtin.debug:
msg: >-
Optional: kubectl apply -f clusters/noble/bootstrap/argocd/root-application.yaml
after editing repoURL. Core workloads are not synced by Argo — see clusters/noble/apps/README.md
App-of-apps: noble.yml applies root-application.yaml when noble_argocd_apply_root_application is true;
bootstrap-root-application.yaml when noble_argocd_apply_bootstrap_root_application is true (group_vars/all.yml).
noble-bootstrap-root uses manual sync until you enable automation after the playbook —
clusters/noble/bootstrap/argocd/README.md §5. See clusters/noble/apps/README.md and that README.

@@ -0,0 +1,13 @@
---
# **noble_velero_install** is in **ansible/group_vars/all.yml**. Override S3 fields via extra-vars or group_vars.
noble_velero_chart_version: "12.0.0"
noble_velero_s3_bucket: ""
noble_velero_s3_url: ""
noble_velero_s3_region: "us-east-1"
noble_velero_s3_force_path_style: "true"
noble_velero_s3_prefix: ""
# Optional — if unset, Ansible expects Secret **velero/velero-cloud-credentials** (key **cloud**) to exist.
noble_velero_aws_access_key_id: ""
noble_velero_aws_secret_access_key: ""

@@ -0,0 +1,68 @@
---
# See repository **.env.sample** — copy to **.env** (gitignored).
- name: Stat repository .env for Velero
ansible.builtin.stat:
path: "{{ noble_repo_root }}/.env"
register: noble_deploy_env_file
changed_when: false
- name: Load NOBLE_VELERO_S3_BUCKET from .env when unset
ansible.builtin.shell: |
set -a
. "{{ noble_repo_root }}/.env"
set +a
echo "${NOBLE_VELERO_S3_BUCKET:-}"
register: noble_velero_s3_bucket_from_env
when:
- noble_deploy_env_file.stat.exists | default(false)
- noble_velero_s3_bucket | default('') | length == 0
changed_when: false
- name: Apply NOBLE_VELERO_S3_BUCKET from .env
ansible.builtin.set_fact:
noble_velero_s3_bucket: "{{ noble_velero_s3_bucket_from_env.stdout | trim }}"
when:
- noble_velero_s3_bucket_from_env is defined
- (noble_velero_s3_bucket_from_env.stdout | default('') | trim | length) > 0
- name: Load NOBLE_VELERO_S3_URL from .env when unset
ansible.builtin.shell: |
set -a
. "{{ noble_repo_root }}/.env"
set +a
echo "${NOBLE_VELERO_S3_URL:-}"
register: noble_velero_s3_url_from_env
when:
- noble_deploy_env_file.stat.exists | default(false)
- noble_velero_s3_url | default('') | length == 0
changed_when: false
- name: Apply NOBLE_VELERO_S3_URL from .env
ansible.builtin.set_fact:
noble_velero_s3_url: "{{ noble_velero_s3_url_from_env.stdout | trim }}"
when:
- noble_velero_s3_url_from_env is defined
- (noble_velero_s3_url_from_env.stdout | default('') | trim | length) > 0
- name: Create velero-cloud-credentials from .env when keys present
ansible.builtin.shell: |
set -euo pipefail
set -a
. "{{ noble_repo_root }}/.env"
set +a
if [ -z "${NOBLE_VELERO_AWS_ACCESS_KEY_ID:-}" ] || [ -z "${NOBLE_VELERO_AWS_SECRET_ACCESS_KEY:-}" ]; then
echo SKIP
exit 0
fi
CLOUD="$(printf '[default]\naws_access_key_id=%s\naws_secret_access_key=%s\n' \
"${NOBLE_VELERO_AWS_ACCESS_KEY_ID}" "${NOBLE_VELERO_AWS_SECRET_ACCESS_KEY}")"
kubectl -n velero create secret generic velero-cloud-credentials \
--from-literal=cloud="${CLOUD}" \
--dry-run=client -o yaml | kubectl apply -f -
echo APPLIED
args:
executable: /bin/bash
environment:
KUBECONFIG: "{{ noble_kubeconfig }}"
when: noble_deploy_env_file.stat.exists | default(false)
no_log: true
register: noble_velero_secret_from_env
changed_when: "'APPLIED' in (noble_velero_secret_from_env.stdout | default(''))"
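The `.env` tasks above all follow one pattern: turn on auto-export, source the file, turn auto-export off, then echo a single variable with an empty default. A minimal sketch of that pattern as a reusable function, assuming a hypothetical name (`env_value`) and a trusted variable name, since it is passed through `eval`:

```shell
#!/bin/sh
# Hypothetical helper: print the value of one variable from a dotenv-style file,
# or an empty line when the file or the variable is missing.
# The variable name must be a trusted identifier because it is expanded via eval.
env_value() {
    env_file="$1"   # use a path containing a slash so `.` does not search PATH
    var_name="$2"
    [ -f "$env_file" ] || { echo ""; return 0; }
    (
        # Subshell: variables from the file do not leak into the caller.
        set -a            # export everything the file assigns
        . "$env_file"
        set +a
        eval "printf '%s\n' \"\${$var_name:-}\""
    )
}
```

For example, `env_value ./.env NOBLE_VELERO_S3_BUCKET` prints the bucket name if the file defines it and an empty line otherwise, which matches the `length == 0` checks the tasks apply to the registered output.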

@@ -0,0 +1,85 @@
---
# Velero — S3 backup target + built-in CSI snapshots (Longhorn: label VolumeSnapshotClass per README).
- name: Apply velero namespace
ansible.builtin.command:
argv:
- kubectl
- apply
- -f
- "{{ noble_repo_root }}/clusters/noble/bootstrap/velero/namespace.yaml"
environment:
KUBECONFIG: "{{ noble_kubeconfig }}"
when: noble_velero_install | default(false) | bool
changed_when: true
- name: Include Velero settings from repository .env (S3 bucket, URL, credentials)
ansible.builtin.include_tasks: from_env.yml
when: noble_velero_install | default(false) | bool
- name: Require S3 bucket and endpoint for Velero
ansible.builtin.assert:
that:
- noble_velero_s3_bucket | default('') | length > 0
- noble_velero_s3_url | default('') | length > 0
fail_msg: >-
Set NOBLE_VELERO_S3_BUCKET and NOBLE_VELERO_S3_URL in .env, or noble_velero_s3_bucket / noble_velero_s3_url
(e.g. -e ...), or group_vars when noble_velero_install is true.
when: noble_velero_install | default(false) | bool
- name: Create velero-cloud-credentials from Ansible vars
ansible.builtin.shell: |
set -euo pipefail
CLOUD="$(printf '[default]\naws_access_key_id=%s\naws_secret_access_key=%s\n' \
"${AWS_ACCESS_KEY_ID}" "${AWS_SECRET_ACCESS_KEY}")"
kubectl -n velero create secret generic velero-cloud-credentials \
--from-literal=cloud="${CLOUD}" \
--dry-run=client -o yaml | kubectl apply -f -
args:
executable: /bin/bash
environment:
KUBECONFIG: "{{ noble_kubeconfig }}"
AWS_ACCESS_KEY_ID: "{{ noble_velero_aws_access_key_id }}"
AWS_SECRET_ACCESS_KEY: "{{ noble_velero_aws_secret_access_key }}"
when:
- noble_velero_install | default(false) | bool
- noble_velero_aws_access_key_id | default('') | length > 0
- noble_velero_aws_secret_access_key | default('') | length > 0
no_log: true
changed_when: true
- name: Check velero-cloud-credentials Secret
ansible.builtin.command:
argv:
- kubectl
- -n
- velero
- get
- secret
- velero-cloud-credentials
environment:
KUBECONFIG: "{{ noble_kubeconfig }}"
register: noble_velero_secret_check
failed_when: false
changed_when: false
when: noble_velero_install | default(false) | bool
- name: Require velero-cloud-credentials before Helm
ansible.builtin.assert:
that:
- noble_velero_secret_check.rc == 0
fail_msg: >-
Velero needs Secret velero/velero-cloud-credentials (key cloud). Set NOBLE_VELERO_AWS_ACCESS_KEY_ID and
NOBLE_VELERO_AWS_SECRET_ACCESS_KEY in .env, or noble_velero_aws_* extra-vars, or create the Secret manually
(see clusters/noble/bootstrap/velero/README.md).
when: noble_velero_install | default(false) | bool
- name: Optional object prefix argv for Helm
ansible.builtin.set_fact:
noble_velero_helm_prefix_argv: "{{ ['--set-string', 'configuration.backupStorageLocation[0].prefix=' ~ (noble_velero_s3_prefix | default(''))] if (noble_velero_s3_prefix | default('') | length > 0) else [] }}"
when: noble_velero_install | default(false) | bool
- name: Install Velero
ansible.builtin.command:
argv: "{{ ['helm', 'upgrade', '--install', 'velero', 'vmware-tanzu/velero', '--namespace', 'velero', '--version', noble_velero_chart_version, '-f', noble_repo_root ~ '/clusters/noble/bootstrap/velero/values.yaml', '--set-string', 'configuration.backupStorageLocation[0].bucket=' ~ noble_velero_s3_bucket, '--set-string', 'configuration.backupStorageLocation[0].config.s3Url=' ~ noble_velero_s3_url, '--set-string', 'configuration.backupStorageLocation[0].config.region=' ~ noble_velero_s3_region, '--set-string', 'configuration.backupStorageLocation[0].config.s3ForcePathStyle=' ~ noble_velero_s3_force_path_style] + (noble_velero_helm_prefix_argv | default([])) + ['--wait'] }}"
environment:
KUBECONFIG: "{{ noble_kubeconfig }}"
when: noble_velero_install | default(false) | bool
changed_when: true
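Two small pieces of the Velero role are easy to check in isolation: the credentials file stored under the Secret key `cloud`, and the optional Helm arguments added only when an object prefix is set. The sketch below uses hypothetical helper names (`velero_cloud_ini`, `velero_prefix_args`) and placeholder credentials; it does not call `kubectl` or `helm`:

```shell
#!/bin/sh
# Hypothetical helper: render the AWS-style credentials file that the role
# stores in Secret velero-cloud-credentials under key "cloud".
velero_cloud_ini() {
    printf '[default]\naws_access_key_id=%s\naws_secret_access_key=%s\n' "$1" "$2"
}

# Hypothetical helper: print the extra Helm arguments for an optional object
# prefix, one per line, or nothing when the prefix is empty.
velero_prefix_args() {
    [ -n "$1" ] || return 0
    printf '%s\n' '--set-string' "configuration.backupStorageLocation[0].prefix=$1"
}
```

As in the tasks, capturing the output with `$(...)` strips the trailing newline before the value is passed to `--from-literal`.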

@@ -0,0 +1,14 @@
---
proxmox_repo_debian_codename: "{{ ansible_facts['distribution_release'] | default('bookworm') }}"
proxmox_repo_disable_enterprise: true
proxmox_repo_disable_ceph_enterprise: true
proxmox_repo_enable_pve_no_subscription: true
proxmox_repo_enable_ceph_no_subscription: false
proxmox_no_subscription_notice_disable: true
proxmox_widget_toolkit_file: /usr/share/javascript/proxmox-widget-toolkit/proxmoxlib.js
# Bootstrap root SSH keys from the control machine so subsequent runs can use key auth.
proxmox_root_authorized_key_files:
- "{{ lookup('env', 'HOME') }}/.ssh/id_ed25519.pub"
- "{{ lookup('env', 'HOME') }}/.ssh/ansible.pub"

@@ -0,0 +1,5 @@
---
- name: Restart pveproxy
ansible.builtin.service:
name: pveproxy
state: restarted

@@ -0,0 +1,100 @@
---
- name: Check configured local public key files
ansible.builtin.stat:
path: "{{ item }}"
register: proxmox_root_pubkey_stats
loop: "{{ proxmox_root_authorized_key_files }}"
delegate_to: localhost
become: false
- name: Fail when a configured local public key file is missing
ansible.builtin.fail:
msg: "Configured key file does not exist on the control host: {{ item.item }}"
when: not item.stat.exists
loop: "{{ proxmox_root_pubkey_stats.results }}"
delegate_to: localhost
become: false
- name: Ensure root authorized_keys contains configured public keys
ansible.posix.authorized_key:
user: root
state: present
key: "{{ lookup('ansible.builtin.file', item) }}"
manage_dir: true
loop: "{{ proxmox_root_authorized_key_files }}"
- name: Remove enterprise repository lines from /etc/apt/sources.list
ansible.builtin.lineinfile:
path: /etc/apt/sources.list
regexp: ".*enterprise\\.proxmox\\.com.*"
state: absent
when:
- proxmox_repo_disable_enterprise | bool or proxmox_repo_disable_ceph_enterprise | bool
failed_when: false
- name: Find apt source files that contain Proxmox enterprise repositories
ansible.builtin.find:
paths: /etc/apt/sources.list.d
file_type: file
patterns:
- "*.list"
- "*.sources"
contains: "enterprise\\.proxmox\\.com"
use_regex: true
register: proxmox_enterprise_repo_files
when:
- proxmox_repo_disable_enterprise | bool or proxmox_repo_disable_ceph_enterprise | bool
- name: Remove enterprise repository lines from apt source files
ansible.builtin.lineinfile:
path: "{{ item.path }}"
regexp: ".*enterprise\\.proxmox\\.com.*"
state: absent
loop: "{{ proxmox_enterprise_repo_files.files | default([]) }}"
when:
- proxmox_repo_disable_enterprise | bool or proxmox_repo_disable_ceph_enterprise | bool
- name: Find apt source files that already contain pve-no-subscription
ansible.builtin.find:
paths: /etc/apt/sources.list.d
file_type: file
patterns:
- "*.list"
- "*.sources"
contains: "pve-no-subscription"
use_regex: false
register: proxmox_no_sub_repo_files
when: proxmox_repo_enable_pve_no_subscription | bool
- name: Ensure Proxmox no-subscription repository is configured when absent
ansible.builtin.copy:
dest: /etc/apt/sources.list.d/pve-no-subscription.list
content: "deb http://download.proxmox.com/debian/pve {{ proxmox_repo_debian_codename }} pve-no-subscription\n"
mode: "0644"
when:
- proxmox_repo_enable_pve_no_subscription | bool
- (proxmox_no_sub_repo_files.matched | default(0) | int) == 0
- name: Remove duplicate pve-no-subscription.list when another source already provides it
ansible.builtin.file:
path: /etc/apt/sources.list.d/pve-no-subscription.list
state: absent
when:
- proxmox_repo_enable_pve_no_subscription | bool
- (proxmox_no_sub_repo_files.files | default([]) | map(attribute='path') | list | select('ne', '/etc/apt/sources.list.d/pve-no-subscription.list') | list | length) > 0
- name: Ensure Ceph no-subscription repository is configured
ansible.builtin.copy:
dest: /etc/apt/sources.list.d/ceph-no-subscription.list
content: "deb http://download.proxmox.com/debian/ceph-{{ proxmox_repo_debian_codename }} {{ proxmox_repo_debian_codename }} no-subscription\n"
mode: "0644"
when: proxmox_repo_enable_ceph_no_subscription | bool
- name: Disable no-subscription pop-up in Proxmox UI
ansible.builtin.replace:
path: "{{ proxmox_widget_toolkit_file }}"
regexp: "if \\(data\\.status !== 'Active'\\)"
replace: "if (false)"
backup: true
when: proxmox_no_subscription_notice_disable | bool
notify: Restart pveproxy
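The no-subscription repository handling above has three outcomes for the dedicated `pve-no-subscription.list` file: create it when no source file mentions the repository, remove it when another file already provides it, and otherwise leave it alone. A sketch of that logic under hypothetical helper names (`pve_no_subscription_line`, `dedicated_list_action`):

```shell
#!/bin/sh
# Hypothetical helper: print the APT source line for the Proxmox VE
# no-subscription repository.
pve_no_subscription_line() {
    codename="${1:-bookworm}"   # same fallback as proxmox_repo_debian_codename
    printf 'deb http://download.proxmox.com/debian/pve %s pve-no-subscription\n' "$codename"
}

# Hypothetical helper: decide what to do with the dedicated .list file.
# $1 is the dedicated file path; the remaining arguments are the files that
# already contain "pve-no-subscription".
dedicated_list_action() {
    dedicated="$1"; shift
    [ "$#" -eq 0 ] && { echo create; return 0; }
    for f in "$@"; do
        # Any other file providing the repository makes the dedicated file a duplicate.
        [ "$f" = "$dedicated" ] || { echo remove; return 0; }
    done
    echo keep
}
```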

@@ -0,0 +1,7 @@
---
proxmox_cluster_enabled: true
proxmox_cluster_name: pve-cluster
proxmox_cluster_master: ""
proxmox_cluster_master_ip: ""
proxmox_cluster_force: false
proxmox_cluster_master_root_password: ""

@@ -0,0 +1,63 @@
---
- name: Skip cluster role when disabled
ansible.builtin.meta: end_host
when: not (proxmox_cluster_enabled | bool)
- name: Check whether corosync cluster config exists
ansible.builtin.stat:
path: /etc/pve/corosync.conf
register: proxmox_cluster_corosync_conf
- name: Set effective Proxmox cluster master
ansible.builtin.set_fact:
proxmox_cluster_master_effective: "{{ proxmox_cluster_master | default(groups['proxmox_hosts'][0], true) }}"
- name: Set effective Proxmox cluster master IP
ansible.builtin.set_fact:
proxmox_cluster_master_ip_effective: >-
{{
proxmox_cluster_master_ip
| default(hostvars[proxmox_cluster_master_effective].ansible_host
| default(proxmox_cluster_master_effective), true)
}}
- name: Create cluster on designated master
ansible.builtin.command:
cmd: "pvecm create {{ proxmox_cluster_name }}"
when:
- inventory_hostname == proxmox_cluster_master_effective
- not proxmox_cluster_corosync_conf.stat.exists
- name: Ensure python3-pexpect is installed for password-based cluster join
ansible.builtin.apt:
name: python3-pexpect
state: present
update_cache: true
when:
- inventory_hostname != proxmox_cluster_master_effective
- not proxmox_cluster_corosync_conf.stat.exists
- proxmox_cluster_master_root_password | length > 0
- name: Join node to existing cluster (password provided)
ansible.builtin.expect:
command: >-
pvecm add {{ proxmox_cluster_master_ip_effective }}
{% if proxmox_cluster_force | bool %}--force{% endif %}
responses:
"Please enter superuser \\(root\\) password for '.*':": "{{ proxmox_cluster_master_root_password }}"
"password:": "{{ proxmox_cluster_master_root_password }}"
no_log: true
when:
- inventory_hostname != proxmox_cluster_master_effective
- not proxmox_cluster_corosync_conf.stat.exists
- proxmox_cluster_master_root_password | length > 0
- name: Join node to existing cluster (SSH trust/no prompt)
ansible.builtin.command:
cmd: >-
pvecm add {{ proxmox_cluster_master_ip_effective }}
{% if proxmox_cluster_force | bool %}--force{% endif %}
when:
- inventory_hostname != proxmox_cluster_master_effective
- not proxmox_cluster_corosync_conf.stat.exists
- proxmox_cluster_master_root_password | length == 0
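The cluster role above picks one of four paths per host, depending on whether `/etc/pve/corosync.conf` already exists, whether the host is the designated master, and whether a root password was supplied. A compact sketch of that decision table, with a hypothetical function name (`cluster_action`) and string results chosen only for illustration:

```shell
#!/bin/sh
# Hypothetical helper: report which branch of the cluster role would run.
#   $1 host name, $2 effective master, $3 "true" if corosync.conf exists,
#   $4 master root password (may be empty)
cluster_action() {
    host="$1"; master="$2"; has_corosync="$3"; password="$4"
    if [ "$has_corosync" = "true" ]; then
        echo none                    # already clustered: every task is skipped
    elif [ "$host" = "$master" ]; then
        echo create                  # pvecm create <name>
    elif [ -n "$password" ]; then
        echo join-with-password      # pvecm add via expect
    else
        echo join-with-ssh-trust     # pvecm add without a prompt
    fi
}
```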

@@ -0,0 +1,6 @@
---
proxmox_upgrade_apt_cache_valid_time: 3600
proxmox_upgrade_autoremove: true
proxmox_upgrade_autoclean: true
proxmox_upgrade_reboot_if_required: true
proxmox_upgrade_reboot_timeout: 1800

@@ -0,0 +1,30 @@
---
- name: Refresh apt cache
ansible.builtin.apt:
update_cache: true
cache_valid_time: "{{ proxmox_upgrade_apt_cache_valid_time }}"
- name: Upgrade Proxmox host packages
ansible.builtin.apt:
upgrade: dist
- name: Remove orphaned packages
ansible.builtin.apt:
autoremove: "{{ proxmox_upgrade_autoremove }}"
- name: Clean apt package cache
ansible.builtin.apt:
autoclean: "{{ proxmox_upgrade_autoclean }}"
- name: Check if reboot is required
ansible.builtin.stat:
path: /var/run/reboot-required
register: proxmox_reboot_required_file
- name: Reboot when required by package upgrades
ansible.builtin.reboot:
reboot_timeout: "{{ proxmox_upgrade_reboot_timeout }}"
msg: "Reboot initiated by Ansible Proxmox maintenance playbook"
when:
- proxmox_upgrade_reboot_if_required | bool
- proxmox_reboot_required_file.stat.exists | default(false)

BIN
branding/nikflix/logo.png Normal file

Binary file not shown. Size: 277 KiB

@@ -1,7 +1,7 @@
# Argo CD — optional applications (non-bootstrap)
**Base cluster configuration** (CNI, MetalLB, ingress, cert-manager, storage, observability stack, policy, Vault, etc.) is installed by **`ansible/playbooks/noble.yml`** from **`clusters/noble/bootstrap/`** — not from here.
**Base cluster configuration** (CNI, MetalLB, ingress, cert-manager, storage, observability stack, policy, SOPS secrets path, etc.) is installed by **`ansible/playbooks/noble.yml`** from **`clusters/noble/bootstrap/`** — not from here.
**`noble-root`** (`clusters/noble/bootstrap/argocd/root-application.yaml`) points at **`clusters/noble/apps`**. Add **`Application`** manifests (and optional **`AppProject`** definitions) under this directory only for workloads that are additive and do not subsume the Ansible-managed platform.
**`noble-root`** (`clusters/noble/bootstrap/argocd/root-application.yaml`) points at **`clusters/noble/apps`**. Add **`Application`** manifests (and optional **`AppProject`** definitions) under this directory only for workloads that are additive and do not subsume the core platform.
For an app-of-apps pattern, use a second-level **`Application`** that syncs a subdirectory (for example **`optional/`**) containing leaf **`Application`** resources.
Bootstrap kustomize (namespaces, static YAML, leaf **`Application`**s) lives in **`clusters/noble/bootstrap/`** and is tracked by **`noble-bootstrap-root`** — enable automated sync for that app only after **`noble.yml`** completes (**`clusters/noble/bootstrap/argocd/README.md`** §5). Put Helm **`Application`** migrations under **`clusters/noble/bootstrap/argocd/app-of-apps/`**.

@@ -0,0 +1,32 @@
# Argo CD — optional [Homepage](https://gethomepage.dev/) dashboard (Helm from [jameswynn.github.io/helm-charts](https://jameswynn.github.io/helm-charts/)).
# Values: **`./values.yaml`** (multi-source **`$values`** ref).
#
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: homepage
namespace: argocd
finalizers:
- resources-finalizer.argocd.argoproj.io/background
spec:
project: default
sources:
- repoURL: https://jameswynn.github.io/helm-charts
chart: homepage
targetRevision: 2.1.0
helm:
releaseName: homepage
valueFiles:
- $values/clusters/noble/apps/homepage/values.yaml
- repoURL: https://gitea.pcenicni.ca/gsdavidp/home-server.git
targetRevision: HEAD
ref: values
destination:
server: https://kubernetes.default.svc
namespace: homepage
syncPolicy:
automated:
prune: true
selfHeal: true
syncOptions:
- CreateNamespace=true

@@ -0,0 +1,122 @@
# Homepage — [gethomepage/homepage](https://github.com/gethomepage/homepage) via [jameswynn/homepage](https://github.com/jameswynn/helm-charts) Helm chart.
# Ingress: Traefik + cert-manager (same pattern as `clusters/noble/bootstrap/headlamp/values.yaml`).
# Service links match **`ansible/roles/noble_landing_urls/defaults/main.yml`** (`noble_lab_ui_entries`).
# **Velero** has no in-cluster web UI — tile links to upstream docs (no **siteMonitor**).
#
# **`siteMonitor`** runs **server-side** in the Homepage pod (see `gethomepage/homepage` `siteMonitor.js`).
# Public FQDNs like **`*.apps.noble.lab.pcenicni.dev`** often do **not** resolve inside the cluster
# (split-horizon / LAN DNS only) → `ENOTFOUND` / HTTP **500** in the monitor. Use **in-cluster Service**
# URLs for **`siteMonitor`** only; **`href`** stays the human-facing ingress URL.
#
# **Prometheus widget** also resolves from the pod — use the real **Service** name (Helm may truncate to
# 63 chars; this repo's generated UI list uses **`kube-prometheus-kube-prome-prometheus`**).
# Verify: `kubectl -n monitoring get svc | grep -E 'prometheus|alertmanager|grafana'`.
#
image:
repository: ghcr.io/gethomepage/homepage
tag: v1.2.0
enableRbac: true
serviceAccount:
create: true
ingress:
main:
enabled: true
ingressClassName: traefik
annotations:
cert-manager.io/cluster-issuer: letsencrypt-prod
hosts:
- host: homepage.apps.noble.lab.pcenicni.dev
paths:
- path: /
pathType: Prefix
tls:
- hosts:
- homepage.apps.noble.lab.pcenicni.dev
secretName: homepage-apps-noble-tls
env:
- name: HOMEPAGE_ALLOWED_HOSTS
value: homepage.apps.noble.lab.pcenicni.dev
config:
bookmarks: []
services:
- Noble Lab:
- Argo CD:
icon: si-argocd
href: https://argo.apps.noble.lab.pcenicni.dev
siteMonitor: http://argocd-server.argocd.svc.cluster.local:80
description: GitOps UI (sync, apps, repos)
- Grafana:
icon: si-grafana
href: https://grafana.apps.noble.lab.pcenicni.dev
siteMonitor: http://kube-prometheus-grafana.monitoring.svc.cluster.local:80
description: Dashboards, Loki explore (logs)
- Prometheus:
icon: si-prometheus
href: https://prometheus.apps.noble.lab.pcenicni.dev
siteMonitor: http://kube-prometheus-kube-prome-prometheus.monitoring.svc.cluster.local:9090
description: Prometheus UI (queries, targets) — lab; protect in production
widget:
type: prometheus
url: http://kube-prometheus-kube-prome-prometheus.monitoring.svc.cluster.local:9090
fields: ["targets_up", "targets_down", "targets_total"]
- Alertmanager:
icon: alertmanager.png
href: https://alertmanager.apps.noble.lab.pcenicni.dev
siteMonitor: http://kube-prometheus-kube-prome-alertmanager.monitoring.svc.cluster.local:9093
description: Alertmanager UI (silences, status)
- Headlamp:
icon: mdi-kubernetes
href: https://headlamp.apps.noble.lab.pcenicni.dev
siteMonitor: http://headlamp.headlamp.svc.cluster.local:80
description: Kubernetes UI (cluster resources)
- Longhorn:
icon: longhorn.png
href: https://longhorn.apps.noble.lab.pcenicni.dev
siteMonitor: http://longhorn-frontend.longhorn-system.svc.cluster.local:80
description: Storage volumes, nodes, backups
- Velero:
icon: mdi-backup-restore
href: https://velero.io/docs/
description: Cluster backups — no in-cluster web UI; use velero CLI or kubectl (docs)
widgets:
- datetime:
text_size: xl
format:
dateStyle: medium
timeStyle: short
- kubernetes:
cluster:
show: true
cpu: true
memory: true
showLabel: true
label: Cluster
nodes:
show: true
cpu: true
memory: true
showLabel: true
- search:
provider: duckduckgo
target: _blank
kubernetes:
mode: cluster
settingsString: |
title: Noble Lab
description: Homelab services — in-cluster uptime checks, cluster resources, Prometheus targets
theme: dark
color: slate
headerStyle: boxedWidgets
statusStyle: dot
iconStyle: theme
fullWidth: true
useEqualHeights: true
layout:
Noble Lab:
style: row
columns: 4

@@ -3,4 +3,5 @@
# Helm value files for those apps can live in subdirectories here (for example **./homepage/values.yaml**).
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources: []
resources:
- homepage/application.yaml

@@ -50,21 +50,56 @@ helm upgrade --install argocd argo/argo-cd -n argocd --create-namespace \
Use **Settings → Repositories** in the UI, or `argocd repo add` / a `Secret` of type `repository`.
## 4. App-of-apps (optional GitOps only)
## 4. App-of-apps (GitOps)
Bootstrap **platform** workloads (CNI, ingress, cert-manager, Kyverno, observability, Vault, etc.) are installed by
**`ansible/playbooks/noble.yml`** from **`clusters/noble/bootstrap/`** — not by Argo. **`clusters/noble/apps/kustomization.yaml`** is empty by default.
**Ansible** (`ansible/playbooks/noble.yml`) performs the **initial** install: Helm releases and **`kubectl apply -k clusters/noble/bootstrap`**. **Argo** then tracks the same git paths for ongoing reconciliation.
1. Edit **`root-application.yaml`**: set **`repoURL`** and **`targetRevision`** to this repository. The **`resources-finalizer.argocd.argoproj.io/background`** finalizer uses Argo's path-qualified form so **`kubectl apply`** does not warn about finalizer names.
2. When you want Argo to manage specific apps, add **`Application`** manifests under **`clusters/noble/apps/`** (see **`clusters/noble/apps/README.md`**).
3. Apply the root:
1. Edit **`root-application.yaml`** and **`bootstrap-root-application.yaml`**: set **`repoURL`** and **`targetRevision`**. The **`resources-finalizer.argocd.argoproj.io/background`** finalizer uses Argo's path-qualified form so **`kubectl apply`** does not warn about finalizer names.
2. Optional add-on apps: add **`Application`** manifests under **`clusters/noble/apps/`** (see **`clusters/noble/apps/README.md`**).
3. **Bootstrap kustomize** (namespaces, datasource, leaf **`Application`**s under **`argocd/app-of-apps/`**, etc.): **`noble-bootstrap-root`** syncs **`clusters/noble/bootstrap`**. It is created with **manual** sync only so Argo does not apply changes while **`noble.yml`** is still running.
**`ansible/playbooks/noble.yml`** (role **`noble_argocd`**) applies both roots when **`noble_argocd_apply_root_application`** / **`noble_argocd_apply_bootstrap_root_application`** are true in **`ansible/group_vars/all.yml`**.
```bash
kubectl apply -f clusters/noble/bootstrap/argocd/root-application.yaml
kubectl apply -f clusters/noble/bootstrap/argocd/bootstrap-root-application.yaml
```
If you migrated from GitOps-managed **`noble-platform`** / **`noble-kyverno`**, delete stale **`Application`** objects on
the cluster (see **`clusters/noble/apps/README.md`**) then re-apply the root.
If you migrated from older GitOps **`Application`** names, delete stale **`Application`** objects on the cluster (see **`clusters/noble/apps/README.md`**) then re-apply the roots.
## 5. After Ansible: enable automated sync for **noble-bootstrap-root**
Do this only after **`ansible-playbook playbooks/noble.yml`** has finished successfully (including **`noble_platform`** `kubectl apply -k` and any Helm stages you rely on). Until then, leave **manual** sync so Argo does not fight the playbook.
**Required steps**
1. Optionally, confirm the cluster matches git for kustomize output: `kubectl kustomize clusters/noble/bootstrap | kubectl diff -f -`, or inspect resources in the UI.
2. Register the git repo in Argo if you have not already (**§3**).
3. **Refresh** the app so Argo compares **`clusters/noble/bootstrap`** to the cluster: Argo UI → **noble-bootstrap-root** → **Refresh**, or:
```bash
argocd app get noble-bootstrap-root --refresh
```
4. **Enable automated sync** (prune + self-heal), preserving **`CreateNamespace`**, using any one of:
**kubectl**
```bash
kubectl patch application noble-bootstrap-root -n argocd --type merge -p '{"spec":{"syncPolicy":{"automated":{"prune":true,"selfHeal":true},"syncOptions":["CreateNamespace=true"]}}}'
```
**argocd** CLI (logged in)
```bash
argocd app set noble-bootstrap-root --sync-policy automated --auto-prune --self-heal
```
**UI:** open **noble-bootstrap-root** → **App Details** → enable **AUTO-SYNC** (and **Prune** / **Self Heal** if shown).
5. Trigger a sync if the app does not go green immediately: **Sync** in the UI, or `argocd app sync noble-bootstrap-root`.
After this, **git** is the source of truth for everything under **`clusters/noble/bootstrap/kustomization.yaml`** (including **`argocd/app-of-apps/`**). Helm-managed platform components stay as Ansible last installed them until you model them as Argo **`Application`**s under **`app-of-apps/`** and stop installing them from Ansible.
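As a rough sketch, the merge patch from step 4 can be generated by a small helper so the prune and self-heal choices are explicit. `build_sync_patch` is a hypothetical function, not something this repo ships; it only prints JSON and never talks to the cluster.

```bash
# Hypothetical helper (not part of this repo): print the syncPolicy merge patch.
# Arguments: prune (true|false), selfHeal (true|false); both default to true.
build_sync_patch() {
  prune="${1:-true}"
  self_heal="${2:-true}"
  # Accept only JSON booleans so the emitted patch stays valid JSON.
  case "$prune" in
    true|false) ;;
    *) echo "prune must be true or false" >&2; return 1 ;;
  esac
  case "$self_heal" in
    true|false) ;;
    *) echo "selfHeal must be true or false" >&2; return 1 ;;
  esac
  printf '{"spec":{"syncPolicy":{"automated":{"prune":%s,"selfHeal":%s},"syncOptions":["CreateNamespace=true"]}}}\n' \
    "$prune" "$self_heal"
}
```

With cluster access, `kubectl patch application noble-bootstrap-root -n argocd --type merge -p "$(build_sync_patch true true)"` should send the same payload as the **kubectl** command in step 4.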
## Versions

@@ -3,8 +3,10 @@
# 1. Set spec.source.repoURL (and targetRevision — **HEAD** tracks the remote default branch) to this repo.
# 2. kubectl apply -f clusters/noble/bootstrap/argocd/root-application.yaml
#
# **clusters/noble/apps** holds optional **Application** manifests. Core platform is installed by
# **ansible/playbooks/noble.yml** from **clusters/noble/bootstrap/**.
# **clusters/noble/apps** holds optional **Application** manifests. Core platform Helm + kustomize is
# installed by **ansible/playbooks/noble.yml** from **clusters/noble/bootstrap/**. **bootstrap-root-application.yaml**
# registers **noble-bootstrap-root** for the same kustomize tree (**manual** sync until you enable
# automation after the playbook — see **README.md** §5).
#
apiVersion: argoproj.io/v1alpha1
kind: Application

@@ -0,0 +1,16 @@
# CSI Volume Snapshot (external-snapshotter)
Installs the **Volume Snapshot** CRDs and the **snapshot-controller** so CSI drivers (e.g. **Longhorn**) and **Velero** can use `VolumeSnapshot` / `VolumeSnapshotContent` / `VolumeSnapshotClass`.
- Upstream: [kubernetes-csi/external-snapshotter](https://github.com/kubernetes-csi/external-snapshotter) **v8.5.0**
- **Not** the per-driver **csi-snapshotter** sidecar — Longhorn ships that with its CSI components.
**Order:** apply **before** relying on volume snapshots (e.g. before or early with **Longhorn**; **Ansible** runs this after **Cilium**, before **metrics-server** / **Longhorn**).
```bash
kubectl apply -k clusters/noble/bootstrap/csi-snapshot-controller/crd
kubectl apply -k clusters/noble/bootstrap/csi-snapshot-controller/controller
kubectl -n kube-system rollout status deploy/snapshot-controller --timeout=120s
```
After this, create or label a **VolumeSnapshotClass** for Longhorn (`velero.io/csi-volumesnapshot-class: "true"`) per `clusters/noble/bootstrap/velero/README.md`.
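A minimal sketch of the labelling step, assuming the **VolumeSnapshotClass** already exists. `label_snapshot_class` and the `DRY_RUN` switch are hypothetical helpers, and the class name is a placeholder; the real name comes from the Velero manifests, not from this README.

```bash
# Hypothetical helper (not part of this repo): mark a VolumeSnapshotClass for Velero.
# With DRY_RUN=1 the kubectl command is printed instead of executed.
label_snapshot_class() {
  class_name="${1:?usage: label_snapshot_class <VolumeSnapshotClass name>}"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "kubectl label volumesnapshotclass $class_name velero.io/csi-volumesnapshot-class=true --overwrite"
  else
    # VolumeSnapshotClass is cluster-scoped, so no namespace flag is needed.
    kubectl label volumesnapshotclass "$class_name" velero.io/csi-volumesnapshot-class=true --overwrite
  fi
}
```

Running it with `DRY_RUN=1` first shows the exact command before anything is changed on the cluster.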

@@ -0,0 +1,8 @@
# Snapshot controller — **kube-system** (upstream default).
# Image tag should match the external-snapshotter release family (see setup-snapshot-controller.yaml in that tag).
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: kube-system
resources:
- https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/v8.5.0/deploy/kubernetes/snapshot-controller/rbac-snapshot-controller.yaml
- https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/v8.5.0/deploy/kubernetes/snapshot-controller/setup-snapshot-controller.yaml

@@ -0,0 +1,9 @@
# kubernetes-csi/external-snapshotter — Volume Snapshot GA CRDs only (no VolumeGroupSnapshot).
# Pin **ref** when bumping; keep in sync with **controller** image below.
# https://github.com/kubernetes-csi/external-snapshotter/tree/v8.5.0/client/config/crd
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/v8.5.0/client/config/crd/snapshot.storage.k8s.io_volumesnapshotclasses.yaml
- https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/v8.5.0/client/config/crd/snapshot.storage.k8s.io_volumesnapshotcontents.yaml
- https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/v8.5.0/client/config/crd/snapshot.storage.k8s.io_volumesnapshots.yaml

@@ -1,60 +0,0 @@
# External Secrets Operator (noble)
Syncs secrets from external systems into Kubernetes **Secret** objects via **ExternalSecret** / **ClusterExternalSecret** CRDs.
- **Chart:** `external-secrets/external-secrets` **2.2.0** (app **v2.2.0**)
- **Namespace:** `external-secrets`
- **Helm release name:** `external-secrets` (matches the operator **ServiceAccount** name `external-secrets`)
## Install
```bash
helm repo add external-secrets https://charts.external-secrets.io
helm repo update
kubectl apply -f clusters/noble/bootstrap/external-secrets/namespace.yaml
helm upgrade --install external-secrets external-secrets/external-secrets -n external-secrets \
--version 2.2.0 -f clusters/noble/bootstrap/external-secrets/values.yaml --wait
```
Verify:
```bash
kubectl -n external-secrets get deploy,pods
kubectl get crd | grep external-secrets
```
## Vault `ClusterSecretStore` (after Vault is deployed)
The checklist expects a **Vault**-backed store. Install Vault first (`talos/CLUSTER-BUILD.md` Phase E — Vault on Longhorn + auto-unseal), then:
1. Enable the **KV v2** secrets engine and **Kubernetes** auth in Vault; create a **role** (e.g. `external-secrets`) that maps the cluster's **`external-secrets` / `external-secrets`** service account to a policy that can read the paths you need.
2. Copy **`examples/vault-cluster-secret-store.yaml`**, then set **`spec.provider.vault.server`** to your Vault URL. This repo's Vault Helm values use **HTTP** on port **8200** (`global.tlsDisable: true`): **`http://vault.vault.svc.cluster.local:8200`**. Use **`https://`** if you enable TLS on the Vault listener.
3. If Vault uses a **private TLS CA**, configure **`caProvider`** or **`caBundle`** on the Vault provider — see [HashiCorp Vault provider](https://external-secrets.io/latest/provider/hashicorp-vault/). Do not commit private CA material to public git unless intended.
4. Apply: **`kubectl apply -f …/vault-cluster-secret-store.yaml`**
5. Confirm the store is ready: **`kubectl describe clustersecretstore vault`**
Example **ExternalSecret** (after the store is healthy):
```yaml
apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
name: demo
namespace: default
spec:
refreshInterval: 1h
secretStoreRef:
name: vault
kind: ClusterSecretStore
target:
name: demo-synced
data:
- secretKey: password
remoteRef:
key: secret/data/myapp
property: password
```
## Upgrades
Pin the chart version in `values.yaml` header comments; run the same **`helm upgrade --install`** with the new **`--version`** after reviewing [release notes](https://github.com/external-secrets/external-secrets/releases).

@@ -1,31 +0,0 @@
# ClusterSecretStore for HashiCorp Vault (KV v2) using Kubernetes auth.
#
# Do not apply until Vault is running, reachable from the cluster, and configured with:
# - Kubernetes auth at mountPath (default: kubernetes)
# - A role (below: external-secrets) bound to this service account:
# name: external-secrets
# namespace: external-secrets
# - A policy allowing read on the KV path used below (e.g. secret/data/* for path "secret")
#
# Adjust server, mountPath, role, and path to match your Vault deployment. If Vault uses TLS
# with a private CA, set provider.vault.caProvider or caBundle (see README).
#
# kubectl apply -f clusters/noble/bootstrap/external-secrets/examples/vault-cluster-secret-store.yaml
---
apiVersion: external-secrets.io/v1
kind: ClusterSecretStore
metadata:
name: vault
spec:
provider:
vault:
server: "http://vault.vault.svc.cluster.local:8200"
path: secret
version: v2
auth:
kubernetes:
mountPath: kubernetes
role: external-secrets
serviceAccountRef:
name: external-secrets
namespace: external-secrets

@@ -1,5 +0,0 @@
# External Secrets Operator — apply before Helm.
apiVersion: v1
kind: Namespace
metadata:
name: external-secrets

@@ -1,10 +0,0 @@
# External Secrets Operator — noble
#
# helm repo add external-secrets https://charts.external-secrets.io
# helm repo update
# kubectl apply -f clusters/noble/bootstrap/external-secrets/namespace.yaml
# helm upgrade --install external-secrets external-secrets/external-secrets -n external-secrets \
# --version 2.2.0 -f clusters/noble/bootstrap/external-secrets/values.yaml --wait
#
# CRDs are installed by the chart (installCRDs: true). Vault ClusterSecretStore: see README + examples/.
commonLabels: {}

@@ -1,6 +1,8 @@
# Ansible bootstrap: plain Kustomize (namespaces + extra YAML). Helm installs are driven by
# **ansible/playbooks/noble.yml** (role **noble_platform**) — avoids **kustomize --enable-helm** in-repo.
# Optional GitOps workloads live under **../apps/** (Argo **noble-root**).
# Optional GitOps: **../apps/** (Argo **noble-root**); leaf **Application**s under **argocd/app-of-apps/**.
# **noble-bootstrap-root** (Argo) uses this same path — enable automated sync only after **noble.yml**
# completes (see **argocd/README.md** §5).
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
@@ -8,11 +10,10 @@ resources:
- kube-prometheus-stack/namespace.yaml
- loki/namespace.yaml
- fluent-bit/namespace.yaml
- sealed-secrets/namespace.yaml
- external-secrets/namespace.yaml
- vault/namespace.yaml
- newt/namespace.yaml
- kyverno/namespace.yaml
- velero/namespace.yaml
- velero/longhorn-volumesnapshotclass.yaml
- headlamp/namespace.yaml
- grafana-loki-datasource/loki-datasource.yaml
- vault/unseal-cronjob.yaml
- vault/cilium-network-policy.yaml
- argocd/app-of-apps

@@ -35,7 +35,6 @@ x-kyverno-exclude-infra: &kyverno_exclude_infra
- kube-node-lease
- argocd
- cert-manager
- external-secrets
- headlamp
- kyverno
- logging
@@ -44,9 +43,7 @@ x-kyverno-exclude-infra: &kyverno_exclude_infra
- metallb-system
- monitoring
- newt
- sealed-secrets
- traefik
- vault
policyExclude:
disallow-capabilities: *kyverno_exclude_infra

@@ -2,26 +2,24 @@
This is the **primary** automation path for **public** hostnames to workloads in this cluster (it **replaces** in-cluster ExternalDNS). [Newt](https://github.com/fosrl/newt) is the on-prem agent that connects your cluster to a **Pangolin** site (WireGuard tunnel). The [Fossorial Helm chart](https://github.com/fosrl/helm-charts) deploys one or more instances.
**Secrets:** Never commit endpoint, Newt ID, or Newt secret. If credentials were pasted into chat or CI logs, **rotate them** in Pangolin and recreate the Kubernetes Secret.
**Secrets:** Never commit endpoint, Newt ID, or Newt secret in **plain** YAML. If credentials were pasted into chat or CI logs, **rotate them** in Pangolin and recreate the Kubernetes Secret.
## 1. Create the Secret
Keys must match `values.yaml` (`PANGOLIN_ENDPOINT`, `NEWT_ID`, `NEWT_SECRET`).
### Option A — Sealed Secret (safe for GitOps)
### Option A — SOPS (safe for GitOps)
With the [Sealed Secrets](https://github.com/bitnami-labs/sealed-secrets) controller installed (`clusters/noble/bootstrap/sealed-secrets/`), generate a `SealedSecret` from your workstation (rotate credentials in Pangolin first if they were exposed):
Encrypt a normal **`Secret`** with [Mozilla SOPS](https://github.com/getsops/sops) and **age** (see **`clusters/noble/secrets/README.md`** and **`.sops.yaml`**). The repo includes an encrypted example at **`clusters/noble/secrets/newt-pangolin-auth.secret.yaml`** — edit with `sops` after exporting **`SOPS_AGE_KEY_FILE`** to your **`age-key.txt`**, or create a new file and encrypt it.
```bash
chmod +x clusters/noble/bootstrap/sealed-secrets/examples/kubeseal-newt-pangolin-auth.sh
export PANGOLIN_ENDPOINT='https://pangolin.pcenicni.dev'
export NEWT_ID='YOUR_NEWT_ID'
export NEWT_SECRET='YOUR_NEWT_SECRET'
./clusters/noble/bootstrap/sealed-secrets/examples/kubeseal-newt-pangolin-auth.sh > newt-pangolin-auth.sealedsecret.yaml
kubectl apply -f newt-pangolin-auth.sealedsecret.yaml
export SOPS_AGE_KEY_FILE=/absolute/path/to/home-server/age-key.txt
sops clusters/noble/secrets/newt-pangolin-auth.secret.yaml
# then:
sops -d clusters/noble/secrets/newt-pangolin-auth.secret.yaml | kubectl apply -f -
```
Commit only the `.sealedsecret.yaml` file, not plain `Secret` YAML.
**Ansible** (`noble.yml`) applies all **`clusters/noble/secrets/*.yaml`** automatically when **`age-key.txt`** exists at the repo root.
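For cases where you are not running the playbook, the same effect can be approximated by hand. This is a sketch, not the playbook's actual implementation: `apply_sops_secrets` and `DRY_RUN` are hypothetical names, and it assumes `sops` and `kubectl` are on your `PATH` with `SOPS_AGE_KEY_FILE` exported as above.

```bash
# Hypothetical helper (not part of this repo): decrypt and apply every *.yaml in a directory.
# With DRY_RUN=1 the pipelines are printed instead of executed.
apply_sops_secrets() {
  secrets_dir="${1:-clusters/noble/secrets}"
  for f in "$secrets_dir"/*.yaml; do
    # If the glob matched nothing, "$f" is the literal pattern; skip it.
    [ -e "$f" ] || continue
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "sops -d $f | kubectl apply -f -"
    else
      # Fail early with a clear message when the age key path is not exported.
      : "${SOPS_AGE_KEY_FILE:?export SOPS_AGE_KEY_FILE first}"
      sops -d "$f" | kubectl apply -f -
    fi
  done
}
```

`DRY_RUN=1 apply_sops_secrets` lists what would be applied; files that do not end in `.yaml` (such as the README) are ignored by the glob.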
### Option B — Imperative Secret (not in git)

@@ -1,50 +0,0 @@
# Sealed Secrets (noble)
Encrypts `Secret` manifests so they can live in git; the controller decrypts **SealedSecret** resources into **Secret**s in-cluster.
- **Chart:** `sealed-secrets/sealed-secrets` **2.18.4** (app **0.36.1**)
- **Namespace:** `sealed-secrets`
## Install
```bash
helm repo add sealed-secrets https://bitnami-labs.github.io/sealed-secrets
helm repo update
kubectl apply -f clusters/noble/bootstrap/sealed-secrets/namespace.yaml
helm upgrade --install sealed-secrets sealed-secrets/sealed-secrets -n sealed-secrets \
--version 2.18.4 -f clusters/noble/bootstrap/sealed-secrets/values.yaml --wait
```
## Workstation: `kubeseal`
Install a **kubeseal** build compatible with the controller (match **app** minor, e.g. **0.36.x** for **0.36.1**). Examples:
- **Homebrew:** `brew install kubeseal` (check `kubeseal --version` against the chart's `image.tag` in `helm show values`).
- **GitHub releases:** [bitnami-labs/sealed-secrets](https://github.com/bitnami-labs/sealed-secrets/releases)
Fetch the cluster's public seal cert (once per kube context):
```bash
kubeseal --fetch-cert > /tmp/noble-sealed-secrets.pem
```
Create a sealed secret from a normal secret manifest:
```bash
kubectl create secret generic example --from-literal=foo=bar --dry-run=client -o yaml \
| kubeseal --cert /tmp/noble-sealed-secrets.pem -o yaml > example-sealedsecret.yaml
```
Commit `example-sealedsecret.yaml`; apply it with `kubectl apply -f`. The controller creates the **Secret** in the same namespace as the **SealedSecret**.
**Noble example:** `examples/kubeseal-newt-pangolin-auth.sh` (Newt / Pangolin tunnel credentials).
## Backup the sealing key
If the controller's private key is lost, existing sealed files cannot be decrypted on a new cluster. Back up the key secret after install:
```bash
kubectl get secret -n sealed-secrets -l sealedsecrets.bitnami.com/sealed-secrets-key=active -o yaml > sealed-secrets-key-backup.yaml
```
Store `sealed-secrets-key-backup.yaml` in a safe offline location (not in public git).

@@ -1,19 +0,0 @@
#!/usr/bin/env bash
# Emit a SealedSecret for newt-pangolin-auth (namespace newt).
# Prerequisites: sealed-secrets controller running; kubeseal client (same minor as controller).
# Rotate Pangolin/Newt credentials in the UI first if they were exposed, then set env vars and run:
#
# export PANGOLIN_ENDPOINT='https://pangolin.example.com'
# export NEWT_ID='...'
# export NEWT_SECRET='...'
# ./kubeseal-newt-pangolin-auth.sh > newt-pangolin-auth.sealedsecret.yaml
# kubectl apply -f newt-pangolin-auth.sealedsecret.yaml
#
set -euo pipefail
kubectl apply -f "$(dirname "$0")/../../newt/namespace.yaml" >/dev/null 2>&1 || true
kubectl -n newt create secret generic newt-pangolin-auth \
--dry-run=client \
--from-literal=PANGOLIN_ENDPOINT="${PANGOLIN_ENDPOINT:?}" \
--from-literal=NEWT_ID="${NEWT_ID:?}" \
--from-literal=NEWT_SECRET="${NEWT_SECRET:?}" \
-o yaml | kubeseal -o yaml

@@ -1,5 +0,0 @@
# Sealed Secrets controller — apply before Helm.
apiVersion: v1
kind: Namespace
metadata:
name: sealed-secrets

@@ -1,18 +0,0 @@
# Sealed Secrets — noble (Git-encrypted Secret workflow)
#
# helm repo add sealed-secrets https://bitnami-labs.github.io/sealed-secrets
# helm repo update
# kubectl apply -f clusters/noble/bootstrap/sealed-secrets/namespace.yaml
# helm upgrade --install sealed-secrets sealed-secrets/sealed-secrets -n sealed-secrets \
# --version 2.18.4 -f clusters/noble/bootstrap/sealed-secrets/values.yaml --wait
#
# Client: install kubeseal (same minor as controller — see README).
# Defaults are sufficient for the lab; override here if you need key renewal, resources, etc.
#
# GitOps pattern: create Secrets only via SealedSecret (or External Secrets + Vault).
# Example (Newt): clusters/noble/bootstrap/sealed-secrets/examples/kubeseal-newt-pangolin-auth.sh
# Backup the controller's sealing key: kubectl -n sealed-secrets get secret sealed-secrets-key -o yaml
#
# Talos cluster secrets (bootstrap token, cluster secret, certs) belong in talhelper talsecret /
# SOPS — not Sealed Secrets. See talos/README.md.
commonLabels: {}

@@ -1,162 +0,0 @@
# HashiCorp Vault (noble)
Standalone Vault with **file** storage on a **Longhorn** PVC (`server.dataStorage`). The listener uses **HTTP** (`global.tlsDisable: true`) for in-cluster use; add TLS at the listener when exposing outside the cluster.
- **Chart:** `hashicorp/vault` **0.32.0** (Vault **1.21.2**)
- **Namespace:** `vault`
## Install
```bash
helm repo add hashicorp https://helm.releases.hashicorp.com
helm repo update
kubectl apply -f clusters/noble/bootstrap/vault/namespace.yaml
helm upgrade --install vault hashicorp/vault -n vault \
--version 0.32.0 -f clusters/noble/bootstrap/vault/values.yaml --wait --timeout 15m
```
Verify:
```bash
kubectl -n vault get pods,pvc,svc
kubectl -n vault exec -i sts/vault -- vault status
```
## Cilium network policy (Phase G)
After **Cilium** is up, optionally restrict HTTP access to the Vault server pods (**TCP 8200**) to **`external-secrets`** and same-namespace clients:
```bash
kubectl apply -f clusters/noble/bootstrap/vault/cilium-network-policy.yaml
```
If you add workloads in other namespaces that call Vault, extend **`ingress`** in that manifest.
## Initialize and unseal (first time)
From a workstation with `kubectl` (or `kubectl exec` into any pod with `vault` CLI):
```bash
kubectl -n vault exec -i sts/vault -- vault operator init -key-shares=1 -key-threshold=1
```
**Lab-only:** `-key-shares=1 -key-threshold=1` keeps a single unseal key. For stronger Shamir splits, use more shares and store them safely.
Save the **Unseal Key** and **Root Token** offline. Then unseal once:
```bash
kubectl -n vault exec -i sts/vault -- vault operator unseal
# paste unseal key
```
Or create the Secret used by the optional CronJob and apply it:
```bash
kubectl -n vault create secret generic vault-unseal-key --from-literal=key='YOUR_UNSEAL_KEY'
kubectl apply -f clusters/noble/bootstrap/vault/unseal-cronjob.yaml
```
The CronJob runs every minute and unseals if Vault is sealed and the Secret is present.
## Auto-unseal note
Vault **OSS** auto-unseal uses cloud KMS (AWS, GCP, Azure, OCI), **Transit** (another Vault), etc. There is no first-class “Kubernetes Secret” seal. This repo uses an optional **CronJob** as a **lab** substitute. Production clusters should use a supported seal backend.
## Kubernetes auth (External Secrets / ClusterSecretStore)
**One-shot:** from the repo root, `export KUBECONFIG=talos/kubeconfig` and `export VAULT_TOKEN=…`, then run **`./clusters/noble/bootstrap/vault/configure-kubernetes-auth.sh`** (idempotent). Then **`kubectl apply -f clusters/noble/bootstrap/external-secrets/examples/vault-cluster-secret-store.yaml`** on its own line (shell comments **`# …`** on the same line are parsed as extra `kubectl` args and break `apply`). **`kubectl get clustersecretstore vault`** should show **READY=True** after a few seconds.
Run these **from your workstation** (needs `kubectl`; no local `vault` binary required). Use a **short-lived admin token** or the root token **only in your shell** — do not paste tokens into logs or chat.
**1. Enable the auth method** (skip if already done):
```bash
kubectl -n vault exec -it sts/vault -- sh -c '
export VAULT_ADDR=http://127.0.0.1:8200
export VAULT_TOKEN="YOUR_ROOT_OR_ADMIN_TOKEN"
vault auth enable kubernetes
'
```
**2. Configure `auth/kubernetes`** — the API **issuer** must match the `iss` claim on service account JWTs. With **kube-vip** / a custom API URL, discover it from the cluster (do not assume `kubernetes.default`):
```bash
ISSUER=$(kubectl get --raw /.well-known/openid-configuration | jq -r .issuer)
REVIEWER=$(kubectl -n vault create token vault --duration=8760h)
CA_B64=$(kubectl config view --raw --minify -o jsonpath='{.clusters[0].cluster.certificate-authority-data}')
```
Then apply config **inside** the Vault pod (environment variables are passed in with `env` so quoting stays correct):
```bash
export VAULT_TOKEN="YOUR_ROOT_OR_ADMIN_TOKEN"
export ISSUER REVIEWER CA_B64
kubectl -n vault exec -i sts/vault -- env \
VAULT_ADDR=http://127.0.0.1:8200 \
VAULT_TOKEN="$VAULT_TOKEN" \
CA_B64="$CA_B64" \
REVIEWER="$REVIEWER" \
ISSUER="$ISSUER" \
sh -ec '
echo "$CA_B64" | base64 -d > /tmp/k8s-ca.crt
vault write auth/kubernetes/config \
kubernetes_host="https://kubernetes.default.svc:443" \
kubernetes_ca_cert=@/tmp/k8s-ca.crt \
token_reviewer_jwt="$REVIEWER" \
issuer="$ISSUER"
'
```
**3. KV v2** at path `secret` (skip if already enabled):
```bash
kubectl -n vault exec -it sts/vault -- sh -c '
export VAULT_ADDR=http://127.0.0.1:8200
export VAULT_TOKEN="YOUR_ROOT_OR_ADMIN_TOKEN"
vault secrets enable -path=secret kv-v2
'
```
**4. Policy + role** for the External Secrets operator SA (`external-secrets` / `external-secrets`):
```bash
kubectl -n vault exec -it sts/vault -- sh -c '
export VAULT_ADDR=http://127.0.0.1:8200
export VAULT_TOKEN="YOUR_ROOT_OR_ADMIN_TOKEN"
vault policy write external-secrets - <<EOF
path "secret/data/*" {
capabilities = ["read", "list"]
}
path "secret/metadata/*" {
capabilities = ["read", "list"]
}
EOF
vault write auth/kubernetes/role/external-secrets \
bound_service_account_names=external-secrets \
bound_service_account_namespaces=external-secrets \
policies=external-secrets \
ttl=24h
'
```
**5. Apply** **`clusters/noble/bootstrap/external-secrets/examples/vault-cluster-secret-store.yaml`** if you have not already, then verify:
```bash
kubectl describe clustersecretstore vault
```
See also [Kubernetes auth](https://developer.hashicorp.com/vault/docs/auth/kubernetes#configuration).
## TLS and External Secrets
`values.yaml` disables TLS on the Vault listener. The **`ClusterSecretStore`** example uses **`http://vault.vault.svc.cluster.local:8200`**. If you enable TLS on the listener, switch the URL to **`https://`** and configure **`caBundle`** or **`caProvider`** on the store.
## UI
Port-forward:
```bash
kubectl -n vault port-forward svc/vault-ui 8200:8200
```
Open `http://127.0.0.1:8200` and log in with the root token (rotate for production workflows).

@@ -1,40 +0,0 @@
# CiliumNetworkPolicy — restrict who may reach Vault HTTP listener (8200).
# Apply after Cilium is healthy: kubectl apply -f clusters/noble/bootstrap/vault/cilium-network-policy.yaml
#
# Ingress-only policy: egress from Vault is unchanged (Kubernetes auth needs API + DNS).
# Extend ingress rules if other namespaces must call Vault (e.g. app workloads).
#
# Ref: https://docs.cilium.io/en/stable/security/policy/language/
---
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
name: vault-http-ingress
namespace: vault
spec:
endpointSelector:
matchLabels:
app.kubernetes.io/name: vault
component: server
ingress:
- fromEndpoints:
- matchLabels:
"k8s:io.kubernetes.pod.namespace": external-secrets
toPorts:
- ports:
- port: "8200"
protocol: TCP
- fromEndpoints:
- matchLabels:
"k8s:io.kubernetes.pod.namespace": traefik
toPorts:
- ports:
- port: "8200"
protocol: TCP
- fromEndpoints:
- matchLabels:
"k8s:io.kubernetes.pod.namespace": vault
toPorts:
- ports:
- port: "8200"
protocol: TCP

@@ -1,77 +0,0 @@
#!/usr/bin/env bash
# Configure Vault Kubernetes auth + KV v2 + policy/role for External Secrets Operator.
# Requires: kubectl (cluster access), jq (used below to read the OpenID issuer); Vault reachable via sts/vault.
#
# Usage (from repo root):
# export KUBECONFIG=talos/kubeconfig # or your path
# export VAULT_TOKEN='…' # root or admin token — never commit
# ./clusters/noble/bootstrap/vault/configure-kubernetes-auth.sh
#
# Then: kubectl apply -f clusters/noble/bootstrap/external-secrets/examples/vault-cluster-secret-store.yaml
# Verify: kubectl describe clustersecretstore vault
set -euo pipefail
: "${VAULT_TOKEN:?Set VAULT_TOKEN to your Vault root or admin token}"
ISSUER=$(kubectl get --raw /.well-known/openid-configuration | jq -r .issuer)
REVIEWER=$(kubectl -n vault create token vault --duration=8760h)
CA_B64=$(kubectl config view --raw --minify -o jsonpath='{.clusters[0].cluster.certificate-authority-data}')
kubectl -n vault exec -i sts/vault -- env \
VAULT_ADDR=http://127.0.0.1:8200 \
VAULT_TOKEN="$VAULT_TOKEN" \
sh -ec '
set -e
vault auth list >/tmp/vauth.txt
grep -q "^kubernetes/" /tmp/vauth.txt || vault auth enable kubernetes
'
kubectl -n vault exec -i sts/vault -- env \
VAULT_ADDR=http://127.0.0.1:8200 \
VAULT_TOKEN="$VAULT_TOKEN" \
CA_B64="$CA_B64" \
REVIEWER="$REVIEWER" \
ISSUER="$ISSUER" \
sh -ec '
echo "$CA_B64" | base64 -d > /tmp/k8s-ca.crt
vault write auth/kubernetes/config \
kubernetes_host="https://kubernetes.default.svc:443" \
kubernetes_ca_cert=@/tmp/k8s-ca.crt \
token_reviewer_jwt="$REVIEWER" \
issuer="$ISSUER"
'
kubectl -n vault exec -i sts/vault -- env \
VAULT_ADDR=http://127.0.0.1:8200 \
VAULT_TOKEN="$VAULT_TOKEN" \
sh -ec '
set -e
vault secrets list >/tmp/vsec.txt
grep -q "^secret/" /tmp/vsec.txt || vault secrets enable -path=secret kv-v2
'
kubectl -n vault exec -i sts/vault -- env \
VAULT_ADDR=http://127.0.0.1:8200 \
VAULT_TOKEN="$VAULT_TOKEN" \
sh -ec '
vault policy write external-secrets - <<EOF
path "secret/data/*" {
capabilities = ["read", "list"]
}
path "secret/metadata/*" {
capabilities = ["read", "list"]
}
EOF
vault write auth/kubernetes/role/external-secrets \
bound_service_account_names=external-secrets \
bound_service_account_namespaces=external-secrets \
policies=external-secrets \
ttl=24h
'
echo "Done. Issuer used: $ISSUER"
echo ""
echo "Next (each command on its own line — do not paste # comments after kubectl):"
echo " kubectl apply -f clusters/noble/bootstrap/external-secrets/examples/vault-cluster-secret-store.yaml"
echo " kubectl get clustersecretstore vault"

@@ -1,5 +0,0 @@
# HashiCorp Vault — apply before Helm.
apiVersion: v1
kind: Namespace
metadata:
name: vault

@@ -1,63 +0,0 @@
# Optional lab auto-unseal: applies after Vault is initialized and Secret `vault-unseal-key` exists.
#
# 1) vault operator init -key-shares=1 -key-threshold=1 (lab only — single key)
# 2) kubectl -n vault create secret generic vault-unseal-key --from-literal=key='YOUR_UNSEAL_KEY'
# 3) kubectl apply -f clusters/noble/bootstrap/vault/unseal-cronjob.yaml
#
# OSS Vault has no Kubernetes-Secret-based seal; this CronJob runs vault operator unseal when the server is sealed.
# Protect the Secret with RBAC; prefer cloud KMS auto-unseal for real environments.
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: vault-auto-unseal
  namespace: vault
spec:
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 1
  failedJobsHistoryLimit: 3
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          securityContext:
            runAsNonRoot: true
            runAsUser: 100
            runAsGroup: 1000
            seccompProfile:
              type: RuntimeDefault
          containers:
            - name: unseal
              image: hashicorp/vault:1.21.2
              imagePullPolicy: IfNotPresent
              securityContext:
                allowPrivilegeEscalation: false
                capabilities:
                  drop:
                    - ALL
              env:
                - name: VAULT_ADDR
                  value: http://vault.vault.svc:8200
              command:
                - /bin/sh
                - -ec
                - |
                  test -f /secrets/key || exit 0
                  status="$(vault status -format=json 2>/dev/null || true)"
                  echo "$status" | grep -q '"initialized":true' || exit 0
                  echo "$status" | grep -q '"sealed":false' && exit 0
                  vault operator unseal "$(cat /secrets/key)"
              volumeMounts:
                - name: unseal
                  mountPath: /secrets
                  readOnly: true
          volumes:
            - name: unseal
              secret:
                secretName: vault-unseal-key
                optional: true
                items:
                  - key: key
                    path: key


@@ -1,62 +0,0 @@
# HashiCorp Vault — noble (standalone, file storage on Longhorn; TLS disabled on listener for in-cluster HTTP).
#
# helm repo add hashicorp https://helm.releases.hashicorp.com
# helm repo update
# kubectl apply -f clusters/noble/bootstrap/vault/namespace.yaml
# helm upgrade --install vault hashicorp/vault -n vault \
# --version 0.32.0 -f clusters/noble/bootstrap/vault/values.yaml --wait --timeout 15m
#
# Post-install: initialize, store unseal key in Secret, apply optional unseal CronJob — see README.md
#
global:
  tlsDisable: true
injector:
  enabled: true
server:
  enabled: true
  dataStorage:
    enabled: true
    size: 10Gi
    storageClass: longhorn
    accessMode: ReadWriteOnce
  ha:
    enabled: false
  standalone:
    enabled: true
    config: |
      ui = true
      listener "tcp" {
        tls_disable = 1
        address = "[::]:8200"
        cluster_address = "[::]:8201"
      }
      storage "file" {
        path = "/vault/data"
      }
  # Allow pod Ready before init/unseal so Helm --wait succeeds (see Vault /v1/sys/health docs).
  readinessProbe:
    enabled: true
    path: "/v1/sys/health?uninitcode=204&sealedcode=204&standbyok=true"
    port: 8200
  # LAN: TLS terminates at Traefik + cert-manager; listener stays HTTP (global.tlsDisable).
  ingress:
    enabled: true
    ingressClassName: traefik
    annotations:
      cert-manager.io/cluster-issuer: letsencrypt-prod
    hosts:
      - host: vault.apps.noble.lab.pcenicni.dev
        paths: []
    tls:
      - secretName: vault-apps-noble-tls
        hosts:
          - vault.apps.noble.lab.pcenicni.dev
ui:
  enabled: true


@@ -0,0 +1,118 @@
# Velero (cluster backups)
Ansible-managed core stack — **not** reconciled by Argo CD (`clusters/noble/apps` is optional GitOps only).
## What you get
- **No web UI** — Velero is operated with the **`velero`** CLI and **`kubectl`** (Backup, Schedule, Restore CRDs). Metrics are exposed for Prometheus; there is no first-party dashboard in this chart.
- **vmware-tanzu/velero** Helm chart (**12.0.0** → Velero **1.18.0**) in namespace **`velero`**
- **AWS plugin** init container for **S3-compatible** object storage (`velero/velero-plugin-for-aws:v1.14.0`)
- **CSI snapshots** via Velero's built-in CSI support (`EnableCSI`) and **VolumeSnapshotLocation** `velero.io/csi` (no separate CSI plugin image for Velero ≥ 1.14)
- **Prometheus** scraping: **ServiceMonitor** labeled for **kube-prometheus** (`release: kube-prometheus`)
- **Schedule** **`velero-daily-noble`**: cron **`0 3 * * *`** (daily at 03:00 in the Velero pod's timezone, usually **UTC**), **720h** TTL per backup (30 days). Edit **`values.yaml`** `schedules` to change time or retention.
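The retention figure is plain arithmetic on the TTL. A minimal sketch, assuming the TTL is always written as whole hours with a trailing `h` (the helper name `ttl_to_days` is hypothetical, not part of Velero or this repo):

```bash
# Convert a Velero-style TTL given in hours (e.g. "720h") to whole days.
# Assumption: the value is an integer number of hours followed by "h".
ttl_to_days() {
  hours="${1%h}"          # strip the trailing "h"
  echo $(( hours / 24 ))  # integer division: hours -> days
}

ttl_to_days 720h
```

Doubling retention to 60 days would therefore mean setting `ttl: 1440h` in the `schedules` block.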
## Prerequisites
1. **Volume Snapshot APIs** installed cluster-wide — **`clusters/noble/bootstrap/csi-snapshot-controller/`** (Ansible **`noble_csi_snapshot_controller`**, after **Cilium**). Without **`snapshot.storage.k8s.io`** CRDs and **`kube-system/snapshot-controller`**, Velero logs errors like `no matches for kind "VolumeSnapshot"`.
2. **Longhorn** (or another CSI driver) with a **VolumeSnapshotClass** for that driver.
3. For **Longhorn**, this repo applies **`velero/longhorn-volumesnapshotclass.yaml`** (`VolumeSnapshotClass` **`longhorn-velero`**, driver **`driver.longhorn.io`**, Velero label). It is included in **`clusters/noble/bootstrap/kustomization.yaml`** (same apply as other bootstrap YAML). For non-Longhorn drivers, add a class with **`velero.io/csi-volumesnapshot-class: "true"`** (see [Velero CSI](https://velero.io/docs/main/csi/)).
4. **S3-compatible** endpoint (MinIO, VersityGW, AWS, etc.) and a **bucket**.
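The first prerequisite can be checked before installing. The following is a rough pre-flight sketch, not something this repo ships: the function name `check_snapshot_crds` and the `KUBECTL` override variable are assumptions made for illustration, and the three CRD names are the standard `snapshot.storage.k8s.io` ones.

```bash
# Pre-flight: confirm the Volume Snapshot CRDs exist before installing Velero.
# KUBECTL is a hypothetical override hook (defaults to the real kubectl).
KUBECTL="${KUBECTL:-kubectl}"

check_snapshot_crds() {
  missing=0
  for crd in volumesnapshots volumesnapshotclasses volumesnapshotcontents; do
    # "kubectl get crd <name>" exits non-zero when the CRD is absent.
    if ! $KUBECTL get crd "${crd}.snapshot.storage.k8s.io" >/dev/null 2>&1; then
      echo "missing CRD: ${crd}.snapshot.storage.k8s.io"
      missing=1
    fi
  done
  return "$missing"
}
```

Running `check_snapshot_crds && echo "snapshot CRDs present"` against the cluster should print the confirmation only when all three CRDs are installed.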
## Credentials Secret
Velero expects **`velero/velero-cloud-credentials`**, key **`cloud`**, in **INI** form for the AWS plugin:
```ini
[default]
aws_access_key_id=<key>
aws_secret_access_key=<secret>
```
Create manually:
```bash
kubectl -n velero create secret generic velero-cloud-credentials \
--from-literal=cloud="$(printf '[default]\naws_access_key_id=%s\naws_secret_access_key=%s\n' "$KEY" "$SECRET")"
```
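If you prefer to inspect the INI payload before it reaches the cluster, the same `printf` can be wrapped in a small helper. This is only a sketch; `render_velero_credentials` is a hypothetical name, and the commented `kubectl` lines are one possible way to use and verify the result.

```bash
# Render the INI body that the AWS plugin expects under the Secret key "cloud".
# $1 = access key id, $2 = secret access key
render_velero_credentials() {
  printf '[default]\naws_access_key_id=%s\naws_secret_access_key=%s\n' "$1" "$2"
}

# Possible usage (requires cluster access, so left commented out here):
#   render_velero_credentials "$KEY" "$SECRET" > /tmp/cloud
#   kubectl -n velero create secret generic velero-cloud-credentials --from-file=cloud=/tmp/cloud
#   kubectl -n velero get secret velero-cloud-credentials -o jsonpath='{.data.cloud}' | base64 -d
```

The last commented command decodes the stored value so you can confirm it still starts with `[default]`.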
Or let **Ansible** create it from **`.env`** (`NOBLE_VELERO_AWS_ACCESS_KEY_ID`, `NOBLE_VELERO_AWS_SECRET_ACCESS_KEY`) or from extra-vars **`noble_velero_aws_access_key_id`** / **`noble_velero_aws_secret_access_key`**.
## Apply (Ansible)
1. Copy **`.env.sample`** → **`.env`** at the **repository root** and set at least:
   - **`NOBLE_VELERO_S3_BUCKET`** — object bucket name
   - **`NOBLE_VELERO_S3_URL`** — S3 API base URL (e.g. `https://minio.lan:9000` or your VersityGW/MinIO endpoint)
   - **`NOBLE_VELERO_AWS_ACCESS_KEY_ID`** / **`NOBLE_VELERO_AWS_SECRET_ACCESS_KEY`** — credentials the AWS plugin uses (S3-compatible access key style)
2. Enable the role: set **`noble_velero_install: true`** in **`ansible/group_vars/all.yml`**, **or** pass **`-e noble_velero_install=true`** on the command line.
3. Run from **`ansible/`** (adjust **`KUBECONFIG`** to your cluster admin kubeconfig):
```bash
cd ansible
export KUBECONFIG=/absolute/path/to/home-server/talos/kubeconfig
# Velero only (after helm repos; skips other roles unless their tags match — use full playbook if unsure)
ansible-playbook playbooks/noble.yml --tags repos,velero -e noble_velero_install=true
```
If **`NOBLE_VELERO_S3_BUCKET`** / **`NOBLE_VELERO_S3_URL`** are not in **`.env`**, pass them explicitly:
```bash
ansible-playbook playbooks/noble.yml --tags repos,velero -e noble_velero_install=true \
-e noble_velero_s3_bucket=my-bucket \
-e noble_velero_s3_url=https://s3.example.com:9000
```
Full platform run (includes Velero when **`noble_velero_install`** is true in **`group_vars`**):
```bash
ansible-playbook playbooks/noble.yml
```
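Before running the playbook it can help to confirm that **`.env`** actually defines the four variables listed in step 1. A minimal sketch, assuming **`.env`** uses plain `NAME=value` lines without `export` or quoting rules beyond that; `check_velero_env` is a hypothetical helper, not part of the Ansible role.

```bash
# Report which Velero-related variables are missing or empty in an env file.
# $1 = path to the env file (defaults to ./.env)
check_velero_env() {
  env_file="${1:-.env}"
  rc=0
  for var in NOBLE_VELERO_S3_BUCKET NOBLE_VELERO_S3_URL \
             NOBLE_VELERO_AWS_ACCESS_KEY_ID NOBLE_VELERO_AWS_SECRET_ACCESS_KEY; do
    # Require NAME=<non-empty value> at the start of a line.
    if ! grep -Eq "^${var}=.+" "$env_file" 2>/dev/null; then
      echo "missing or empty: ${var}"
      rc=1
    fi
  done
  return "$rc"
}
```

From the repository root, `check_velero_env .env` prints one line per missing variable and returns non-zero if any are absent.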
## Install (Ansible) — details
1. Set **`noble_velero_install: true`** in **`ansible/group_vars/all.yml`** (or pass **`-e noble_velero_install=true`**).
2. Set **`noble_velero_s3_bucket`** and **`noble_velero_s3_url`** via **`.env`** (**`NOBLE_VELERO_S3_*`**) or **`group_vars`** or **`-e`**. Extra-vars override **`.env`**. Optional: **`noble_velero_s3_region`**, **`noble_velero_s3_prefix`**, **`noble_velero_s3_force_path_style`** (defaults match `values.yaml`).
3. Run **`ansible/playbooks/noble.yml`** (Velero runs after **`noble_platform`**).
Example without **`.env`** (all on the CLI):
```bash
cd ansible
ansible-playbook playbooks/noble.yml --tags velero \
-e noble_velero_install=true \
-e noble_velero_s3_bucket=noble-velero \
-e noble_velero_s3_url=https://minio.lan:9000 \
-e noble_velero_aws_access_key_id="$KEY" \
-e noble_velero_aws_secret_access_key="$SECRET"
```
The **`clusters/noble/bootstrap/kustomization.yaml`** applies **`velero/namespace.yaml`** with the rest of the bootstrap namespaces (so **`velero`** exists before Helm).
## Install (Helm only)
From repo root:
```bash
kubectl apply -f clusters/noble/bootstrap/velero/namespace.yaml
# Create velero-cloud-credentials (see above), then:
helm repo add vmware-tanzu https://vmware-tanzu.github.io/helm-charts && helm repo update
helm upgrade --install velero vmware-tanzu/velero -n velero --version 12.0.0 \
-f clusters/noble/bootstrap/velero/values.yaml \
--set-string configuration.backupStorageLocation[0].bucket=YOUR_BUCKET \
--set-string configuration.backupStorageLocation[0].config.s3Url=https://YOUR-S3-ENDPOINT \
--wait
```
Edit **`values.yaml`** defaults (bucket placeholder, `s3Url`) or override with **`--set-string`** as above.
## Quick checks
```bash
kubectl -n velero get pods,backupstoragelocation,volumesnapshotlocation
velero backup create test --wait
```
(`velero` CLI: install from [Velero releases](https://github.com/vmware-tanzu/velero/releases).)


@@ -0,0 +1,11 @@
# Default Longhorn VolumeSnapshotClass for Velero CSI — one class per driver may carry
# **velero.io/csi-volumesnapshot-class: "true"** (see velero/README.md).
# Apply after **Longhorn** CSI is running (`driver.longhorn.io`).
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: longhorn-velero
  labels:
    velero.io/csi-volumesnapshot-class: "true"
driver: driver.longhorn.io
deletionPolicy: Delete


@@ -0,0 +1,5 @@
# Velero — apply before Helm (Ansible **noble_velero**).
apiVersion: v1
kind: Namespace
metadata:
  name: velero


@@ -0,0 +1,65 @@
# Velero Helm values — vmware-tanzu/velero chart (see CLUSTER-BUILD.md Phase F).
# Install: **ansible/playbooks/noble.yml** role **noble_velero** (override S3 settings via **noble_velero_*** vars).
# Requires Secret **velero/velero-cloud-credentials** key **cloud** (INI for AWS plugin — see README).
#
# Chart: vmware-tanzu/velero — pin version on install (e.g. 12.0.0 / Velero 1.18.0).
# helm repo add vmware-tanzu https://vmware-tanzu.github.io/helm-charts && helm repo update
# kubectl apply -f clusters/noble/bootstrap/velero/namespace.yaml
# helm upgrade --install velero vmware-tanzu/velero -n velero --version 12.0.0 -f clusters/noble/bootstrap/velero/values.yaml
initContainers:
  - name: velero-plugin-for-aws
    image: velero/velero-plugin-for-aws:v1.14.0
    imagePullPolicy: IfNotPresent
    volumeMounts:
      - mountPath: /target
        name: plugins
configuration:
  features: EnableCSI
  defaultBackupStorageLocation: default
  defaultVolumeSnapshotLocations: velero.io/csi:default
  backupStorageLocation:
    - name: default
      provider: aws
      bucket: noble-velero
      default: true
      accessMode: ReadWrite
      credential:
        name: velero-cloud-credentials
        key: cloud
      config:
        region: us-east-1
        s3ForcePathStyle: "true"
        s3Url: https://s3.CHANGE-ME.invalid
  volumeSnapshotLocation:
    - name: default
      provider: velero.io/csi
      config: {}
credentials:
  useSecret: true
  existingSecret: velero-cloud-credentials
snapshotsEnabled: true
deployNodeAgent: false
metrics:
  enabled: true
  serviceMonitor:
    enabled: true
    autodetect: true
    additionalLabels:
      release: kube-prometheus
# Daily full-cluster backup at 03:00 — cron is evaluated in the Velero pod (typically **UTC**; set TZ on the
# Deployment if you need local wall clock). See `helm upgrade --install` to apply.
schedules:
  daily-noble:
    disabled: false
    schedule: "0 3 * * *"
    template:
      ttl: 720h
      storageLocation: default


@@ -0,0 +1,38 @@
# SOPS-encrypted cluster secrets (noble)
Secrets that belong in git are stored here as **Mozilla SOPS** files encrypted with [age](https://github.com/FiloSottile/age). The matching **private** key lives in **`age-key.txt`** at the repository root (gitignored — create with `age-keygen -o age-key.txt` and add the public key to **`.sops.yaml`** if you rotate keys).
**Migrating from an older cluster** that ran **Vault**, **Sealed Secrets**, or **External Secrets Operator:** uninstall those Helm releases (`helm uninstall vault -n vault`, etc.), delete their namespaces if empty, and export any secrets you still need into plain **`Secret`** YAML here, then encrypt with **`sops`** before committing.
## Prerequisites
- [sops](https://github.com/getsops/sops) and **age** on the machine that encrypts or applies secrets.
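For key rotation, the public key has to land in **`.sops.yaml`**. The repository's real **`.sops.yaml`** is not shown here, so the following is only a sketch of what a minimal file could look like: the `write_sops_config` helper and the `path_regex` value are assumptions, while `creation_rules`, `path_regex`, and `age` are standard SOPS configuration keys. The public key can be derived from the private key file with `age-keygen -y age-key.txt`.

```bash
# Write a minimal SOPS config that encrypts files under clusters/noble/secrets/
# for a single age recipient. Overwrites the target file.
# $1 = age public key (recipient), $2 = output path (defaults to ./.sops.yaml)
write_sops_config() {
  cat > "${2:-.sops.yaml}" <<EOF
creation_rules:
  - path_regex: clusters/noble/secrets/.*\.yaml\$
    age: $1
EOF
}

# Possible usage (hypothetical; back up any existing .sops.yaml first):
#   write_sops_config "$(age-keygen -y age-key.txt)" .sops.yaml
```

Files that were encrypted for the old key still need to be re-encrypted for the new recipient after the config changes.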
## Edit or create a Secret
```bash
export SOPS_AGE_KEY_FILE=/absolute/path/to/home-server/age-key.txt
# Create a new file from a template, then encrypt:
sops clusters/noble/secrets/example.secret.yaml
# Or edit an existing encrypted file (opens decrypted in $EDITOR):
sops clusters/noble/secrets/newt-pangolin-auth.secret.yaml
```
## Apply to the cluster
```bash
export KUBECONFIG=/absolute/path/to/home-server/talos/kubeconfig
export SOPS_AGE_KEY_FILE=/absolute/path/to/home-server/age-key.txt
sops -d clusters/noble/secrets/newt-pangolin-auth.secret.yaml | kubectl apply -f -
```
**Ansible** (`noble.yml`) runs the same decrypt-and-apply step for every `*.yaml` in this directory when **`age-key.txt`** exists and **`noble_apply_sops_secrets`** is true (see `ansible/group_vars/all.yml`).
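Outside Ansible, the same decrypt-and-apply loop can be approximated in a few lines of shell. This is a sketch of the idea rather than the role's actual implementation; `apply_sops_secrets` and the `DRY_RUN` switch are hypothetical names introduced for illustration.

```bash
# Decrypt and apply every *.yaml in a secrets directory.
# $1 = directory (defaults to clusters/noble/secrets)
# Set DRY_RUN=1 to list the files without touching the cluster.
apply_sops_secrets() {
  dir="${1:-clusters/noble/secrets}"
  for f in "$dir"/*.yaml; do
    [ -e "$f" ] || continue   # no matches: the glob stays literal, so skip it
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "would apply: $f"
    else
      sops -d "$f" | kubectl apply -f -
    fi
  done
}
```

With **`KUBECONFIG`** and **`SOPS_AGE_KEY_FILE`** exported as above, `DRY_RUN=1 apply_sops_secrets` is a safe first run to see which files would be applied.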
## Files
| File | Purpose |
|------|---------|
| `newt-pangolin-auth.secret.yaml` | Pangolin tunnel credentials for [Newt](../../bootstrap/newt/README.md) (`PANGOLIN_ENDPOINT`, `NEWT_ID`, `NEWT_SECRET`). Replace placeholders and re-encrypt before relying on them. |


@@ -0,0 +1,30 @@
apiVersion: ENC[AES256_GCM,data:FaA=,iv:EsqIdZmNS4hfzwCZ0gL7Q5Czaz8Bii3jWFu60lKmgVo=,tag:tfr4yUuTiH4s+ufYW/dpCA==,type:str]
kind: ENC[AES256_GCM,data:ozpTcG9F,iv:Q1EZ896Plhyz2qM4JJRnBf940kbVLSwyIIPUcDGBZFA=,tag:1bWEgI+I4Ni5J70MlohYdA==,type:str]
metadata:
name: ENC[AES256_GCM,data:moXbGuT6ZOGhgVUBNcpHeLZQ,iv:1WDtxT/Et/6lxx1Mj93CQME8o0lhzxnBMkdSqP/n3R0=,tag:v+iqfE8tzCx8ZOMUW7OyEA==,type:str]
namespace: ENC[AES256_GCM,data:33/AMg==,iv:M0GvB/70nHh4MVR1saZy1pGY8IFFzkzGdJl4szHJbCI=,tag:0+1LX/EnkAP0FZ6ARKZNAA==,type:str]
type: ENC[AES256_GCM,data:3io5utU1,iv:QqMDNL/R8SR7TC9mwDdDd3V6VOo+csgeiZCr2AdOZjw=,tag:/KSMy+vNz7Qj/I463eG0LQ==,type:str]
stringData:
PANGOLIN_ENDPOINT: ENC[AES256_GCM,data:a/2QTnGYnNXGxNm8QSVTKC6I+r88J1m1CdMmTA==,iv:L2LvLD7IRX8wdAzALAWQ2ojB2OjWDIcVKrdi/lSvZFY=,tag:ALffRF9bncxA8CExSaRmHA==,type:str]
NEWT_ID: ENC[AES256_GCM,data:Xfe8QvBdX62CciYXYwMfJAzIE/0=,iv:tA+FJ93tsjJ29L3bSxNAEooiKPMc+5pa00EpQ2cJkho=,tag:auiR/zQjnsmyllXbSJf3KA==,type:str]
NEWT_SECRET: ENC[AES256_GCM,data:XY8XZOkZ+GpnjljbvtaH2oGJpDoZ47fN,iv:+J5sb7saqbVwHEyemx3CUSsdKArubRdPCLGbT09sFLM=,tag:zUowv8I1CaWZH+KLYOwKYw==,type:str]
sops:
kms: []
gcp_kms: []
azure_kv: []
hc_vault: []
age:
- recipient: age1juym5p3ez3dkt0dxlznydgfgqvaujfnyk9hpdsssf50hsxeh3p4sjpf3gn
enc: |
-----BEGIN AGE ENCRYPTED FILE-----
YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSB0RWppdWxZUEYzc2I2TURi
dm1pUzVaNDA4YldsWkFJODl1MWZ6MXFxWnhjCnVtU1VEQnJqbTI5M0hWM2FCaVlS
aXprTm42bTlldUVHMmxpUUJiWEVhcXcKLS0tIGNLVnNtNDdMQ0VVeDV1N29nOW9F
clhLa2tPdWtRMWYzc2YrR0hSQXczTlUK6hYj4HxQvu6Kqn/Ki+cYv9x5nvolyGqQ
N4g9z+t6orT6MYseWPf0uyovC/5iOOC6z/2exVe7/0rYo7ZOFm6dYQ==
-----END AGE ENCRYPTED FILE-----
lastmodified: "2026-03-29T23:37:33Z"
mac: ENC[AES256_GCM,data:uKtdqJhwE4HLCenHH+RG8O2yfVIcGbiXznL9ouAXhDLnQh/ksgeczr2fyyn9hs/JhCozAqRrF8vnYZsIdfG1DQfHjXn6Ro6gzYC0YR+gvFU8Mz9uPdVX3HYjUrzKJ5GhhBami0USZtLdGKOGgFDYmFoDsD/PmMXLUol8qJdW8Uk=,iv:rIfQI17+3vNBB1n//D7Wnl/SLWFjV0pgZDteumlS2f8=,tag:xibCfJdZQS+aB75drmY1VA==,type:str]
pgp: []
unencrypted_suffix: _unencrypted
version: 3.9.3


@@ -0,0 +1,29 @@
# Eclipse Che (optional — Argo CD)
Three **Application** resources (sync waves **0 → 1 → 2**):
| Wave | Application | Purpose |
|------|-------------|---------|
| 0 | `eclipse-che-devworkspace` | [DevWorkspace operator](https://github.com/devfile/devworkspace-operator) **v0.33.0** (`devworkspace/kustomization.yaml` → remote `combined.yaml`) |
| 1 | `eclipse-che-operator` | [Eclipse Che Helm chart](https://artifacthub.io/packages/helm/eclipse-che/eclipse-che) **7.116.0** (operator in **`eclipse-che`**) |
| 2 | `eclipse-che-cluster` | **`CheCluster`** (`checluster.yaml`) — Traefik + **cert-manager** TLS |
**Prerequisites (cluster):** **cert-manager** + **Traefik** (noble bootstrap). **DNS:** `che.apps.noble.lab.pcenicni.dev` → Traefik LB (edit **`checluster.yaml`** if your domain differs).
**First sync:** Wave ordering applies to **Application** CRs under **noble-root**; if the operator starts before DevWorkspace is ready, **Refresh**/**Sync** the child apps once. See [Eclipse Che on Kubernetes](https://eclipse.dev/che/docs/stable/administration-guide/installing-che-on-kubernetes/).
**URL:** `kubectl get checluster eclipse-che -n eclipse-che -o jsonpath='{.status.cheURL}{"\n"}'` after **Phase** is **Active**.
## Troubleshooting — “no available server” (or similar)
**1. Eclipse Che / dashboard**
- **DevWorkspace routing:** On Kubernetes you **must** set **`routing.clusterHostSuffix`** in **`DevWorkspaceOperatorConfig`** `devworkspace-operator-config` (`devworkspace/dwoc.yaml`). If it was missing, sync **`eclipse-che-devworkspace`** again, then **`eclipse-che-operator`** / **`eclipse-che-cluster`**.
- **Status:** `kubectl get checluster eclipse-che -n eclipse-che -o jsonpath='{.status.chePhase}{"\n"}'` → expect **`Active`**.
- **Pods:** `kubectl get pods -n eclipse-che` — wait for **Running** (Keycloak / gateway / server can take many minutes).
- **Ingress + DNS:** `kubectl get ingress -n eclipse-che` — host **`che.apps.noble.lab.pcenicni.dev`** must resolve to your Traefik LB (same as Grafana/Homepage).
- **TLS:** `kubectl describe certificate -n eclipse-che` (if present) — Let's Encrypt must succeed before the browser trusts the URL.
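Because Che can take many minutes to settle, the **Status** check above lends itself to a polling loop. A minimal sketch, assuming a generic retry helper (`wait_for_output` is a hypothetical name) that reruns any command until its output matches the expected string:

```bash
# Re-run a command until its output equals the expected value or tries run out.
# $1 = expected output, $2 = max tries, $3 = delay in seconds, rest = command
wait_for_output() {
  want="$1"; tries="$2"; delay="$3"; shift 3
  got=""
  i=0
  while [ "$i" -lt "$tries" ]; do
    got="$("$@" 2>/dev/null || true)"   # tolerate transient command failures
    if [ "$got" = "$want" ]; then
      echo "ok: $got"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "timed out; last value: ${got:-<none>}"
  return 1
}

# Possible usage against the cluster (left commented out here):
#   wait_for_output Active 60 10 kubectl get checluster eclipse-che -n eclipse-che \
#     -o jsonpath='{.status.chePhase}'
```

The commented call would poll every 10 seconds for up to 10 minutes; adjust both numbers to taste.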
**2. Argo CD UI / repo**
If the message appears in **Argo CD** (not Che), check in-cluster components: `kubectl get pods -n argocd`, `kubectl logs -n argocd deploy/argocd-repo-server --tail=80`, and that **Applications** use `destination.server: https://kubernetes.default.svc` (in-cluster), not a missing external API.


@@ -0,0 +1,27 @@
# CheCluster CR — sync wave 2 (operator must be Ready to reconcile).
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: eclipse-che-cluster
  namespace: argocd
  annotations:
    argocd.argoproj.io/sync-wave: "2"
  finalizers:
    - resources-finalizer.argocd.argoproj.io/background
spec:
  project: default
  source:
    repoURL: https://gitea.pcenicni.ca/gsdavidp/home-server.git
    targetRevision: HEAD
    path: clusters/noble/apps/eclipse-che
    directory:
      include: checluster.yaml
  destination:
    server: https://kubernetes.default.svc
    namespace: eclipse-che
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - ServerSideApply=true


@@ -0,0 +1,26 @@
# DevWorkspace operator — must sync before Eclipse Che Helm (sync wave 0).
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: eclipse-che-devworkspace
  namespace: argocd
  annotations:
    argocd.argoproj.io/sync-wave: "0"
  finalizers:
    - resources-finalizer.argocd.argoproj.io/background
spec:
  project: default
  source:
    repoURL: https://gitea.pcenicni.ca/gsdavidp/home-server.git
    targetRevision: HEAD
    path: clusters/noble/apps/eclipse-che/devworkspace
  destination:
    server: https://kubernetes.default.svc
    namespace: devworkspace-controller
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
      - ServerSideApply=true


@@ -0,0 +1,28 @@
# Eclipse Che operator (Helm) — sync wave 1 (after DevWorkspace CRDs/controller).
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: eclipse-che-operator
  namespace: argocd
  annotations:
    argocd.argoproj.io/sync-wave: "1"
  finalizers:
    - resources-finalizer.argocd.argoproj.io/background
spec:
  project: default
  source:
    repoURL: https://eclipse-che.github.io/che-operator/charts
    chart: eclipse-che
    targetRevision: 7.116.0
    helm:
      releaseName: eclipse-che
  destination:
    server: https://kubernetes.default.svc
    namespace: eclipse-che
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
      - ServerSideApply=true


@@ -0,0 +1,24 @@
# Eclipse Che instance — applied after **che-operator** is running (sync wave 2).
# Edit **hostname** / **domain** if your ingress DNS differs from the noble lab pattern.
#
# **devEnvironments.networking.externalTLSConfig** — required with cert-manager for **workspace** subdomains.
# Without it, Che creates secure workspace Ingresses with TLS hosts but **no secretName**, so cert-manager
# never issues certs and the dashboard often shows **no available server** when opening a workspace.
apiVersion: org.eclipse.che/v2
kind: CheCluster
metadata:
  name: eclipse-che
  namespace: eclipse-che
spec:
  devEnvironments:
    networking:
      externalTLSConfig:
        enabled: true
        annotations:
          cert-manager.io/cluster-issuer: letsencrypt-prod
  networking:
    domain: apps.noble.lab.pcenicni.dev
    hostname: che.apps.noble.lab.pcenicni.dev
    ingressClassName: traefik
    annotations:
      cert-manager.io/cluster-issuer: letsencrypt-prod


@@ -0,0 +1,14 @@
# Required on **Kubernetes** (OpenShift discovers this automatically). See DevWorkspaceOperatorConfig CRD:
# **routing.clusterHostSuffix** — hostname suffix for DevWorkspace routes. Without this, Che / workspaces
# often fail with errors like **no available server** or broken routing.
# Must be named **devworkspace-operator-config** in **devworkspace-controller**.
# v1alpha1 uses a root-level **config** key (not spec.config); see combined.yaml CRD for devworkspaceoperatorconfigs.
# Edit if your ingress base domain differs from the noble lab pattern.
apiVersion: controller.devfile.io/v1alpha1
kind: DevWorkspaceOperatorConfig
metadata:
  name: devworkspace-operator-config
  namespace: devworkspace-controller
config:
  routing:
    clusterHostSuffix: apps.noble.lab.pcenicni.dev


@@ -0,0 +1,7 @@
# DevWorkspace operator — prerequisite for Eclipse Che (pinned tag).
# https://github.com/devfile/devworkspace-operator
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- https://raw.githubusercontent.com/devfile/devworkspace-operator/v0.33.0/deploy/deployment/kubernetes/combined.yaml
- dwoc.yaml

docs/Racks.md

@@ -0,0 +1,169 @@
# Physical racks — Noble lab (10")
This page is a **logical rack layout** for the **noble** Talos lab: **three 10" (half-width) racks**, how **rack units (U)** are used, and **Ethernet** paths on **`192.168.50.0/24`**. Node names and IPs match [`talos/CLUSTER-BUILD.md`](../talos/CLUSTER-BUILD.md) and [`docs/architecture.md`](architecture.md).
## Legend
| Symbol | Meaning |
|--------|---------|
| `█` / filled cell | Equipment occupying that **1U** |
| `░` | Reserved / future use |
| `·` | Empty |
| `━━` | Copper to LAN switch |
**Rack unit numbering:** **U increases upward** (U1 = bottom of rack, like ANSI/EIA). **Slot** in the diagrams is **top → bottom** reading order for a quick visual scan.
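The slot-to-U relationship is a simple inversion. A minimal sketch, assuming the 12U rack height used in the per-rack diagrams further down this page (`slot_to_u` is a hypothetical helper name):

```bash
# Map a top-down slot number to its bottom-up rack unit.
# $1 = slot (1 = top of rack), $2 = rack height in U (defaults to 12)
slot_to_u() {
  slot="$1"
  height="${2:-12}"
  echo $(( height - slot + 1 ))
}

slot_to_u 10   # the switch row in the Rack A diagram
```

Slot 1 therefore maps to the highest U in the rack and the last slot maps to U1.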
### Three racks at a glance
Read **top → bottom** (first row = top of rack).
| Primary (10") | Storage B (10") | Rack C (10") |
|-----------------|-----------------|--------------|
| Fiber ONT | Mac Mini | *empty* |
| UniFi Fiber Gateway | NAS | *empty* |
| Patch panel | JBOD | *empty* |
| 2.5 GbE ×8 PoE switch | *empty* | *empty* |
| Raspberry Pi cluster | *empty* | *empty* |
| **helium** (Talos) | *empty* | *empty* |
| **neon** (Talos) | *empty* | *empty* |
| **argon** (Talos) | *empty* | *empty* |
| **krypton** (Talos) | *empty* | *empty* |
**Connectivity:** Primary rack gear shares **one L2** (`192.168.50.0/24`). Storage B and Rack C link the same way when cabled (e.g. **Ethernet** to the PoE switch, **VPN** or flat LAN per your design).
---
## Rack A — LAN aggregation (10" × 12U)
Dedicated to **Layer-2 access** and cable home runs. All cluster nodes plug into this switch (or into a downstream switch that uplinks here).
```
TOP OF RACK
┌────────────────────────────────────────┐
│ Slot 1 ········· empty ·············· │ 12U
│ Slot 2 ········· empty ·············· │ 11U
│ Slot 3 ········· empty ·············· │ 10U
│ Slot 4 ········· empty ·············· │ 9U
│ Slot 5 ········· empty ·············· │ 8U
│ Slot 6 ········· empty ·············· │ 7U
│ Slot 7 ░░░░░░░ optional PDU ░░░░░░░░ │ 6U
│ Slot 8 █████ 1U cable manager ██████ │ 5U
│ Slot 9 █████ 1U patch panel █████████ │ 4U
│ Slot10 ███ 8-port managed switch ████ │ 3U ← LAN L2 spine
│ Slot11 ········· empty ·············· │ 2U
│ Slot12 ········· empty ·············· │ 1U
└────────────────────────────────────────┘
BOTTOM
```
**Network role:** Every node NIC → **switch access port** → same **VLAN / flat LAN** as documented; **kube-vip** VIP **`192.168.50.230`**, **MetalLB** **`192.168.50.210`–`229`**, **Traefik** **`192.168.50.211`** are **logical** on node IPs (no extra hardware).
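To sanity-check an address against the MetalLB pool (`192.168.50.210` through `192.168.50.229`, 20 addresses), a small range test is enough. This is an illustrative sketch only; `in_metallb_pool` is a hypothetical helper and the range is hard-coded from this page rather than read from the pool manifests.

```bash
# Return success if the IPv4 address falls inside 192.168.50.210-229.
in_metallb_pool() {
  case "$1" in
    192.168.50.*) last="${1##*.}" ;;   # keep only the final octet
    *) return 1 ;;                      # wrong subnet
  esac
  [ "$last" -ge 210 ] && [ "$last" -le 229 ]
}
```

For example, `in_metallb_pool 192.168.50.211` succeeds (Traefik), while `in_metallb_pool 192.168.50.230` fails because the kube-vip VIP sits just outside the pool.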
---
## Rack B — Control planes (10" × 12U)
Three **Talos control-plane** nodes (**scheduling allowed** on CPs per `talconfig.yaml`).
```
TOP OF RACK
┌────────────────────────────────────────┐
│ Slot 1 ········· empty ·············· │ 12U
│ Slot 2 ········· empty ·············· │ 11U
│ Slot 3 ········· empty ·············· │ 10U
│ Slot 4 ········· empty ·············· │ 9U
│ Slot 5 ········· empty ·············· │ 8U
│ Slot 6 ········· empty ·············· │ 7U
│ Slot 7 ········· empty ·············· │ 6U
│ Slot 8 █ neon control-plane .20 ████ │ 5U
│ Slot 9 █ argon control-plane .30 ███ │ 4U
│ Slot10 █ krypton control-plane .40 ██ │ 3U (kube-vip VIP .230)
│ Slot11 ········· empty ·············· │ 2U
│ Slot12 ········· empty ·············· │ 1U
└────────────────────────────────────────┘
BOTTOM
```
---
## Rack C — Worker (10" × 12U)
Single **worker** node; **Longhorn** data disk is **local** to each node (see `talconfig.yaml`); no separate NAS in this diagram.
```
TOP OF RACK
┌────────────────────────────────────────┐
│ Slot 1 ········· empty ·············· │ 12U
│ Slot 2 ········· empty ·············· │ 11U
│ Slot 3 ········· empty ·············· │ 10U
│ Slot 4 ········· empty ·············· │ 9U
│ Slot 5 ········· empty ·············· │ 8U
│ Slot 6 ········· empty ·············· │ 7U
│ Slot 7 ░░░░░░░ spare / future ░░░░░░░░ │ 6U
│ Slot 8 ········· empty ·············· │ 5U
│ Slot 9 ········· empty ·············· │ 4U
│ Slot10 ███ helium worker .10 █████ │ 3U
│ Slot11 ········· empty ·············· │ 2U
│ Slot12 ········· empty ·············· │ 1U
└────────────────────────────────────────┘
BOTTOM
```
---
## Space summary
| System | Rack | Approx. U | IP | Role |
|--------|------|-----------|-----|------|
| LAN switch | A | 1U | — | All nodes on `192.168.50.0/24` |
| Patch / cable mgmt | A | 2× 1U | — | Physical plant |
| **neon** | B | 1U | `192.168.50.20` | control-plane + schedulable |
| **argon** | B | 1U | `192.168.50.30` | control-plane + schedulable |
| **krypton** | B | 1U | `192.168.50.40` | control-plane + schedulable |
| **helium** | C | 1U | `192.168.50.10` | worker |
Adjust **empty vs. future** rows if your chassis are **2U** or on **shelves** — scale the `█` blocks accordingly.
---
## Network connections
All cluster nodes are on **one flat LAN**. **kube-vip** floats **`192.168.50.230:6443`** across the three control-plane hosts on **`ens18`** (see cluster bootstrap docs).
```mermaid
flowchart TB
subgraph RACK_A["Rack A — 10\""]
SW["Managed switch<br/>192.168.50.0/24 L2"]
PP["Patch / cable mgmt"]
SW --- PP
end
subgraph RACK_B["Rack B — 10\""]
N["neon :20"]
A["argon :30"]
K["krypton :40"]
end
subgraph RACK_C["Rack C — 10\""]
H["helium :10"]
end
subgraph LOGICAL["Logical (any node holding VIP)"]
VIP["API VIP 192.168.50.230<br/>kube-vip → apiserver :6443"]
end
WAN["Internet / other LANs"] -.->|"router (out of scope)"| SW
SW <-->|"Ethernet"| N
SW <-->|"Ethernet"| A
SW <-->|"Ethernet"| K
SW <-->|"Ethernet"| H
N --- VIP
A --- VIP
K --- VIP
WK["Workstation / CI<br/>kubectl, browser"] -->|"HTTPS :6443"| VIP
WK -->|"L2 (MetalLB .210–.211, any node)"| SW
```
**Ingress path (same LAN):** clients → **`192.168.50.211`** (Traefik) or **`192.168.50.210`** (Argo CD) via **MetalLB** — still **through the same switch** to whichever node advertises the service.
---
## Related docs
- Cluster topology and services: [`architecture.md`](architecture.md)
- Build state and versions: [`../talos/CLUSTER-BUILD.md`](../talos/CLUSTER-BUILD.md)


@@ -8,8 +8,8 @@ This document describes the **noble** Talos lab cluster: node topology, networki
|---------------|---------|
| **Subgraph “Cluster”** | Kubernetes cluster boundary (`noble`) |
| **External / DNS / cloud** | Services outside the data plane (internet, registrar, Pangolin) |
| **Data store** | Durable data (etcd, Longhorn, Loki, Vault storage) |
| **Secrets / policy** | Secret material, Vault, admission policy |
| **Data store** | Durable data (etcd, Longhorn, Loki) |
| **Secrets / policy** | Secret material (SOPS in git), admission policy |
| **LB / VIP** | Load balancer, MetalLB assignment, or API VIP |
---
@@ -74,7 +74,7 @@ flowchart TB
## Platform stack (bootstrap → workloads)
Order: **Talos** → **Cilium** (cluster uses `cni: none` until CNI is installed) → **metrics-server**, **Longhorn**, **MetalLB** + pool manifests, **kube-vip** → **Traefik**, **cert-manager** → **Argo CD** (Helm only; optional empty app-of-apps). **Automated install:** `ansible/playbooks/noble.yml` (see `ansible/README.md`). Platform namespaces include `cert-manager`, `traefik`, `metallb-system`, `longhorn-system`, `monitoring`, `loki`, `logging`, `argocd`, `vault`, `external-secrets`, `sealed-secrets`, `kyverno`, `newt`, and others as deployed.
Order: **Talos** → **Cilium** (cluster uses `cni: none` until CNI is installed) → **metrics-server**, **Longhorn**, **MetalLB** + pool manifests, **kube-vip** → **Traefik**, **cert-manager** → **Argo CD** (Helm only; optional empty app-of-apps). **Automated install:** `ansible/playbooks/noble.yml` (see `ansible/README.md`). Platform namespaces include `cert-manager`, `traefik`, `metallb-system`, `longhorn-system`, `monitoring`, `loki`, `logging`, `argocd`, `kyverno`, `newt`, and others as deployed.
```mermaid
flowchart TB
@@ -98,7 +98,7 @@ flowchart TB
Argo["Argo CD<br/>(optional app-of-apps; platform via Ansible)"]
end
subgraph L5["Platform namespaces (examples)"]
NS["cert-manager, traefik, metallb-system,<br/>longhorn-system, monitoring, loki, logging,<br/>argocd, vault, external-secrets, sealed-secrets,<br/>kyverno, newt, …"]
NS["cert-manager, traefik, metallb-system,<br/>longhorn-system, monitoring, loki, logging,<br/>argocd, kyverno, newt, …"]
end
Talos --> Cilium --> MS
Cilium --> LH
@@ -149,22 +149,20 @@ flowchart LR
## Secrets and policy
**Sealed Secrets** decrypts `SealedSecret` objects in-cluster. **External Secrets Operator** syncs from **Vault** using **`ClusterSecretStore`** (see [`examples/vault-cluster-secret-store.yaml`](../clusters/noble/bootstrap/external-secrets/examples/vault-cluster-secret-store.yaml)). Trust is **cluster → Vault** (ESO calls Vault; Vault does not initiate cluster trust). **Kyverno** with **kyverno-policies** enforces **PSS baseline** in **Audit**.
**Mozilla SOPS** with **age** encrypts plain Kubernetes **`Secret`** manifests under [`clusters/noble/secrets/`](../clusters/noble/secrets/); operators decrypt at apply time (`ansible/playbooks/noble.yml` or `sops -d … | kubectl apply`). The private key is **`age-key.txt`** at the repo root (gitignored). **Kyverno** with **kyverno-policies** enforces **PSS baseline** in **Audit**.
```mermaid
flowchart LR
subgraph Git["Git repo"]
SSman["SealedSecret manifests<br/>(optional)"]
SM["SOPS-encrypted Secret YAML<br/>clusters/noble/secrets/"]
end
subgraph ops["Apply path"]
SOPS["sops -d + kubectl apply<br/>(or Ansible noble.yml)"]
end
subgraph cluster["Cluster"]
SSC["Sealed Secrets controller<br/>sealed-secrets"]
ESO["External Secrets Operator<br/>external-secrets"]
V["Vault<br/>vault namespace<br/>HTTP listener"]
K["Kyverno + kyverno-policies<br/>PSS baseline Audit"]
end
SSman -->|"encrypted"| SSC -->|"decrypt to Secret"| workloads["Workload Secrets"]
ESO -->|"ClusterSecretStore →"| V
ESO -->|"sync ExternalSecret"| workloads
SM --> SOPS -->|"plain Secret"| workloads["Workload Secrets"]
K -.->|"admission / audit<br/>(PSS baseline)"| workloads
```
@@ -172,7 +170,7 @@ flowchart LR
## Data and storage
**StorageClass:** **`longhorn`** (default). Talos mounts **user volume** data at **`/var/mnt/longhorn`** (bind paths for Longhorn). Stateful consumers include **Vault**, **kube-prometheus-stack** PVCs, and **Loki**.
**StorageClass:** **`longhorn`** (default). Talos mounts **user volume** data at **`/var/mnt/longhorn`** (bind paths for Longhorn). Stateful consumers include **kube-prometheus-stack** PVCs and **Loki**.
```mermaid
flowchart TB
@@ -183,12 +181,10 @@ flowchart TB
SC["StorageClass: longhorn (default)"]
end
subgraph consumers["Stateful / durable consumers"]
V["Vault PVC data-vault-0"]
PGL["kube-prometheus-stack PVCs"]
L["Loki PVC"]
end
UD --> SC
SC --> V
SC --> PGL
SC --> L
```
@@ -210,7 +206,7 @@ See [`talos/CLUSTER-BUILD.md`](../talos/CLUSTER-BUILD.md) for the authoritative
| Argo CD | 9.4.17 / app v3.3.6 |
| kube-prometheus-stack | 82.15.1 |
| Loki / Fluent Bit | 6.55.0 / 0.56.0 |
| Sealed Secrets / ESO / Vault | 2.18.4 / 2.2.0 / 0.32.0 |
| SOPS (client tooling) | see `clusters/noble/secrets/README.md` |
| Kyverno | 3.7.1 / policies 3.7.1 |
| Newt | 1.2.0 / app 1.10.1 |
@@ -218,7 +214,7 @@ See [`talos/CLUSTER-BUILD.md`](../talos/CLUSTER-BUILD.md) for the authoritative
## Narrative
The **noble** environment is a **Talos** lab cluster on **`192.168.50.0/24`** with **three control plane nodes and one worker**, schedulable workloads on control planes enabled, and the Kubernetes API exposed through **kube-vip** at **`192.168.50.230`**. **Cilium** provides the CNI after Talos bootstrap with **`cni: none`**; **MetalLB** advertises **`192.168.50.210`–`192.168.50.229`**, pinning **Argo CD** to **`192.168.50.210`** and **Traefik** to **`192.168.50.211`** for **`*.apps.noble.lab.pcenicni.dev`**. **cert-manager** issues certificates for Traefik Ingresses; **GitOps** is **Ansible-driven Helm** for the platform (**`clusters/noble/bootstrap/`**) plus optional **Argo CD** app-of-apps (**`clusters/noble/apps/`**, **`clusters/noble/bootstrap/argocd/`**). **Observability** uses **kube-prometheus-stack** in **`monitoring`**, **Loki** and **Fluent Bit** with Grafana wired via a **ConfigMap** datasource, with **Longhorn** PVCs for Prometheus, Grafana, Alertmanager, Loki, and **Vault**. **Secrets** combine **Sealed Secrets** for git-encrypted material, **Vault** with **External Secrets** for dynamic sync, and **Kyverno** enforces **Pod Security Standards baseline** in **Audit**. **Public** access uses **Newt** to **Pangolin** with **CNAME** and Integration API steps as documented—not generic in-cluster public DNS.
The **noble** environment is a **Talos** lab cluster on **`192.168.50.0/24`** with **three control plane nodes and one worker**, schedulable workloads on control planes enabled, and the Kubernetes API exposed through **kube-vip** at **`192.168.50.230`**. **Cilium** provides the CNI after Talos bootstrap with **`cni: none`**; **MetalLB** advertises **`192.168.50.210`–`192.168.50.229`**, pinning **Argo CD** to **`192.168.50.210`** and **Traefik** to **`192.168.50.211`** for **`*.apps.noble.lab.pcenicni.dev`**. **cert-manager** issues certificates for Traefik Ingresses; **GitOps** is **Ansible** for the **initial** platform install (**`clusters/noble/bootstrap/`**), then **Argo CD** for the kustomize tree (**`noble-bootstrap-root`** → **`clusters/noble/bootstrap`**) and optional apps (**`noble-root`** → **`clusters/noble/apps/`**) once automated sync is enabled after **`noble.yml`** (see **`clusters/noble/bootstrap/argocd/README.md`** §5). **Observability** uses **kube-prometheus-stack** in **`monitoring`**, **Loki** and **Fluent Bit** with Grafana wired via a **ConfigMap** datasource, with **Longhorn** PVCs for Prometheus, Grafana, Alertmanager, and Loki. **Secrets** in git use **SOPS** + **age** under **`clusters/noble/secrets/`**; **Kyverno** enforces **Pod Security Standards baseline** in **Audit**. **Public** access uses **Newt** to **Pangolin** with **CNAME** and Integration API steps as documented—not generic in-cluster public DNS.
---
@@ -233,7 +229,7 @@ The **noble** environment is a **Talos** lab cluster on **`192.168.50.0/24`** wi
**Open questions**
- **Split horizon:** Confirm whether only LAN DNS resolves `*.apps.noble.lab.pcenicni.dev` to **`192.168.50.211`** or whether public resolvers also point at that address.
- **Velero / S3:** **TBD** until an S3-compatible backend is configured.
- **Velero / S3:** optional **Ansible** install (**`noble_velero_install`**) from **`clusters/noble/bootstrap/velero/`** once an S3-compatible backend and credentials exist (see **`talos/CLUSTER-BUILD.md`** Phase F).
- **Argo CD:** Confirm **`repoURL`** in `root-application.yaml` and what is actually applied on-cluster.
---

docs/homelab-network.md Normal file

@@ -0,0 +1,100 @@
# Homelab network inventory
Single place for **VLANs**, **static addressing**, and **hosts** beside the **noble** Talos cluster. **Proxmox** is the **hypervisor** for the VMs below; **all of those VMs are intended to run on `192.168.1.0/24`** (same broadcast domain as Pi-hole and typical home clients). **Noble** (Talos) stays on **`192.168.50.0/24`** per [`architecture.md`](architecture.md) and [`talos/CLUSTER-BUILD.md`](../talos/CLUSTER-BUILD.md) until you change that design.
## VLANs (logical)
| Network | Role |
|---------|------|
| **`192.168.1.0/24`** | **Homelab / Proxmox LAN**: **Proxmox host(s)**, **all Proxmox VMs**, **Pi-hole**, **Mac mini**, and other servers that share this VLAN. |
| **`192.168.50.0/24`** | **Noble Talos** cluster — physical nodes, **kube-vip**, **MetalLB**, Traefik; **not** the Proxmox VM subnet. |
| **`192.168.60.0/24`** | **DMZ / WAN-facing**: **NPM**, **WebDAV**, **other services** that need WAN access. |
| **`192.168.40.0/24`** | **Home Assistant** and IoT devices — isolated; record subnet and HA IP in DHCP/router. |
**Routing / DNS:** Clients and VMs on **`192.168.1.0/24`** reach **noble** services on **`192.168.50.0/24`** via **L3** (router/firewall). **NFS** from OMV (`192.168.1.105`) to **noble** pods uses the **OMV data IP** as the NFS server address from the cluster's perspective.
Firewall rules between VLANs are **out of scope** here; document them where you keep runbooks.
---
## `192.168.50.0/24` — reservations (noble only)
Do not assign **unrelated** static services on **this** VLAN without checking overlap with MetalLB and kube-vip.
| Use | Addresses |
|-----|-----------|
| Talos nodes | `.10`–`.40` (see [`talos/talconfig.yaml`](../talos/talconfig.yaml)) |
| MetalLB L2 pool | `.210`–`.229` |
| Traefik (ingress) | `.211` (typical) |
| Argo CD | `.210` (typical) |
| Kubernetes API (kube-vip) | **`.230`** — **must not** be a VM |
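
As a hedged illustration of how the MetalLB row maps to cluster objects, the sketch below declares the `.210`–`.229` range as an `IPAddressPool` with a matching `L2Advertisement`. The pool name `noble-l2` follows the cluster notes; the `metallb-system` namespace and the advertisement name are assumptions, not values confirmed by the repository.

```yaml
# Sketch only: namespace and L2Advertisement name are assumptions.
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: noble-l2
  namespace: metallb-system
spec:
  addresses:
    # Must not overlap Talos nodes or the kube-vip address (.230).
    - 192.168.50.210-192.168.50.229
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: noble-l2
  namespace: metallb-system
spec:
  ipAddressPools:
    - noble-l2
```

Keeping the range in one manifest makes it easier to check a new static reservation on this VLAN against the pool before assigning it.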
---
## Proxmox VMs (`192.168.1.0/24`)
All run on **Proxmox**; addresses below use **`192.168.1.0/24`** (same host octet as your earlier `.50.x` / `.60.x` plan, moved into the homelab VLAN). Adjust if your router uses a different numbering scheme.
Most are **Docker hosts** with multiple apps; treat the **IP** as the **host**, not individual containers.
| VM ID | Name | IP | Notes |
|-------|------|-----|--------|
| 666 | nginxproxymanager | `192.168.1.20` | NPM (edge / WAN-facing role — firewall as you design). |
| 777 | nginxproxymanager-Lan | `192.168.1.60` | NPM on **internal** homelab LAN. |
| 100 | Openmediavault | `192.168.1.105` | **NFS** exports for *arr / media paths. |
| 110 | Monitor | `192.168.1.110` | Uptime Kuma, Peekaping, Tracearr → cluster candidates. |
| 120 | arr | `192.168.1.120` | *arr stack; media via **NFS** from OMV — see [migration](#arr-stack-nfs-and-kubernetes). |
| 130 | Automate | `192.168.1.130` | Low use — **candidate to remove** or consolidate. |
| 140 | general-purpose | `192.168.1.140` | IT tools, Mealie, Open WebUI, SparkyFitness, … |
| 150 | Media-server | `192.168.1.150` | Jellyfin (test, **NFS** media), ebook server. |
| 160 | s3 | `192.168.1.170` | Object storage; **merge** into **central S3** on noble per [`shared-data-services.md`](shared-data-services.md) when ready. |
| 190 | Auth | `192.168.1.190` | **Authentik** → **noble (K8s)** for HA. |
| 300 | gitea | `192.168.1.203` | On **`.1`**, no overlap with noble **MetalLB `.210`–`.229`** on **`.50`**. |
| 310 | gitea-nsfw | `192.168.1.204` | |
| 500 | AMP | `192.168.1.47` | |
### Workload detail (what runs where)
**Auth (190):** **Authentik** is the main service; moving it to **Kubernetes (noble)** gives you **HA**, rolling upgrades, and backups via your cluster patterns (PVCs, Velero, etc.). Plan **OIDC redirect URLs** and **outposts** (if used) when the **ingress hostname** and paths to **`.50`** services change.
**Monitor (110):** **Uptime Kuma**, **Peekaping**, and **Tracearr** are a good fit for the cluster: small state (SQLite or small DBs), **Ingress** via Traefik, and **Longhorn** or a small DB PVC. Migrate **one app at a time** and keep the old VM until DNS and alerts are verified.
**arr (120):** **Lidarr, Sonarr, Radarr**, and related *arr* apps; libraries and download paths point at **NFS** from **Openmediavault (100)** at **`192.168.1.105`**. The hard part is **keeping paths, permissions (UID/GID), and download client** wiring while pods move.
**Automate (130)** — Tools are **barely used**; **decommission**, merge into **general-purpose (140)**, or replace with a **CronJob** / one-shot on the cluster only if something still needs scheduling.
**general-purpose (140)** — “Daily driver” stack: **IT tools**, **Mealie**, **Open WebUI**, **SparkyFitness**, and similar. **Candidates for gradual moves** to noble; group by **data sensitivity** and **persistence** (Postgres vs SQLite) when you pick order.
**Media-server (150):** **Jellyfin** (testing) with libraries on **NFS**; **ebook** server. Treat **Jellyfin** like *arr* for storage: same NFS export and **transcoding** needs (CPU on worker nodes or GPU if you add it). Ebook stack depends on what you run (e.g. Kavita, Audiobookshelf) — note **metadata paths** before moving.
### Arr stack, NFS, and Kubernetes
You do **not** have to move NFS into the cluster: **Openmediavault** on **`192.168.1.105`** can stay the **NFS server** while the *arr* apps run as **Deployments** with **ReadWriteMany** volumes. Noble nodes on **`192.168.50.0/24`** mount NFS using **that IP** (ensure **firewall** allows **NFS** from node IPs to OMV).
1. **Keep OMV as the single source of exports** — same **export path** (e.g. `/export/media`) from the cluster's perspective as from the current VM.
2. **Mount NFS in Kubernetes** — use an **NFS provisioner** (e.g. **nfs-subdir-external-provisioner**, or the CSI driver **csi-driver-nfs**) so each app gets a **PVC** backed by a **subdirectory** of the export, **or** one shared RWX PVC for a common tree if your layout needs it.
3. **Match POSIX ownership** — set **supplemental groups** or **fsGroup** / **runAsUser** on the pods so Sonarr/Radarr see the same **UID/GID** as today's Docker setup; fix **squash** settings on OMV if you use `root_squash`.
4. **Config and DB** — back up each app's **config volume** (or SQLite files), redeploy with the same **environment**; point **download clients** and **NFS media roots** to the **same logical paths** inside the container.
5. **Low-risk path** — run **one** *arr* app on the cluster while the rest stay on **VM 120** until imports and downloads behave; then cut DNS/NPM streams over.
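
The steps above can be sketched as a statically bound NFS volume plus one *arr* Deployment. This is a minimal illustration, not the repository's manifests: it uses the built-in `nfs` volume type rather than a provisioner, and the object names, the `arr` namespace, the `1000` UID/GID, the container image, and the `/media` mount path are all placeholders to replace with the values from VM 120.

```yaml
# Sketch only: names, namespace, UID/GID, image, and mount path are assumptions.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: media-nfs
spec:
  capacity:
    storage: 1Ti               # informational for NFS; not enforced
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: ""         # bind statically, no dynamic provisioning
  mountOptions:
    - nfsvers=4.1
  nfs:
    server: 192.168.1.105      # OMV stays the NFS server
    path: /export/media        # example export path from step 1
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: media
  namespace: arr
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""
  volumeName: media-nfs
  resources:
    requests:
      storage: 1Ti
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sonarr
  namespace: arr
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sonarr
  template:
    metadata:
      labels:
        app: sonarr
    spec:
      securityContext:
        runAsUser: 1000        # match the UID that owns files on the export
        runAsGroup: 1000
        supplementalGroups: [1000]
      containers:
        - name: sonarr
          image: registry.example/sonarr:TAG   # placeholder: same image:tag as on the VM
          ports:
            - containerPort: 8989
          volumeMounts:
            - name: media
              mountPath: /media                # keep the same path the app used in Docker
      volumes:
        - name: media
          persistentVolumeClaim:
            claimName: media
```

The sketch sets `runAsUser` and `supplementalGroups` rather than relying on `fsGroup`, on the assumption that ownership on an NFS export is decided by the server and is not rewritten at mount time.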
If you prefer **no** NFS from pods, the alternative is **large ReadWriteOnce** disks on Longhorn and **sync** from OMV — usually **more** moving parts than **RWX NFS** for this workload class.
---
## Other hosts
| Host | IP | VLAN / network | Notes |
|------|-----|----------------|--------|
| **Pi-hole** | `192.168.1.127` | `192.168.1.0/24` | DNS; same VLAN as Proxmox VMs. |
| **Home Assistant** | *TBD* | **IoT VLAN** | Add reservation when fixed. |
| **Mac mini** | `192.168.1.155` | `192.168.1.0/24` | Align with **Storage B** in [`Racks.md`](Racks.md) if the same machine. |
---
## Related docs
- **Shared Postgres + S3 (centralized):** [`shared-data-services.md`](shared-data-services.md)
- **VM → noble migration plan:** [`migration-vm-to-noble.md`](migration-vm-to-noble.md)
- Noble cluster topology and ingress: [`architecture.md`](architecture.md)
- Physical racks (Primary / Storage B / Rack C): [`Racks.md`](Racks.md)
- Cluster checklist: [`../talos/CLUSTER-BUILD.md`](../talos/CLUSTER-BUILD.md)


@@ -0,0 +1,121 @@
# Migration plan: Proxmox VMs → noble (Kubernetes)
This document is the **default playbook** for moving workloads from **Proxmox VMs** on **`192.168.1.0/24`** into the **noble** Talos cluster on **`192.168.50.0/24`**. Source inventory and per-VM notes: [`homelab-network.md`](homelab-network.md). Cluster facts: [`architecture.md`](architecture.md), [`talos/CLUSTER-BUILD.md`](../talos/CLUSTER-BUILD.md).
---
## 1. Scope and principles
| Principle | Detail |
|-----------|--------|
| **One service at a time** | Run the new workload on **noble** while the **VM** stays up; cut over **DNS / NPM** only after checks pass. |
| **Same container image** | Prefer the **same** upstream image and major version as Docker on the VM to reduce surprises. |
| **Data moves with a plan** | **Backup** VM volumes or export DB dumps **before** the first deploy to the cluster. |
| **Ingress on noble** | Internal apps use **Traefik** + **`*.apps.noble.lab.pcenicni.dev`** (or your chosen hostnames) and **MetalLB** (e.g. **`192.168.50.211`**) per [`architecture.md`](architecture.md). |
| **Cross-VLAN** | Clients on **`.1`** reach services on **`.50`** via **routing**; **firewall** must allow **NFS** from **Talos node IPs** to **OMV `192.168.1.105`** when pods mount NFS. |
**Not everything must move.** Keep **Openmediavault** (and optionally **NPM**) on VMs if you prefer; the cluster consumes **NFS** and **HTTP** from them.
---
## 2. Prerequisites (before wave 1)
1. **Cluster healthy**: `kubectl get nodes`; [`talos/CLUSTER-BUILD.md`](../talos/CLUSTER-BUILD.md) checklist through ingress and cert-manager as needed.
2. **Ingress + TLS**: **Traefik** + **cert-manager** working; you can hit a **test Ingress** on the MetalLB IP.
3. **GitOps / deploy path** — Decide per app: **Helm** under `clusters/noble/apps/`, **Argo CD**, or **Ansible**-applied manifests (match how you manage the rest of noble).
4. **Secrets** — Plan **Kubernetes Secrets**; for git-stored material, align with **SOPS** (`clusters/noble/secrets/`, `.sops.yaml`).
5. **Storage**: **Longhorn** default for **ReadWriteOnce** state; for **NFS** (*arr*, Jellyfin), install a **CSI NFS** driver and test a **small RWX PVC** before migrating data-heavy apps.
6. **Shared data tier (recommended)** — Deploy **centralized PostgreSQL** and **S3-compatible storage** on noble so apps do not each ship their own DB/object store; see [`shared-data-services.md`](shared-data-services.md).
7. **Firewall** — Rules: **workstation → `192.168.50.230:6443`**; **nodes → OMV NFS ports**; **clients → `192.168.50.211`** (or split-horizon DNS) as you design.
8. **DNS** — Split-horizon or Pi-hole records for **`*.apps.noble.lab.pcenicni.dev`** → **Traefik** IP **`192.168.50.211`** for LAN clients.
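
For the secrets prerequisite, a `.sops.yaml` along these lines is one way to scope encryption to the secrets directory. It is a sketch under assumptions: the actual rules in the repository are not shown here, and the `age` recipient below is a placeholder, not a real key.

```yaml
# .sops.yaml (sketch). The recipient is a placeholder; use the public key
# that corresponds to your private age key.
creation_rules:
  - path_regex: ^clusters/noble/secrets/.*\.ya?ml$
    # Encrypt only the payload fields so metadata stays readable in diffs.
    encrypted_regex: ^(data|stringData)$
    age: age1REPLACE_WITH_YOUR_PUBLIC_RECIPIENT
```

Restricting `encrypted_regex` to `data` and `stringData` keeps `kind`, `metadata.name`, and `namespace` in clear text, which makes review easier without exposing values.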
---
## 3. Standard migration procedure (repeat per app)
Use this checklist for **each** application (or small group, e.g. one Helm release).
| Step | Action |
|------|--------|
| **A. Discover** | Document **image:tag**, **ports**, **volumes** (host paths), **env vars**, **depends_on** (DB, Redis, NFS path). Export **docker inspect** / **compose** from the VM. |
| **B. Backup** | Snapshot **Proxmox VM** or backup **volume** / **SQLite** / **DB dump** to offline storage. |
| **C. Namespace** | Create a **dedicated namespace** (e.g. `monitoring-tools`, `authentik`) or use your house standard. |
| **D. Deploy** | Add **Deployment** (or **StatefulSet**), **Service**, **Ingress** (class **traefik**), **PVCs**; wire **secrets** from **Secrets** (not literals in git). |
| **E. Storage** | **Longhorn** PVC for local state; **NFS CSI** PVC for shared media/config paths that must match the VM (see [`homelab-network.md`](homelab-network.md) *arr* section). Prefer **shared Postgres** / **shared S3** per [`shared-data-services.md`](shared-data-services.md) instead of new embedded databases. Match **UID/GID** with `securityContext`. |
| **F. Smoke test** | `kubectl port-forward` or temporary **Ingress** hostname; log in, run one critical workflow (login, playback, sync). |
| **G. DNS cutover** | Point **internal DNS** or **NPM** upstream from the **VM IP** to the **new hostname** (Traefik) or **MetalLB IP** + Host header. |
| **H. Observe** | 24–72 hours: logs, alerts, **Uptime Kuma** (once migrated), backups. |
| **I. Decommission** | Stop the **container** on the VM (not the whole VM until the **whole** VM is empty). |
| **J. VM off** | When **no** services remain on that VM, **power off** and archive or delete the VM. |
**Rollback:** Re-enable the VM service, revert **DNS/NPM** to the old IP, delete or scale the cluster deployment to zero.
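
Steps C through E might look like the following for a small app such as Uptime Kuma. This is a hedged sketch: the image tag, container port `3001`, data path `/app/data`, the hostname, and the `letsencrypt-prod` ClusterIssuer name are assumptions to check against the VM's compose file and the cluster's cert-manager setup.

```yaml
# Sketch only: image, port, data path, hostname, and issuer name are assumptions.
apiVersion: v1
kind: Namespace
metadata:
  name: monitoring-tools
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: uptime-kuma-data
  namespace: monitoring-tools
spec:
  accessModes: [ReadWriteOnce]
  storageClassName: longhorn
  resources:
    requests:
      storage: 2Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: uptime-kuma
  namespace: monitoring-tools
spec:
  replicas: 1
  strategy:
    type: Recreate             # RWO volume: avoid two pods holding the PVC during a rollout
  selector:
    matchLabels:
      app: uptime-kuma
  template:
    metadata:
      labels:
        app: uptime-kuma
    spec:
      containers:
        - name: uptime-kuma
          image: louislam/uptime-kuma:1
          ports:
            - containerPort: 3001
          volumeMounts:
            - name: data
              mountPath: /app/data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: uptime-kuma-data
---
apiVersion: v1
kind: Service
metadata:
  name: uptime-kuma
  namespace: monitoring-tools
spec:
  selector:
    app: uptime-kuma
  ports:
    - port: 80
      targetPort: 3001
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: uptime-kuma
  namespace: monitoring-tools
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: traefik
  tls:
    - hosts: [uptime-kuma.apps.noble.lab.pcenicni.dev]
      secretName: uptime-kuma-tls
  rules:
    - host: uptime-kuma.apps.noble.lab.pcenicni.dev
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: uptime-kuma
                port:
                  number: 80
```

An app that needs credentials would add `envFrom` or `secretKeyRef` entries pointing at a Secret instead of literal values. Scaling the Deployment to zero is enough for the rollback path.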
---
## 4. Recommended migration order (phases)
Order balances **risk**, **dependencies**, and **learning curve**.
| Phase | Target | Rationale |
|-------|--------|-----------|
| **0 — Optional** | **Automate (130)** | Low use: **retire** or replace with **CronJobs**; skip if nothing valuable runs. |
| **0b — Platform** | **Shared Postgres + S3** on noble | Run **before** or alongside early waves so new deploys use **one DSN** and **one object endpoint**; retire **VM 160** when empty. See [`shared-data-services.md`](shared-data-services.md). |
| **1 — Observability** | **Monitor (110)** — Uptime Kuma, Peekaping, Tracearr | Small state, validates **Ingress**, **PVCs**, and **alert paths** before auth and media. |
| **2 — Git** | **gitea (300)**, **gitea-nsfw (310)** | Point at **shared Postgres** + **S3** for attachments; move **repos** with **PVC** + backup restore if needed. |
| **3 — Object / misc** | **s3 (160)**, **AMP (500)** | **Migrate data** into **central** S3 on cluster, then **decommission** duplicate MinIO on VM **160** if applicable. |
| **4 — Auth** | **Auth (190)**: **Authentik** | Use **shared Postgres**; update **all OIDC clients** (Gitea, apps, NPM) with **new issuer URLs**; schedule a **maintenance window**. |
| **5 — Daily apps** | **general-purpose (140)** | Move **one app per release** (Mealie, Open WebUI, …); each app gets its **own database** (and bucket if needed) on the **shared** tiers — not a new Postgres pod per app. |
| **6 — Media / *arr*** | **arr (120)**, **Media-server (150)** | **NFS** from **OMV**, download clients, **transcoding** — migrate **one *arr*** then Jellyfin/ebook; see NFS bullets in [`homelab-network.md`](homelab-network.md). |
| **7 — Edge** | **NPM (666/777)** | Often **last**: either keep on Proxmox or replace with **Traefik** + **IngressRoutes** / **Gateway API**; many people keep a **dedicated** reverse proxy VM until parity is proven. |
**Openmediavault (100)** — Typically **stays** as **NFS** (and maybe backup target) for the cluster; no need to “migrate” the whole NAS into Kubernetes.
---
## 5. Ingress and reverse proxy
| Approach | When to use |
|----------|-------------|
| **Traefik Ingress** on noble | Default for **internal** HTTPS apps; **cert-manager** for public names you control. |
| **NPM (VM)** as front door | Point **proxy host** → **Traefik MetalLB IP** or **service name** if you add internal DNS; reduces double-proxy if you **terminate TLS** in one place only. |
| **Newt / Pangolin** | Public reachability per [`clusters/noble/bootstrap/newt/README.md`](../clusters/noble/bootstrap/newt/README.md); not automatic ExternalDNS. |
Avoid **two** TLS terminations for the same hostname unless you intend **SSL passthrough** end-to-end.
---
## 6. Authentik-specific (Auth VM → cluster)
1. **Backup** Authentik **PostgreSQL** (or embedded DB) and **media** volume from the VM.
2. Deploy **Helm** (official chart) with **same** Authentik version if possible.
3. **Restore** DB into **shared cluster Postgres** (recommended) or chart-managed DB — see [`shared-data-services.md`](shared-data-services.md).
4. Update **issuer URL** in every **OIDC/OAuth** client (Gitea, Grafana, etc.).
5. Re-test **outposts** (if any) and **redirect URIs** from both **`.1`** and **`.50`** client perspectives.
6. **Cut over DNS**; then **decommission** VM **190**.
---
## 7. *arr* and Jellyfin-specific
Follow the **numbered list** under **“Arr stack, NFS, and Kubernetes”** in [`homelab-network.md`](homelab-network.md). In short: **OMV stays**; **CSI NFS** + **RWX**; **match permissions**; migrate **one app** first; verify **download client** can reach the new pod **IP/DNS** from your download host.
---
## 8. Validation checklist (per wave)
- Pods **Ready**, **Ingress** returns **200** / login page.
- **TLS** valid for chosen hostname.
- **Persistent data** present (new uploads, DB writes survive pod restart).
- **Backups** (Velero or app-level) defined for the new location.
- **Monitoring** / alerts updated (targets, not old VM IP).
- **Documentation** in [`homelab-network.md`](homelab-network.md) updated (VM retired or marked migrated).
---
## Related docs
- **Shared Postgres + S3:** [`shared-data-services.md`](shared-data-services.md)
- VM inventory and NFS notes: [`homelab-network.md`](homelab-network.md)
- Noble topology, MetalLB, Traefik: [`architecture.md`](architecture.md)
- Bootstrap and versions: [`talos/CLUSTER-BUILD.md`](../talos/CLUSTER-BUILD.md)
- Apps layout: [`clusters/noble/apps/README.md`](../clusters/noble/apps/README.md)


@@ -0,0 +1,90 @@
# Centralized PostgreSQL and S3-compatible storage
Goal: **one shared PostgreSQL** and **one S3-compatible object store** on **noble**, instead of every app bundling its own database or MinIO. Apps keep **logical isolation** via **per-app databases** / **users** and **per-app buckets** (or prefixes), not separate clusters.
See also: [`migration-vm-to-noble.md`](migration-vm-to-noble.md), [`homelab-network.md`](homelab-network.md) (VM **160** `s3` today), [`talos/CLUSTER-BUILD.md`](../talos/CLUSTER-BUILD.md) (Velero + S3).
---
## 1. Why centralize
| Benefit | Detail |
|--------|--------|
| **Operations** | One backup/restore story, one upgrade cadence, one place to tune **IOPS** and **retention**. |
| **Security** | **Least privilege**: each app gets its own **DB user** and **S3 credentials** scoped to one database or bucket. |
| **Resources** | Fewer duplicate **Postgres** or **MinIO** sidecars; better use of **Longhorn** or dedicated PVCs for the shared tiers. |
**Tradeoff:** Shared tiers are **blast-radius** targets — use **backups**, **PITR** where you care, and **NetworkPolicies** so only expected namespaces talk to Postgres/S3.
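
A NetworkPolicy of roughly this shape is one way to limit who can reach the shared Postgres. It is a sketch with assumed names: the `platform` namespace, the pod label, and the client namespaces `gitea` and `authentik` are illustrative, and an operator-managed cluster may need extra rules for its controller and metrics scraping.

```yaml
# Sketch only: namespace, pod label, and client namespaces are assumptions.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: postgres-allow-known-clients
  namespace: platform
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: postgres-platform
  policyTypes:
    - Ingress
  ingress:
    - from:
        # Client namespaces, matched by the label Kubernetes sets on every namespace.
        - namespaceSelector:
            matchExpressions:
              - key: kubernetes.io/metadata.name
                operator: In
                values: [gitea, authentik]
        # Pods in the same namespace (e.g. replicas talking to the primary).
        - podSelector: {}
      ports:
        - protocol: TCP
          port: 5432
```

Adding a namespace to the `values` list then becomes the explicit, reviewable step that grants a new app database access.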
---
## 2. PostgreSQL — recommended pattern
1. **Run Postgres on noble** — Operators such as **CloudNativePG**, **Zalando Postgres operator**, or a well-maintained **Helm** chart with **replicas** + **persistent volumes** (Longhorn).
2. **One cluster instance, many databases** — For each app: `CREATE DATABASE appname;` and a **dedicated role** with `CONNECT` on that database only (not superuser).
3. **Connection from apps** — Use a **Kubernetes Service** (e.g. `postgres-platform.platform.svc.cluster.local:5432`) and pass **credentials** via **Secrets** (ideally **SOPS**-encrypted in git).
4. **Migrations** — Run app **migration** jobs or init containers against the **same** DSN after DB exists.
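
If CloudNativePG is the chosen operator, the pattern above could start from something like this. It is a sketch, not the repository's configuration: the cluster name, namespace, instance count, volume size, and the `gitea` Secret are assumptions. Note that CloudNativePG exposes the primary through a Service named `<cluster>-rw`, so the hostname differs slightly from the generic example in step 3.

```yaml
# Sketch only: names, sizes, and the example app are assumptions.
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: postgres-platform
  namespace: platform
spec:
  instances: 3
  storage:
    storageClass: longhorn
    size: 20Gi
---
# Per-app connection details; encrypt with SOPS before committing.
apiVersion: v1
kind: Secret
metadata:
  name: gitea-db
  namespace: gitea
type: Opaque
stringData:
  host: postgres-platform-rw.platform.svc.cluster.local
  port: "5432"
  database: gitea
  username: gitea
  password: CHANGE_ME          # placeholder, never commit a real value in clear text
```

The per-app database and role still have to be created on the server, for example with the `CREATE DATABASE` statement from step 2.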
**Migrating off SQLite / embedded Postgres**
- **SQLite → Postgres:** export/import per app (native tools, or **pgloader** where appropriate).
- **Docker Postgres volume:** `pg_dumpall` or per-DB `pg_dump` → restore into a **new** database on the shared server; **freeze writes** during cutover.
---
## 3. S3-compatible object storage — recommended pattern
1. **Run one S3 API on noble**: **MinIO** (common), **Garage**, or **SeaweedFS** S3 layer — with **PVC(s)** or host path for data; **erasure coding** / replicas if the chart supports it and you want durability across nodes.
2. **Buckets per concern** — e.g. `gitea-attachments`, `velero`, `loki-archive` — not one global bucket unless you enforce **prefix** IAM policies.
3. **Credentials**: **IAM-style** users limited to **one bucket** (or prefix); **Secrets** reference **access key** / **secret**; never commit keys in plain text.
4. **Endpoint for pods** — In-cluster: `http://minio.platform.svc.cluster.local:9000` (or TLS inside mesh). Apps use **virtual-hosted** or **path-style** per SDK defaults.
### NFS as backing store for S3 on noble
**Yes.** You can run MinIO (or another S3-compatible server) with its **data directory** on a **ReadWriteMany** volume that is **NFS** — for example the same **Openmediavault** export you already use, mounted via your **NFS CSI** driver (see [`homelab-network.md`](homelab-network.md)).
| Consideration | Detail |
|---------------|--------|
| **Works for homelab** | MinIO stores objects as files under a path; **POSIX** on NFS is enough for many setups. |
| **Performance** | NFS adds **latency** and shared bandwidth; fine for moderate use, less ideal for heavy multi-tenant throughput. |
| **Availability** | The **NFS server** (OMV) becomes part of the availability story for object data — plan **backups** and **OMV** health like any dependency. |
| **Locking / semantics** | Prefer **NFSv4.x**; avoid mixing **NFS** and expectations of **local SSD** (e.g. very chatty small writes). If you see odd behavior, **Longhorn** (block) on a node is the usual next step. |
| **Layering** | You are stacking **S3 API → file layout → NFS → disk**; that is normal for a lab, just **monitor** space and exports on OMV. |
**Summary:** NFS-backed PVC for MinIO is **valid** on noble; use **Longhorn** (or local disk) when you need **better IOPS** or want object data **inside** the cluster's storage domain without depending on OMV for that tier.
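
An NFS-backed volume for the object store could be declared as below, assuming **csi-driver-nfs** is the installed driver. The StorageClass name, the `/export/s3` share, the `platform` namespace, and the requested size are hypothetical; only the OMV address comes from the inventory.

```yaml
# Sketch only: assumes csi-driver-nfs; share path, names, and size are placeholders.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-omv
provisioner: nfs.csi.k8s.io
parameters:
  server: 192.168.1.105
  share: /export/s3
reclaimPolicy: Retain          # keep object data if the PVC is deleted
volumeBindingMode: Immediate
mountOptions:
  - nfsvers=4.1                # prefer NFSv4.x, per the table above
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: s3-data
  namespace: platform
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: nfs-omv
  resources:
    requests:
      storage: 500Gi
```

Switching the same PVC to `storageClassName: longhorn` with `ReadWriteOnce` is the fallback if NFS latency or locking becomes a problem.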
**Migrating off VM 160 (`s3`) or per-app MinIO**
- **MinIO → MinIO:** `mc mirror` between aliases, or **replication** if you configure it.
- **Same API:** Any tool speaking **S3** can **sync** buckets before you point apps at the new endpoint.
**Velero** — Point the **backup location** at the **central** bucket (see cluster Velero docs); avoid a second ad-hoc object store for backups if one cluster bucket is enough.
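
Pointing Velero at the central bucket might look like the following, assuming the AWS-compatible object store plugin is installed. The `velero` namespace, the bucket name, and the endpoint are taken from examples elsewhere in this document or are conventional defaults, not confirmed settings.

```yaml
# Sketch only: assumes the Velero AWS plugin; bucket and endpoint are examples.
apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
  name: default
  namespace: velero
spec:
  provider: aws
  objectStorage:
    bucket: velero
  config:
    region: us-east-1                 # arbitrary for most S3-compatible servers
    s3ForcePathStyle: "true"          # path-style addressing for in-cluster endpoints
    s3Url: http://minio.platform.svc.cluster.local:9000
```

Credentials for the bucket are supplied separately, typically through a Secret that Velero reads at install time.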
---
## 4. Ordering relative to app migrations
| When | What |
|------|------|
| **Early** | Stand up **Postgres** + **S3** with **empty** DBs/buckets; test with **one** non-critical app (e.g. a throwaway deployment). |
| **Before auth / Git** | **Gitea** and **Authentik** benefit from **managed Postgres** early — plan **DSN** and **bucket** for attachments **before** cutover. |
| **Ongoing** | New apps **must not** ship embedded **Postgres/MinIO** unless the workload truly requires it (e.g. vendor appliance). |
---
## 5. Checklist (platform team)
- [ ] Postgres **Service** DNS name and **TLS** (optional in-cluster) documented.
- [ ] S3 **endpoint**, **region** string (can be `us-east-1` for MinIO), **TLS** for Ingress if clients are outside the cluster.
- [ ] **Backup:** scheduled **logical dumps** (Postgres) and **bucket replication** or **object versioning** where needed.
- [ ] **SOPS** / **External Secrets** pattern for **rotation** without editing app manifests by hand.
- [ ] **homelab-network.md** updated when **VM 160** is retired or repurposed.
---
## Related docs
- VM → cluster migration: [`migration-vm-to-noble.md`](migration-vm-to-noble.md)
- Inventory (s3 VM): [`homelab-network.md`](homelab-network.md)
- Longhorn / storage runbook: [`../talos/runbooks/longhorn.md`](../talos/runbooks/longhorn.md)
- Velero (S3 backup target): [`../clusters/noble/bootstrap/velero/`](../clusters/noble/bootstrap/velero/) (if present)


@@ -7,16 +7,17 @@
services:
  tracearr:
    image: ghcr.io/connorgallopo/tracearr:supervised-nightly
    image: ghcr.io/connorgallopo/tracearr:supervised
    shm_size: 256mb # Required for PostgreSQL shared memory
    ports:
      - "${PORT:-3000}:3000"
    environment:
      - NODE_ENV=production
      - PORT=3000
      - HOST=0.0.0.0
      - TZ=${TZ:-UTC}
      - CORS_ORIGIN=${CORS_ORIGIN:-*}
      - LOG_LEVEL=${LOG_LEVEL:-info}
      # Optional: Override auto-generated secrets
      # - JWT_SECRET=${JWT_SECRET}
      # - COOKIE_SECRET=${COOKIE_SECRET}
    volumes:
      - tracearr_postgres:/data/postgres
      - tracearr_redis:/data/redis


@@ -0,0 +1,37 @@
# Versity S3 Gateway — root credentials for the flat-file IAM backend.
# https://github.com/versity/versitygw/wiki/Quickstart
#
# Local: copy to `.env` next to compose.yaml (or set `run_directory` to this folder
# in Komodo) so `docker compose` can interpolate `${ROOT_ACCESS_KEY}` etc.
#
# Komodo: Stack Environment is written to `<run_directory>/.env` and passed as
# `--env-file` — that drives `${VAR}` in compose.yaml. Set **one** pair using exact
# names (leave the other pair unset / empty):
# ROOT_ACCESS_KEY + ROOT_SECRET_KEY
# ROOT_ACCESS_KEY_ID + ROOT_SECRET_ACCESS_KEY (Helm-style)
ROOT_ACCESS_KEY=
ROOT_SECRET_KEY=
# ROOT_ACCESS_KEY_ID=
# ROOT_SECRET_ACCESS_KEY=
# Host port mapped to the gateway (container listens on 10000).
VERSITYGW_PORT=10000
# WebUI (container listens on 8080). In Pangolin, create a *second* HTTP resource for this
# port — do not point the UI hostname at :10000 (that is S3 API only; `/` is not the SPA).
VERSITYGW_WEBUI_PORT=8080
# HTTPS URL of the *S3 API* (Pangolin resource → host :10000). **Not** the WebUI URL.
# No trailing slash. Wrong value → WebUI calls the wrong host and bucket create can 404.
# VGW_WEBUI_GATEWAYS=https://s3.example.com
VGW_WEBUI_GATEWAYS=
# Public origin of the **WebUI** page (Pangolin → :8080), e.g. https://s3-ui.example.com
# Required when UI and API are on different hosts so the browser can call the API (CORS).
# VGW_CORS_ALLOW_ORIGIN=https://s3-ui.example.com
VGW_CORS_ALLOW_ORIGIN=
# NFS: object metadata defaults to xattrs; most NFS mounts need sidecar mode
# (compose.yaml uses --sidecar /data/sidecar). Create the host path, e.g.
# mkdir -p /mnt/nfs/versity/sidecar
# Or use NFSv4.2 with xattr support and remove --sidecar from compose if you prefer.


@@ -0,0 +1,64 @@
# Versity S3 Gateway — POSIX backend over Docker volumes.
# https://github.com/versity/versitygw
#
# POSIX default metadata uses xattrs; NFS often lacks xattr support unless NFSv4.2
# + client/server support. `--sidecar` stores metadata in files instead (see
# `posix` flags / VGW_META_SIDECAR in cmd/versitygw/posix.go).
services:
  versitygw:
    image: versity/versitygw:v1.3.1
    container_name: versitygw
    restart: unless-stopped
    # Credentials: use `${VAR}` so values come from the same env Komodo passes with
    # `docker compose --env-file <run_directory>/.env` (see Komodo Stack docs).
    # Do NOT use `env_file: .env` here: that path is resolved next to *this* compose
    # file, while Komodo writes `.env` under `run_directory` — they often differ
    # (e.g. run_directory = repo root, compose in komodo/s3/versitygw/).
    environment:
      ROOT_ACCESS_KEY: ${ROOT_ACCESS_KEY}
      ROOT_SECRET_KEY: ${ROOT_SECRET_KEY}
      ROOT_ACCESS_KEY_ID: ${ROOT_ACCESS_KEY_ID}
      ROOT_SECRET_ACCESS_KEY: ${ROOT_SECRET_ACCESS_KEY}
      # Matches Helm chart default; enables `/_/health` for probes.
      VGW_HEALTH: /_/health
      # WebUI (browser): separate listener; TLS terminates at Pangolin — serve HTTP in-container.
      VGW_WEBUI_NO_TLS: "true"
      # Public base URL of the *S3 API* only (Pangolin → :10000). Not the WebUI hostname.
      # No trailing slash. If this points at the UI URL, bucket ops return 404/wrong host.
      VGW_WEBUI_GATEWAYS: ${VGW_WEBUI_GATEWAYS}
      # Browser Origin when WebUI and API use different HTTPS hostnames (see wiki / WebGUI CORS).
      VGW_CORS_ALLOW_ORIGIN: ${VGW_CORS_ALLOW_ORIGIN}
    ports:
      - "${VERSITYGW_PORT:-10000}:10000"
      - "${VERSITYGW_WEBUI_PORT:-8080}:8080"
    volumes:
      - /mnt/nfs/versity/s3:/data/s3
      - /mnt/nfs/versity/iam:/data/iam
      - /mnt/nfs/versity/versions:/data/versions
      - /mnt/nfs/versity/sidecar:/data/sidecar
    command:
      - "--port"
      - ":10000"
      # Optional WebUI — without this, only the S3 API is served (browsers often see 404 on `/`).
      - "--webui"
      - ":8080"
      - "--iam-dir"
      - "/data/iam"
      - "posix"
      - "--sidecar"
      - "/data/sidecar"
      - "--versioning-dir"
      - "/data/versions"
      - "/data/s3"
    healthcheck:
      test:
        [
          "CMD",
          "wget",
          "-qO-",
          "http://127.0.0.1:10000/_/health",
        ]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 10s


@@ -4,24 +4,24 @@ This document is the **exported TODO** for the **noble** Talos cluster (4 nodes)
## Current state (2026-03-28)
Lab stack is **up** on-cluster through **Phase D**–**F** and **Phase G** (Vault **CiliumNetworkPolicy**, **`talos/runbooks/`**). **Next focus:** optional **Alertmanager** receivers (Slack/PagerDuty); tighten **RBAC** (Headlamp / cluster-admin); **Cilium** policies for other namespaces as needed; enable **Mend Renovate** for PRs; Pangolin/sample Ingress; **Velero** when S3 exists.
Lab stack is **up** on-cluster through **Phase D**–**F** and **Phase G** (**`talos/runbooks/`**, **SOPS**-encrypted secrets in **`clusters/noble/secrets/`**). **Next focus:** optional **Alertmanager** receivers (Slack/PagerDuty); tighten **RBAC** (Headlamp / cluster-admin); **Cilium** policies for other namespaces as needed; enable **Mend Renovate** for PRs; Pangolin/sample Ingress; **Velero** backup/restore drill after S3 credentials are set (**`noble_velero_install`**).
- **Talos** v1.12.6 (target) / **Kubernetes** as bundled — four nodes **Ready** unless upgrading; **`talosctl health`**; **`talos/kubeconfig`** is **local only** (gitignored — never commit; regenerate with `talosctl kubeconfig` per `talos/README.md`). **Image Factory (nocloud installer):** `factory.talos.dev/nocloud-installer/249d9135de54962744e917cfe654117000cba369f9152fbab9d055a00aa3664f:v1.12.6`
- **Cilium** Helm **1.16.6** / app **1.16.6** (`clusters/noble/bootstrap/cilium/`, phase 1 values).
- **CSI Volume Snapshot** — **external-snapshotter** **v8.5.0** CRDs + **`registry.k8s.io/sig-storage/snapshot-controller`** (`clusters/noble/bootstrap/csi-snapshot-controller/`, Ansible **`noble_csi_snapshot_controller`**).
- **MetalLB** Helm **0.15.3** / app **v0.15.3**; **IPAddressPool** `noble-l2` + **L2Advertisement** — pool **`192.168.50.210`–`192.168.50.229`**.
- **kube-vip** DaemonSet **3/3** on control planes; VIP **`192.168.50.230`** on **`ens18`** (`vip_subnet` **`/32`** required — bare **`32`** breaks parsing). **Verified from workstation:** `kubectl config set-cluster noble --server=https://192.168.50.230:6443` then **`kubectl get --raw /healthz`** → **`ok`** (`talos/kubeconfig`; see `talos/README.md`).
- **metrics-server** Helm **3.13.0** / app **v0.8.0** – `clusters/noble/bootstrap/metrics-server/values.yaml` (`--kubelet-insecure-tls` for Talos); **`kubectl top nodes`** works.
- **Longhorn** Helm **1.11.1** / app **v1.11.1** – `clusters/noble/bootstrap/longhorn/` (PSA **privileged** namespace, `defaultDataPath` `/var/mnt/longhorn`, `preUpgradeChecker` enabled); **StorageClass** `longhorn` (default); **`nodes.longhorn.io`** all **Ready**; test **PVC** `Bound` on `longhorn`.
- **Traefik** Helm **39.0.6** / app **v3.6.11** – `clusters/noble/bootstrap/traefik/`; **`Service`** **`LoadBalancer`** **`EXTERNAL-IP` `192.168.50.211`**; **`IngressClass`** **`traefik`** (default). Point **`*.apps.noble.lab.pcenicni.dev`** at **`192.168.50.211`**. MetalLB pool verification was done before replacing the temporary nginx test with Traefik.
- **cert-manager** Helm **v1.20.0** / app **v1.20.0** – `clusters/noble/bootstrap/cert-manager/`; **`ClusterIssuer`** **`letsencrypt-staging`** and **`letsencrypt-prod`** (**DNS-01** via **Cloudflare** for **`pcenicni.dev`**, Secret **`cloudflare-dns-api-token`** in **`cert-manager`**); ACME email **`certificates@noble.lab.pcenicni.dev`** (edit in manifests if you want a different mailbox).
- **Newt** Helm **1.2.0** / app **1.10.1** – `clusters/noble/bootstrap/newt/` (**fossorial/newt**); Pangolin site tunnel — **`newt-pangolin-auth`** Secret (**`PANGOLIN_ENDPOINT`**, **`NEWT_ID`**, **`NEWT_SECRET`**). Prefer a **SealedSecret** in git (`kubeseal` — see `clusters/noble/bootstrap/sealed-secrets/examples/`) after rotating credentials if they were exposed. **Public DNS** is **not** automated with ExternalDNS: **CNAME** records at your DNS host per Pangolin's domain instructions, plus **Integration API** for HTTP resources/targets — see **`clusters/noble/bootstrap/newt/README.md`**. LAN access to Traefik can still use **`*.apps.noble.lab.pcenicni.dev`** → **`192.168.50.211`** (split horizon / local resolver).
- **Argo CD** Helm **9.4.17** / app **v3.3.6** – `clusters/noble/bootstrap/argocd/`; **`argocd-server`** **`LoadBalancer`** **`192.168.50.210`**; app-of-apps root syncs **`clusters/noble/apps/`** (edit **`root-application.yaml`** `repoURL` before applying).
- **Newt** Helm **1.2.0** / app **1.10.1** – `clusters/noble/bootstrap/newt/` (**fossorial/newt**); Pangolin site tunnel — **`newt-pangolin-auth`** Secret (**`PANGOLIN_ENDPOINT`**, **`NEWT_ID`**, **`NEWT_SECRET`**). Store credentials in git with **SOPS** (`clusters/noble/secrets/newt-pangolin-auth.secret.yaml`, **`age-key.txt`**, **`.sops.yaml`**) — see **`clusters/noble/secrets/README.md`**. **Public DNS** is **not** automated with ExternalDNS: **CNAME** records at your DNS host per Pangolin's domain instructions, plus **Integration API** for HTTP resources/targets — see **`clusters/noble/bootstrap/newt/README.md`**. LAN access to Traefik can still use **`*.apps.noble.lab.pcenicni.dev`** → **`192.168.50.211`** (split horizon / local resolver).
- **Argo CD** Helm **9.4.17** / app **v3.3.6** – `clusters/noble/bootstrap/argocd/`; **`argocd-server`** **`LoadBalancer`** **`192.168.50.210`**; **`noble-root`** → **`clusters/noble/apps/`**; **`noble-bootstrap-root`** → **`clusters/noble/bootstrap`** (manual sync until **`argocd/README.md`** §5 after **`noble.yml`**). Edit **`repoURL`** in both root **`Application`** files before applying.
- **kube-prometheus-stack** — Helm chart **82.15.1** – `clusters/noble/bootstrap/kube-prometheus-stack/` (**namespace** `monitoring`, PSA **privileged** – **node-exporter** needs host mounts); **Longhorn** PVCs for Prometheus, Grafana, Alertmanager; **node-exporter** DaemonSet **4/4**. **Grafana Ingress:** **`https://grafana.apps.noble.lab.pcenicni.dev`** (Traefik **`ingressClassName: traefik`**, **`cert-manager.io/cluster-issuer: letsencrypt-prod`**). **Loki** datasource in Grafana: ConfigMap **`clusters/noble/bootstrap/grafana-loki-datasource/loki-datasource.yaml`** (sidecar label **`grafana_datasource: "1"`**) — not via **`grafana.additionalDataSources`** in the chart. **`helm upgrade --install` with `--wait` is silent until done** — use **`--timeout 30m`**; Grafana admin: Secret **`kube-prometheus-grafana`**, keys **`admin-user`** / **`admin-password`**.
- **Loki** + **Fluent Bit** – **`grafana/loki` 6.55.0** SingleBinary + **filesystem** on **Longhorn** (`clusters/noble/bootstrap/loki/`); **`loki.auth_enabled: false`**; **`chunksCache.enabled: false`** (no memcached chunk cache). **`fluent/fluent-bit` 0.56.0** → **`loki-gateway.loki.svc:80`** (`clusters/noble/bootstrap/fluent-bit/`); **`logging`** PSA **privileged**. **Grafana Explore:** **`kubectl apply -f clusters/noble/bootstrap/grafana-loki-datasource/loki-datasource.yaml`** then **Explore → Loki** (e.g. `{job="fluent-bit"}`).
- **Sealed Secrets** Helm **2.18.4** / app **0.36.1** – `clusters/noble/bootstrap/sealed-secrets/` (namespace **`sealed-secrets`**); **`kubeseal`** on client should match controller minor (**README**); back up **`sealed-secrets-key`** (see README).
- **External Secrets Operator** Helm **2.2.0** / app **v2.2.0** – `clusters/noble/bootstrap/external-secrets/`; Vault **`ClusterSecretStore`** in **`examples/vault-cluster-secret-store.yaml`** (**`http://`** to match Vault listener — apply after Vault **Kubernetes auth**).
- **Vault** Helm **0.32.0** / app **1.21.2** – `clusters/noble/bootstrap/vault/` — standalone **file** storage, **Longhorn** PVC; **HTTP** listener (`global.tlsDisable`); optional **CronJob** lab unseal **`unseal-cronjob.yaml`**; **not** initialized in git — run **`vault operator init`** per **`README.md`**.
- **Still open:** **Renovate** — install **[Mend Renovate](https://github.com/apps/renovate)** (or self-host) so PRs run; optional **Alertmanager** notification channels; optional **sample Ingress + cert + Pangolin** end-to-end; **Velero** when S3 is ready; **Argo CD SSO**.
- **SOPS** — cluster **`Secret`** manifests under **`clusters/noble/secrets/`** encrypted with **age** (see **`.sops.yaml`**, **`age-key.txt`** gitignored); **`noble.yml`** decrypt-applies when the private key is present.
- **Velero** Helm **12.0.0** / app **v1.18.0** – `clusters/noble/bootstrap/velero/` (**Ansible** **`noble_velero`**, not Argo); **S3-compatible** backup location + **CSI** snapshots (**`EnableCSI`**); enable with **`noble_velero_install`** per **`velero/README.md`**.
- **Still open:** **Renovate** – install **[Mend Renovate](https://github.com/apps/renovate)** (or self-host) so PRs run; optional **Alertmanager** notification channels; optional **sample Ingress + cert + Pangolin** end-to-end; **Argo CD SSO**.
## Inventory
@@ -44,7 +44,7 @@ Lab stack is **up** on-cluster through **Phase D****F** and **Phase G** (Vaul
| Grafana (Ingress + TLS) | **`grafana.apps.noble.lab.pcenicni.dev`** — `grafana.ingress` in `clusters/noble/bootstrap/kube-prometheus-stack/values.yaml` (**`letsencrypt-prod`**) |
| Headlamp (Ingress + TLS) | **`headlamp.apps.noble.lab.pcenicni.dev`** — chart `ingress` in `clusters/noble/bootstrap/headlamp/` (**`letsencrypt-prod`**, **`ingressClassName: traefik`**) |
| Public DNS (Pangolin) | **Newt** tunnel + **CNAME** at registrar + **Integration API** – `clusters/noble/bootstrap/newt/` |
| Velero | S3-compatible URL — configure later |
| Velero | S3-compatible endpoint + bucket — **`clusters/noble/bootstrap/velero/`**, **`ansible/playbooks/noble.yml`** (**`noble_velero_install`**) |
## Versions
@@ -62,11 +62,9 @@ Lab stack is **up** on-cluster through **Phase D****F** and **Phase G** (Vaul
- kube-prometheus-stack: **82.15.1** (Helm chart `prometheus-community/kube-prometheus-stack`; app **v0.89.x** bundle)
- Loki: **6.55.0** (Helm chart `grafana/loki`; app **3.6.7**)
- Fluent Bit: **0.56.0** (Helm chart `fluent/fluent-bit`; app **4.2.3**)
- Sealed Secrets: **2.18.4** (Helm chart `sealed-secrets/sealed-secrets`; app **0.36.1**)
- External Secrets Operator: **2.2.0** (Helm chart `external-secrets/external-secrets`; app **v2.2.0**)
- Vault: **0.32.0** (Helm chart `hashicorp/vault`; app **1.21.2**)
- Kyverno: **3.7.1** (Helm chart `kyverno/kyverno`; app **v1.17.1**); **kyverno-policies** **3.7.1** – **baseline** PSS, **Audit** (`clusters/noble/bootstrap/kyverno/`)
- Headlamp: **0.40.1** (Helm chart `headlamp/headlamp`; app matches chart — see [Artifact Hub](https://artifacthub.io/packages/helm/headlamp/headlamp))
- Velero: **12.0.0** (Helm chart `vmware-tanzu/velero`; app **v1.18.0**) — **`clusters/noble/bootstrap/velero/`**; AWS plugin **v1.14.0**; Ansible **`noble_velero`**
- Renovate: **hosted** (Mend **Renovate** GitHub/GitLab app — no cluster chart) **or** **self-hosted** — pin chart when added ([Helm charts](https://docs.renovatebot.com/helm-charts/), OCI `ghcr.io/renovatebot/charts/renovate`); pair **`renovate.json`** with this repo's Helm paths under **`clusters/noble/`**
## Repo paths (this workspace)
@@ -74,30 +72,30 @@ Lab stack is **up** on-cluster through **Phase D****F** and **Phase G** (Vaul
| Artifact | Path |
|----------|------|
| This checklist | `talos/CLUSTER-BUILD.md` |
| Operational runbooks (API VIP, etcd, Longhorn, Vault) | `talos/runbooks/` |
| Operational runbooks (API VIP, etcd, Longhorn, SOPS) | `talos/runbooks/` |
| Talos quick start + networking + kubeconfig | `talos/README.md` |
| talhelper source (active) | `talos/talconfig.yaml` — may be **wipe-phase** (no Longhorn volume) during disk recovery |
| Longhorn volume restore | `talos/talconfig.with-longhorn.yaml` — copy to `talconfig.yaml` after GPT wipe (see `talos/README.md` §5) |
| Longhorn GPT wipe automation | `talos/scripts/longhorn-gpt-recovery.sh` |
| kube-vip (kustomize) | `clusters/noble/bootstrap/kube-vip/` (`vip_interface` e.g. `ens18`) |
| Cilium (Helm values) | `clusters/noble/bootstrap/cilium/` – `values.yaml` (phase 1), optional `values-kpr.yaml`, `README.md` |
| CSI Volume Snapshot (CRDs + controller) | `clusters/noble/bootstrap/csi-snapshot-controller/` – `crd/`, `controller/` kustomize; **`ansible/roles/noble_csi_snapshot_controller`** |
| MetalLB | `clusters/noble/bootstrap/metallb/` – `namespace.yaml` (PSA **privileged**), `ip-address-pool.yaml`, `kustomization.yaml`, `README.md` |
| Longhorn | `clusters/noble/bootstrap/longhorn/` – `values.yaml`, `namespace.yaml` (PSA **privileged**), `kustomization.yaml` |
| metrics-server (Helm values) | `clusters/noble/bootstrap/metrics-server/values.yaml` |
| Traefik (Helm values) | `clusters/noble/bootstrap/traefik/` – `values.yaml`, `namespace.yaml`, `README.md` |
| cert-manager (Helm + ClusterIssuers) | `clusters/noble/bootstrap/cert-manager/` – `values.yaml`, `namespace.yaml`, `kustomization.yaml`, `README.md` |
| Newt / Pangolin tunnel (Helm) | `clusters/noble/bootstrap/newt/` – `values.yaml`, `namespace.yaml`, `README.md` |
| Argo CD (Helm) + optional app-of-apps | `clusters/noble/bootstrap/argocd/` – `values.yaml`, `root-application.yaml`, `README.md`; optional **`Application`** tree in **`clusters/noble/apps/`** |
| Argo CD (Helm) + app-of-apps | `clusters/noble/bootstrap/argocd/` – `values.yaml`, `root-application.yaml`, `bootstrap-root-application.yaml`, `app-of-apps/`, `README.md`; **`noble-root`** syncs **`clusters/noble/apps/`**; **`noble-bootstrap-root`** syncs **`clusters/noble/bootstrap`** (enable automation after **`noble.yml`**) |
| kube-prometheus-stack (Helm values) | `clusters/noble/bootstrap/kube-prometheus-stack/` – `values.yaml`, `namespace.yaml` |
| Grafana Loki datasource (ConfigMap; no chart change) | `clusters/noble/bootstrap/grafana-loki-datasource/loki-datasource.yaml` |
| Loki (Helm values) | `clusters/noble/bootstrap/loki/` – `values.yaml`, `namespace.yaml` |
| Fluent Bit → Loki (Helm values) | `clusters/noble/bootstrap/fluent-bit/` – `values.yaml`, `namespace.yaml` |
| Sealed Secrets (Helm) | `clusters/noble/bootstrap/sealed-secrets/` – `values.yaml`, `namespace.yaml`, `README.md` |
| External Secrets Operator (Helm + Vault store example) | `clusters/noble/bootstrap/external-secrets/` – `values.yaml`, `namespace.yaml`, `README.md`, `examples/vault-cluster-secret-store.yaml` |
| Vault (Helm + optional unseal CronJob) | `clusters/noble/bootstrap/vault/` – `values.yaml`, `namespace.yaml`, `unseal-cronjob.yaml`, `cilium-network-policy.yaml`, `configure-kubernetes-auth.sh`, `README.md` |
| SOPS-encrypted cluster Secrets | `clusters/noble/secrets/` – `README.md`, `*.secret.yaml`; **`.sops.yaml`**, **`age-key.txt`** (gitignored) at repo root |
| Kyverno + PSS baseline policies | `clusters/noble/bootstrap/kyverno/` – `values.yaml`, `policies-values.yaml`, `namespace.yaml`, `README.md` |
| Headlamp (Helm + Ingress) | `clusters/noble/bootstrap/headlamp/` – `values.yaml`, `namespace.yaml`, `README.md` |
| Renovate (repo config + optional self-hosted Helm) | **`renovate.json`** at repo root; optional self-hosted chart under **`clusters/noble/apps/`** (Argo) + token Secret (**Sealed Secrets** / **ESO** after **Phase E**) |
| Velero (Helm + S3 BSL; CSI snapshots) | `clusters/noble/bootstrap/velero/` – `values.yaml`, `namespace.yaml`, `README.md`; **`ansible/roles/noble_velero`** |
| Renovate (repo config + optional self-hosted Helm) | **`renovate.json`** at repo root; optional self-hosted chart under **`clusters/noble/apps/`** (Argo) + token Secret (SOPS under **`clusters/noble/secrets/`** or imperative **`kubectl create secret`**) |
**Git vs cluster:** manifests and `talconfig` live in git; **`talhelper genconfig -o out`**, bootstrap, Helm, and `kubectl` run on your LAN. See **`talos/README.md`** for workstation reachability (lab LAN/VPN), **`talosctl kubeconfig`** vs Kubernetes `server:` (VIP vs node IP), and **`--insecure`** only in maintenance.
@@ -106,11 +104,12 @@ Lab stack is **up** on-cluster through **Phase D****F** and **Phase G** (Vaul
1. **Talos** installed; **Cilium** (or chosen CNI) **before** most workloads — with `cni: none`, nodes stay **NotReady** / **network-unavailable** taint until CNI is up.
2. **MetalLB Helm chart** (CRDs + controller) **before** `kubectl apply -k` on the pool manifests.
3. **`clusters/noble/bootstrap/metallb/namespace.yaml`** before or merged onto `metallb-system` so Pod Security does not block speaker (see `bootstrap/metallb/README.md`).
4. **Longhorn:** Talos user volume + extensions in `talconfig.with-longhorn.yaml` (when restored); Helm **`defaultDataPath`** in `clusters/noble/bootstrap/longhorn/values.yaml`.
5. **Loki → Fluent Bit → Grafana datasource:** deploy **Loki** (`loki-gateway` Service) before **Fluent Bit**; apply **`clusters/noble/bootstrap/grafana-loki-datasource/loki-datasource.yaml`** after **Loki** (sidecar picks up the ConfigMap — no kube-prometheus values change for Loki).
6. **Vault:** **Longhorn** default **StorageClass** before **`clusters/noble/bootstrap/vault/`** Helm (PVC **`data-vault-0`**); **External Secrets** **`ClusterSecretStore`** after Vault is initialized, unsealed, and **Kubernetes auth** is configured.
4. **CSI Volume snapshots:** **`kubernetes-csi/external-snapshotter`** CRDs + **`snapshot-controller`** (`clusters/noble/bootstrap/csi-snapshot-controller/`) before relying on **Longhorn** / **Velero** volume snapshots.
5. **Longhorn:** Talos user volume + extensions in `talconfig.with-longhorn.yaml` (when restored); Helm **`defaultDataPath`** in `clusters/noble/bootstrap/longhorn/values.yaml`.
6. **Loki → Fluent Bit → Grafana datasource:** deploy **Loki** (`loki-gateway` Service) before **Fluent Bit**; apply **`clusters/noble/bootstrap/grafana-loki-datasource/loki-datasource.yaml`** after **Loki** (sidecar picks up the ConfigMap — no kube-prometheus values change for Loki).
7. **Headlamp:** **Traefik** + **cert-manager** (**`letsencrypt-prod`**) before exposing **`headlamp.apps.noble.lab.pcenicni.dev`**; treat as **cluster-admin** UI — protect with network policy / SSO when hardening (**Phase G**).
8. **Renovate:** **Git remote** + platform access (**hosted app** needs org/repo install; **self-hosted** needs **`RENOVATE_TOKEN`** and chart **`renovate.config`**). If the bot runs **in-cluster**, add the token **after** **Sealed Secrets** / **Vault** (**Phase E**) — no ingress required for the bot itself.
8. **Renovate:** **Git remote** + platform access (**hosted app** needs org/repo install; **self-hosted** needs **`RENOVATE_TOKEN`** and chart **`renovate.config`**). If the bot runs **in-cluster**, store the token with **SOPS** or an imperative Secret — no ingress required for the bot itself.
9. **Velero:** **S3-compatible** endpoint + bucket + **`velero/velero-cloud-credentials`** before **`ansible/playbooks/noble.yml`** with **`noble_velero_install: true`**; for **CSI** volume snapshots, label a **VolumeSnapshotClass** per **`clusters/noble/bootstrap/velero/README.md`** (e.g. Longhorn).
## Prerequisites (before phases)
@@ -136,9 +135,10 @@ Lab stack is **up** on-cluster through **Phase D****F** and **Phase G** (Vaul
## Phase B — Core platform
**Install order:** **Cilium** → **metrics-server** → **Longhorn** (Talos disk + Helm) → **MetalLB** (Helm → pool manifests) → ingress / certs / DNS as planned.
**Install order:** **Cilium** → **Volume Snapshot CRDs + snapshot-controller** (`clusters/noble/bootstrap/csi-snapshot-controller/`, Ansible **`noble_csi_snapshot_controller`**) → **metrics-server** → **Longhorn** (Talos disk + Helm) → **MetalLB** (Helm → pool manifests) → ingress / certs / DNS as planned.
- [x] **Cilium** (Helm **1.16.6**) — **required** before MetalLB if `cni: none` (`clusters/noble/bootstrap/cilium/`)
- [x] **CSI Volume Snapshot** — CRDs + **`snapshot-controller`** in **`kube-system`** (`clusters/noble/bootstrap/csi-snapshot-controller/`); Ansible **`noble_csi_snapshot_controller`**; verify `kubectl api-resources | grep VolumeSnapshot`
- [x] **metrics-server** — Helm **3.13.0**; values in `clusters/noble/bootstrap/metrics-server/values.yaml`; verify `kubectl top nodes`
- [x] **Longhorn** — Talos: user volume + kubelet mounts + extensions (`talos/README.md` §5); Helm **1.11.1**; `kubectl apply -k clusters/noble/bootstrap/longhorn`; verify **`nodes.longhorn.io`** and test PVC **`Bound`**
- [x] **MetalLB** — chart installed; **pool + L2** from `clusters/noble/bootstrap/metallb/` applied (`192.168.50.210`–`229`)
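
The Phase B verification commands quoted in this checklist can be strung together as one pass. The following is a minimal sketch, not part of the repository: `verify_phase_b`, `phase_b_check`, and the `KUBECTL_BIN` override are hypothetical names introduced here, and the Longhorn check uses `-A` so it makes no assumption about the install namespace. The `metallb-system` namespace and the `noble-l2` pool name come from the notes above.

```shell
# Sketch only: hypothetical helper names; KUBECTL_BIN is an assumed override hook.

# Report one check and remember failures in $failed.
phase_b_check() {
  label="$1"; shift
  if "$@" >/dev/null 2>&1; then
    echo "ok: $label"
  else
    echo "FAIL: $label"
    failed=1
  fi
}

# Runs in a subshell so variables do not leak into the caller's shell.
verify_phase_b() (
  k="${KUBECTL_BIN:-kubectl}"
  failed=0

  # CSI snapshot CRDs registered (checklist: kubectl api-resources | grep VolumeSnapshot)
  if "$k" api-resources 2>/dev/null | grep -q VolumeSnapshot; then
    echo "ok: VolumeSnapshot API"
  else
    echo "FAIL: VolumeSnapshot API"
    failed=1
  fi

  phase_b_check "metrics-server (kubectl top nodes)" "$k" top nodes
  phase_b_check "Longhorn nodes" "$k" get nodes.longhorn.io -A
  phase_b_check "MetalLB pool noble-l2" "$k" get ipaddresspool noble-l2 -n metallb-system

  exit "$failed"
)
```

Calling `verify_phase_b` prints one `ok:` or `FAIL:` line per component and returns non-zero if any check failed, so it could gate a later step in a wrapper script.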
@@ -152,7 +152,7 @@ Lab stack is **up** on-cluster through **Phase D****F** and **Phase G** (Vaul
- [x] **Argo CD** bootstrap — `clusters/noble/bootstrap/argocd/` (`helm upgrade --install argocd …`) — also covered by **`ansible/playbooks/noble.yml`** (role **`noble_argocd`**)
- [x] Argo CD server **LoadBalancer** – **`192.168.50.210`** (see `values.yaml`)
- [x] **App-of-apps** — optional; **`clusters/noble/apps/kustomization.yaml`** is **empty** (core stack is **Ansible**-managed from **`clusters/noble/bootstrap/`**, not Argo). Set **`repoURL`** in **`root-application.yaml`** and add **`Application`** manifests only for optional GitOps workloads — see **`clusters/noble/apps/README.md`**
- [x] **Renovate** – **`renovate.json`** at repo root ([Renovate](https://docs.renovatebot.com/) — **Kubernetes** manager for **`clusters/noble/**/*.yaml`** image pins; grouped minor/patch PRs). **Activate PRs:** install **[Mend Renovate](https://github.com/apps/renovate)** on the Git repo (**Option A**), or **Option B:** self-hosted chart per [Helm charts](https://docs.renovatebot.com/helm-charts/) + token from **Sealed Secrets** / **ESO**. Helm **chart** versions pinned only in comments still need manual bumps or extra **regex** `customManagers` — extend **`renovate.json`** as needed.
- [x] **Renovate** – **`renovate.json`** at repo root ([Renovate](https://docs.renovatebot.com/) — **Kubernetes** manager for **`clusters/noble/**/*.yaml`** image pins; grouped minor/patch PRs). **Activate PRs:** install **[Mend Renovate](https://github.com/apps/renovate)** on the Git repo (**Option A**), or **Option B:** self-hosted chart per [Helm charts](https://docs.renovatebot.com/helm-charts/) + token from **SOPS** or a one-off Secret. Helm **chart** versions pinned only in comments still need manual bumps or extra **regex** `customManagers` — extend **`renovate.json`** as needed.
- [ ] SSO — later
## Phase D — Observability
@@ -163,19 +163,16 @@ Lab stack is **up** on-cluster through **Phase D****F** and **Phase G** (Vaul
## Phase E — Secrets
- [x] **Sealed Secrets** (optional Git workflow) — `clusters/noble/bootstrap/sealed-secrets/` (Helm **2.18.4**); **`kubeseal`** + key backup per **`README.md`**
- [x] **Vault** in-cluster on Longhorn + **auto-unseal** – `clusters/noble/bootstrap/vault/` (Helm **0.32.0**); **Longhorn** PVC; **OSS** “auto-unseal” = optional **`unseal-cronjob.yaml`** + Secret (**README**); **`configure-kubernetes-auth.sh`** for ESO (**Kubernetes auth** + KV + role)
- [x] **External Secrets Operator** + Vault `ClusterSecretStore` — operator **`clusters/noble/bootstrap/external-secrets/`** (Helm **2.2.0**); apply **`examples/vault-cluster-secret-store.yaml`** after Vault (**`README.md`**)
- [x] **SOPS** — encrypt **`Secret`** YAML under **`clusters/noble/secrets/`** with **age** (see **`.sops.yaml`**, **`clusters/noble/secrets/README.md`**); keep **`age-key.txt`** private (gitignored). **`ansible/playbooks/noble.yml`** decrypt-applies **`*.yaml`** when **`age-key.txt`** exists.
## Phase F — Policy + backups
- [x] **Kyverno** baseline policies — `clusters/noble/bootstrap/kyverno/` (Helm **kyverno** **3.7.1** + **kyverno-policies** **3.7.1**, **baseline** / **Audit** — see **`README.md`**)
- [ ] **Velero** when S3 is ready; backup/restore drill
- [ ] **Velero** — manifests + Ansible **`noble_velero`** (`clusters/noble/bootstrap/velero/`); enable with **`noble_velero_install: true`** + S3 bucket/URL + **`velero/velero-cloud-credentials`** (see **`velero/README.md`**); optional backup/restore drill
## Phase G — Hardening
- [x] **Cilium** Vault **`CiliumNetworkPolicy`** (`clusters/noble/bootstrap/vault/cilium-network-policy.yaml`) — HTTP **8200** from **`external-secrets`** + **`vault`**; extend for other clients as needed
- [x] **Runbooks** – **`talos/runbooks/`** (API VIP / kube-vip, etcd–Talos, Longhorn, Vault)
- [x] **Runbooks** – **`talos/runbooks/`** (API VIP / kube-vip, etcd–Talos, Longhorn, SOPS)
- [x] **RBAC** – **Headlamp** **`ClusterRoleBinding`** uses built-in **`edit`** (not **`cluster-admin`**); **Argo CD** **`policy.default: role:readonly`** with **`g, admin, role:admin`** — see **`clusters/noble/bootstrap/headlamp/values.yaml`**, **`clusters/noble/bootstrap/argocd/values.yaml`**, **`talos/runbooks/rbac.md`**
- [ ] **Alertmanager** — add **`slack_configs`**, **`pagerduty_configs`**, or other receivers under **`kube-prometheus-stack`** `alertmanager.config` (chart defaults use **`null`** receiver)
@@ -193,11 +190,10 @@ Lab stack is **up** on-cluster through **Phase D****F** and **Phase G** (Vaul
- [x] **`logging`** — **Fluent Bit** DaemonSet **Running** on all nodes (logs → **Loki**)
- [x] **Grafana** – **Loki** datasource from **`grafana-loki-datasource`** ConfigMap (**Explore** works after apply + sidecar sync)
- [x] **Headlamp** — Deployment **Running** in **`headlamp`**; UI at **`https://headlamp.apps.noble.lab.pcenicni.dev`** (TLS via **`letsencrypt-prod`**)
- [x] **`sealed-secrets`** — controller **Deployment** **Running** in **`sealed-secrets`** (install + **`kubeseal`** per **`apps/sealed-secrets/README.md`**)
- [x] **`external-secrets`** — controller + webhook + cert-controller **Running** in **`external-secrets`**; apply **`ClusterSecretStore`** after Vault **Kubernetes auth**
- [x] **`vault`** — **StatefulSet** **Running**, **`data-vault-0`** PVC **Bound** on **longhorn**; **`vault operator init`** + unseal per **`apps/vault/README.md`**
- [x] **SOPS secrets** – **`clusters/noble/secrets/*.yaml`** encrypted in git; **`noble.yml`** applies decrypted manifests when **`age-key.txt`** is present
- [x] **`kyverno`** — admission / background / cleanup / reports controllers **Running** in **`kyverno`**; **ClusterPolicies** for **PSS baseline** **Ready** (**Audit**)
- [x] **Phase G (partial)** — Vault **`CiliumNetworkPolicy`**; **`talos/runbooks/`** (incl. **RBAC**); **Headlamp**/**Argo CD** RBAC tightened — **Alertmanager** receivers still optional
- [ ] **`velero`** — when enabled: Deployment **Running** in **`velero`**; **`BackupStorageLocation`** / **`VolumeSnapshotLocation`** **Available**; test backup per **`velero/README.md`**
- [x] **Phase G (partial)** – **`talos/runbooks/`** (incl. **RBAC**); **Headlamp**/**Argo CD** RBAC tightened — **Alertmanager** receivers still optional
---


@@ -1,7 +1,7 @@
# Talos — noble lab
- **Cluster build checklist (exported TODO):** [CLUSTER-BUILD.md](./CLUSTER-BUILD.md)
- **Operational runbooks (API VIP, etcd, Longhorn, Vault):** [runbooks/README.md](./runbooks/README.md)
- **Operational runbooks (API VIP, etcd, Longhorn, SOPS):** [runbooks/README.md](./runbooks/README.md)
## Versions


@@ -7,5 +7,5 @@ Short recovery / triage notes for the **noble** Talos cluster. Deep procedures l
| Kubernetes API VIP (kube-vip) | [`api-vip-kube-vip.md`](./api-vip-kube-vip.md) |
| etcd / Talos control plane | [`etcd-talos.md`](./etcd-talos.md) |
| Longhorn storage | [`longhorn.md`](./longhorn.md) |
| Vault (unseal, auth, ESO) | [`vault.md`](./vault.md) |
| SOPS (secrets in git) | [`sops.md`](./sops.md) |
| RBAC (Headlamp, Argo CD) | [`rbac.md`](./rbac.md) |

talos/runbooks/sops.md Normal file

@@ -0,0 +1,13 @@
# Runbook: SOPS secrets (git-encrypted)
**Symptoms:** `sops -d` fails; `kubectl apply` after Ansible shows no secret; `noble.yml` skips apply.
**Checklist**
1. **Private key:** `age-key.txt` at the repository root (gitignored). Create with `age-keygen -o age-key.txt` and add the **public** key to `.sops.yaml` (see `clusters/noble/secrets/README.md`).
2. **Environment:** `export SOPS_AGE_KEY_FILE=/absolute/path/to/home-server/age-key.txt` when editing or applying by hand.
3. **Edit encrypted file:** `sops clusters/noble/secrets/<name>.secret.yaml`
4. **Apply one file:** `sops -d clusters/noble/secrets/<name>.secret.yaml | kubectl apply -f -`
5. **Ansible:** `noble_apply_sops_secrets` is true by default; the platform role applies all `*.yaml` when `age-key.txt` exists.
**References:** [`clusters/noble/secrets/README.md`](../../clusters/noble/secrets/README.md), [Mozilla SOPS](https://github.com/getsops/sops).
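
Steps 2 and 4 of the checklist can be combined into a loop for applying several files by hand. This is a hedged sketch rather than the Ansible role's actual logic: `apply_sops_secrets`, `SOPS_BIN`, and `KUBECTL_BIN` are hypothetical names, and the `*.secret.yaml` glob is an assumption based on the file naming shown in step 3. Only `sops -d` and `kubectl apply -f -` are taken from the runbook.

```shell
# Sketch only: hypothetical helper; SOPS_BIN / KUBECTL_BIN are assumed override hooks.
# Runs in a subshell so the exported key path does not leak into the caller's shell.
apply_sops_secrets() (
  dir="${1:-clusters/noble/secrets}"
  key_file="${SOPS_AGE_KEY_FILE:-age-key.txt}"
  sops_bin="${SOPS_BIN:-sops}"
  kubectl_bin="${KUBECTL_BIN:-kubectl}"
  failed=0

  # Without the private key there is nothing to decrypt, so skip quietly.
  if [ ! -f "$key_file" ]; then
    echo "skip: no age key at $key_file" >&2
    exit 0
  fi
  export SOPS_AGE_KEY_FILE="$key_file"

  for f in "$dir"/*.secret.yaml; do
    [ -e "$f" ] || continue
    # Decrypt first so a sops failure is not hidden behind the pipe's exit status.
    if manifest="$("$sops_bin" -d "$f")"; then
      printf '%s\n' "$manifest" | "$kubectl_bin" apply -f - || failed=1
    else
      echo "FAIL: could not decrypt $f" >&2
      failed=1
    fi
  done
  exit "$failed"
)
```

Run from the repository root as `apply_sops_secrets`, or pass a directory as the first argument. The decrypted manifest is held only in a shell variable and is never written to disk.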


@@ -1,15 +0,0 @@
# Runbook: Vault (in-cluster)
**Symptoms:** External Secrets **not syncing**, `ClusterSecretStore` **InvalidProviderConfig**, Vault UI/API **503 sealed**, pods **CrashLoop** on auth.
**Checks**
1. `kubectl -n vault exec -i sts/vault -- vault status` → **Sealed** / **Initialized**.
2. Unseal key Secret + optional CronJob: [`clusters/noble/bootstrap/vault/README.md`](../../clusters/noble/bootstrap/vault/README.md), `unseal-cronjob.yaml`.
3. Kubernetes auth for ESO: [`clusters/noble/bootstrap/vault/configure-kubernetes-auth.sh`](../../clusters/noble/bootstrap/vault/configure-kubernetes-auth.sh) and `kubectl describe clustersecretstore vault`.
4. **Cilium** policy: if Vault is unreachable from `external-secrets`, check [`clusters/noble/bootstrap/vault/cilium-network-policy.yaml`](../../clusters/noble/bootstrap/vault/cilium-network-policy.yaml) and extend `ingress` for new client namespaces.
**Common fixes**
- Sealed: `vault operator unseal` or fix auto-unseal CronJob + `vault-unseal-key` Secret.
- **403/invalid role** on ESO: re-run Kubernetes auth setup (issuer/CA/reviewer JWT) per README.