Update Ansible configuration to integrate SOPS for managing secrets. Enhance README.md with SOPS usage instructions and prerequisites. Remove External Secrets Operator references and related configurations from the bootstrap process, streamlining the deployment. Adjust playbooks and roles to apply SOPS-encrypted secrets automatically, improving security and clarity in secret management.

This commit is contained in:
Nikholas Pcenicni
2026-03-30 22:42:52 -04:00
parent 023ebfee5d
commit 3a6e5dff5b
44 changed files with 644 additions and 809 deletions

7
.sops.yaml Normal file
View File

@@ -0,0 +1,7 @@
# Mozilla SOPS — encrypt/decrypt Kubernetes Secret manifests under clusters/noble/secrets/
# Generate a key: age-keygen -o age-key.txt (age-key.txt is gitignored)
# Add the printed public key below (one recipient per line is supported).
creation_rules:
- path_regex: clusters/noble/secrets/.*\.yaml$
age: >-
age1juym5p3ez3dkt0dxlznydgfgqvaujfnyk9hpdsssf50hsxeh3p4sjpf3gn

View File

@@ -24,6 +24,7 @@ Copy **`.env.sample`** to **`.env`** at the repository root (`.env` is gitignore
## Prerequisites ## Prerequisites
- `talosctl` (matches node Talos version), `talhelper`, `helm`, `kubectl`. - `talosctl` (matches node Talos version), `talhelper`, `helm`, `kubectl`.
- **SOPS secrets:** `sops` and `age` on the control host if you use **`clusters/noble/secrets/`** with **`age-key.txt`** (see **`clusters/noble/secrets/README.md`**).
- **Phase A:** same LAN/VPN as nodes so **Talos :50000** and **Kubernetes :6443** are reachable (see [`talos/README.md`](../talos/README.md) §3). - **Phase A:** same LAN/VPN as nodes so **Talos :50000** and **Kubernetes :6443** are reachable (see [`talos/README.md`](../talos/README.md) §3).
- **noble.yml:** bootstrapped cluster and **`talos/kubeconfig`** (or `KUBECONFIG`). - **noble.yml:** bootstrapped cluster and **`talos/kubeconfig`** (or `KUBECONFIG`).
@@ -34,7 +35,7 @@ Copy **`.env.sample`** to **`.env`** at the repository root (`.env` is gitignore
| [`playbooks/deploy.yml`](playbooks/deploy.yml) | **Talos Phase A** then **`noble.yml`** (full automation). | | [`playbooks/deploy.yml`](playbooks/deploy.yml) | **Talos Phase A** then **`noble.yml`** (full automation). |
| [`playbooks/talos_phase_a.yml`](playbooks/talos_phase_a.yml) | `genconfig``apply-config``bootstrap``kubeconfig` only. | | [`playbooks/talos_phase_a.yml`](playbooks/talos_phase_a.yml) | `genconfig``apply-config``bootstrap``kubeconfig` only. |
| [`playbooks/noble.yml`](playbooks/noble.yml) | Helm + `kubectl` platform (after Phase A). | | [`playbooks/noble.yml`](playbooks/noble.yml) | Helm + `kubectl` platform (after Phase A). |
| [`playbooks/post_deploy.yml`](playbooks/post_deploy.yml) | Vault / ESO reminders (`noble_apply_vault_cluster_secret_store`). | | [`playbooks/post_deploy.yml`](playbooks/post_deploy.yml) | SOPS reminders and optional Argo root Application note. |
| [`playbooks/talos_bootstrap.yml`](playbooks/talos_bootstrap.yml) | **`talhelper genconfig` only** (legacy shortcut; prefer **`talos_phase_a.yml`**). | | [`playbooks/talos_bootstrap.yml`](playbooks/talos_bootstrap.yml) | **`talhelper genconfig` only** (legacy shortcut; prefer **`talos_phase_a.yml`**). |
```bash ```bash
@@ -68,9 +69,10 @@ ansible-playbook playbooks/noble.yml --skip-tags newt
ansible-playbook playbooks/noble.yml --tags velero -e noble_velero_install=true -e noble_velero_s3_bucket=... -e noble_velero_s3_url=... ansible-playbook playbooks/noble.yml --tags velero -e noble_velero_install=true -e noble_velero_s3_bucket=... -e noble_velero_s3_url=...
``` ```
### Variables — `group_vars/all.yml` ### Variables — `group_vars/all.yml` and role defaults
- **`noble_newt_install`**, **`noble_velero_install`**, **`noble_cert_manager_require_cloudflare_secret`**, **`noble_apply_vault_cluster_secret_store`**, **`noble_k8s_api_server_override`**, **`noble_k8s_api_server_auto_fallback`**, **`noble_k8s_api_server_fallback`**, **`noble_skip_k8s_health_check`**. - **`group_vars/all.yml`:** **`noble_newt_install`**, **`noble_velero_install`**, **`noble_cert_manager_require_cloudflare_secret`**, **`noble_k8s_api_server_override`**, **`noble_k8s_api_server_auto_fallback`**, **`noble_k8s_api_server_fallback`**, **`noble_skip_k8s_health_check`**
- **`roles/noble_platform/defaults/main.yml`:** **`noble_apply_sops_secrets`**, **`noble_sops_age_key_file`** (SOPS secrets under **`clusters/noble/secrets/`**)
## Roles ## Roles

View File

@@ -13,14 +13,11 @@ noble_k8s_api_server_fallback: "https://192.168.50.20:6443"
# Only if you must skip the kubectl /healthz preflight (not recommended). # Only if you must skip the kubectl /healthz preflight (not recommended).
noble_skip_k8s_health_check: false noble_skip_k8s_health_check: false
# Pangolin / Newt — set true only after creating newt-pangolin-auth Secret (see clusters/noble/bootstrap/newt/README.md) # Pangolin / Newt — set true only after newt-pangolin-auth Secret exists (SOPS: clusters/noble/secrets/ or imperative — see clusters/noble/bootstrap/newt/README.md)
noble_newt_install: false noble_newt_install: false
# cert-manager needs Secret cloudflare-dns-api-token in cert-manager namespace before ClusterIssuers work # cert-manager needs Secret cloudflare-dns-api-token in cert-manager namespace before ClusterIssuers work
noble_cert_manager_require_cloudflare_secret: true noble_cert_manager_require_cloudflare_secret: true
# post_deploy.yml — apply Vault ClusterSecretStore only after Vault is initialized and K8s auth is configured
noble_apply_vault_cluster_secret_store: false
# Velero — set **noble_velero_install: true** plus S3 bucket/URL (and credentials — see clusters/noble/bootstrap/velero/README.md) # Velero — set **noble_velero_install: true** plus S3 bucket/URL (and credentials — see clusters/noble/bootstrap/velero/README.md)
noble_velero_install: false noble_velero_install: false

View File

@@ -1,12 +1,7 @@
--- ---
# Manual follow-ups after **noble.yml**: Vault init/unseal, Kubernetes auth for Vault, ESO ClusterSecretStore. # Manual follow-ups after **noble.yml**: SOPS key backup, optional Argo root Application.
# Run: ansible-playbook playbooks/post_deploy.yml - hosts: localhost
- name: Noble cluster — post-install reminders
hosts: localhost
connection: local connection: local
gather_facts: false gather_facts: false
vars:
noble_repo_root: "{{ playbook_dir | dirname | dirname }}"
noble_kubeconfig: "{{ lookup('env', 'KUBECONFIG') | default(noble_repo_root + '/talos/kubeconfig', true) }}"
roles: roles:
- role: noble_post_deploy - noble_post_deploy

View File

@@ -8,9 +8,6 @@ noble_helm_repos:
- { name: fossorial, url: "https://charts.fossorial.io" } - { name: fossorial, url: "https://charts.fossorial.io" }
- { name: argo, url: "https://argoproj.github.io/argo-helm" } - { name: argo, url: "https://argoproj.github.io/argo-helm" }
- { name: metrics-server, url: "https://kubernetes-sigs.github.io/metrics-server/" } - { name: metrics-server, url: "https://kubernetes-sigs.github.io/metrics-server/" }
- { name: sealed-secrets, url: "https://bitnami-labs.github.io/sealed-secrets" }
- { name: external-secrets, url: "https://charts.external-secrets.io" }
- { name: hashicorp, url: "https://helm.releases.hashicorp.com" }
- { name: prometheus-community, url: "https://prometheus-community.github.io/helm-charts" } - { name: prometheus-community, url: "https://prometheus-community.github.io/helm-charts" }
- { name: grafana, url: "https://grafana.github.io/helm-charts" } - { name: grafana, url: "https://grafana.github.io/helm-charts" }
- { name: fluent, url: "https://fluent.github.io/helm-charts" } - { name: fluent, url: "https://fluent.github.io/helm-charts" }

View File

@@ -39,11 +39,6 @@ noble_lab_ui_entries:
namespace: longhorn-system namespace: longhorn-system
service: longhorn-frontend service: longhorn-frontend
url: https://longhorn.apps.noble.lab.pcenicni.dev url: https://longhorn.apps.noble.lab.pcenicni.dev
- name: Vault
description: Secrets engine UI (after init/unseal)
namespace: vault
service: vault
url: https://vault.apps.noble.lab.pcenicni.dev
- name: Velero - name: Velero
description: Cluster backups — no web UI (velero CLI / kubectl CRDs) description: Cluster backups — no web UI (velero CLI / kubectl CRDs)
namespace: velero namespace: velero

View File

@@ -24,7 +24,6 @@ This file is **generated** by Ansible (`noble_landing_urls` role). Use it as a t
| **Prometheus** | — | No auth in default install (lab). | | **Prometheus** | — | No auth in default install (lab). |
| **Alertmanager** | — | No auth in default install (lab). | | **Alertmanager** | — | No auth in default install (lab). |
| **Longhorn** | — | No default login unless you enable access control in the UI settings. | | **Longhorn** | — | No default login unless you enable access control in the UI settings. |
| **Vault** | Token | Root token is only from **`vault operator init`** (not stored in git). See `clusters/noble/bootstrap/vault/README.md`. |
### Commands to retrieve passwords (if not filled above) ### Commands to retrieve passwords (if not filled above)
@@ -46,7 +45,7 @@ To generate this file **without** calling kubectl, run Ansible with **`-e noble_
- **Argo CD** `argocd-initial-admin-secret` disappears after you change the admin password. - **Argo CD** `argocd-initial-admin-secret` disappears after you change the admin password.
- **Grafana** password is random unless you set `grafana.adminPassword` in chart values. - **Grafana** password is random unless you set `grafana.adminPassword` in chart values.
- **Vault** UI needs **unsealed** Vault; tokens come from your chosen auth method.
- **Prometheus / Alertmanager** UIs are unauthenticated by default — restrict when hardening (`talos/CLUSTER-BUILD.md` Phase G). - **Prometheus / Alertmanager** UIs are unauthenticated by default — restrict when hardening (`talos/CLUSTER-BUILD.md` Phase G).
- **SOPS:** cluster secrets in git under **`clusters/noble/secrets/`** are encrypted; decrypt with **`age-key.txt`** (not in git). See **`clusters/noble/secrets/README.md`**.
- **Headlamp** token above expires after the configured duration; re-run Ansible or `kubectl create token` to refresh. - **Headlamp** token above expires after the configured duration; re-run Ansible or `kubectl create token` to refresh.
- **Velero** has **no web UI** — use **`velero`** CLI or **`kubectl -n velero get backup,schedule,backupstoragelocation`**. Metrics: **`velero`** Service in **`velero`** (Prometheus scrape). See `clusters/noble/bootstrap/velero/README.md`. - **Velero** has **no web UI** — use **`velero`** CLI or **`kubectl -n velero get backup,schedule,backupstoragelocation`**. Metrics: **`velero`** Service in **`velero`** (Prometheus scrape). See `clusters/noble/bootstrap/velero/README.md`.

View File

@@ -4,5 +4,6 @@ noble_platform_kubectl_request_timeout: 120s
noble_platform_kustomize_retries: 5 noble_platform_kustomize_retries: 5
noble_platform_kustomize_delay: 20 noble_platform_kustomize_delay: 20
# Vault: injector (vault-k8s) owns MutatingWebhookConfiguration.caBundle; Helm upgrade can SSA-conflict. Delete webhook so Helm can recreate it. # Decrypt **clusters/noble/secrets/*.yaml** with SOPS and kubectl apply (requires **sops**, **age**, and **age-key.txt**).
noble_vault_delete_injector_webhook_before_helm: true noble_apply_sops_secrets: true
noble_sops_age_key_file: "{{ noble_repo_root }}/age-key.txt"

View File

@@ -1,6 +1,6 @@
--- ---
# Mirrors former **noble-platform** Argo Application: Helm releases + plain manifests under clusters/noble/bootstrap. # Mirrors former **noble-platform** Argo Application: Helm releases + plain manifests under clusters/noble/bootstrap.
- name: Apply clusters/noble/bootstrap kustomize (namespaces, Grafana Loki datasource, Vault extras) - name: Apply clusters/noble/bootstrap kustomize (namespaces, Grafana Loki datasource)
ansible.builtin.command: ansible.builtin.command:
argv: argv:
- kubectl - kubectl
@@ -16,77 +16,26 @@
until: noble_platform_kustomize.rc == 0 until: noble_platform_kustomize.rc == 0
changed_when: true changed_when: true
- name: Install Sealed Secrets - name: Stat SOPS age private key (age-key.txt)
ansible.builtin.command: ansible.builtin.stat:
argv: path: "{{ noble_sops_age_key_file }}"
- helm register: noble_sops_age_key_stat
- upgrade
- --install
- sealed-secrets
- sealed-secrets/sealed-secrets
- --namespace
- sealed-secrets
- --version
- "2.18.4"
- -f
- "{{ noble_repo_root }}/clusters/noble/bootstrap/sealed-secrets/values.yaml"
- --wait
environment:
KUBECONFIG: "{{ noble_kubeconfig }}"
changed_when: true
- name: Install External Secrets Operator - name: Apply SOPS-encrypted cluster secrets (clusters/noble/secrets/*.yaml)
ansible.builtin.command: ansible.builtin.shell: |
argv: set -euo pipefail
- helm shopt -s nullglob
- upgrade for f in "{{ noble_repo_root }}/clusters/noble/secrets"/*.yaml; do
- --install sops -d "$f" | kubectl apply -f -
- external-secrets done
- external-secrets/external-secrets args:
- --namespace executable: /bin/bash
- external-secrets
- --version
- "2.2.0"
- -f
- "{{ noble_repo_root }}/clusters/noble/bootstrap/external-secrets/values.yaml"
- --wait
environment: environment:
KUBECONFIG: "{{ noble_kubeconfig }}" KUBECONFIG: "{{ noble_kubeconfig }}"
changed_when: true SOPS_AGE_KEY_FILE: "{{ noble_sops_age_key_file }}"
when:
# vault-k8s patches webhook CA after install; Helm 3/4 SSA then conflicts on upgrade. Removing the MWC lets Helm re-apply cleanly; injector repopulates caBundle. - noble_apply_sops_secrets | default(true) | bool
- name: Delete Vault agent injector MutatingWebhookConfiguration before Helm (avoids caBundle field conflict) - noble_sops_age_key_stat.stat.exists
ansible.builtin.command:
argv:
- kubectl
- delete
- mutatingwebhookconfiguration
- vault-agent-injector-cfg
- --ignore-not-found
environment:
KUBECONFIG: "{{ noble_kubeconfig }}"
register: noble_vault_mwc_delete
when: noble_vault_delete_injector_webhook_before_helm | default(true) | bool
changed_when: "'deleted' in (noble_vault_mwc_delete.stdout | default(''))"
- name: Install Vault
ansible.builtin.command:
argv:
- helm
- upgrade
- --install
- vault
- hashicorp/vault
- --namespace
- vault
- --version
- "0.32.0"
- -f
- "{{ noble_repo_root }}/clusters/noble/bootstrap/vault/values.yaml"
- --wait
environment:
KUBECONFIG: "{{ noble_kubeconfig }}"
HELM_SERVER_SIDE_APPLY: "false"
changed_when: true changed_when: true
- name: Install kube-prometheus-stack - name: Install kube-prometheus-stack

View File

@@ -1,24 +1,10 @@
--- ---
- name: Vault — manual steps (not automated) - name: SOPS secrets (workstation)
ansible.builtin.debug: ansible.builtin.debug:
msg: | msg: |
1. kubectl -n vault get pods (wait for Running) Encrypted Kubernetes Secrets live under clusters/noble/secrets/ (Mozilla SOPS + age).
2. kubectl -n vault exec -it vault-0 -- vault operator init (once; save keys) Private key: age-key.txt at repo root (gitignored). See clusters/noble/secrets/README.md
3. Unseal per clusters/noble/bootstrap/vault/README.md and .sops.yaml. noble.yml decrypt-applies these when age-key.txt exists.
4. ./clusters/noble/bootstrap/vault/configure-kubernetes-auth.sh
5. kubectl apply -f clusters/noble/bootstrap/external-secrets/examples/vault-cluster-secret-store.yaml
- name: Optional — apply Vault ClusterSecretStore for External Secrets
ansible.builtin.command:
argv:
- kubectl
- apply
- -f
- "{{ noble_repo_root }}/clusters/noble/bootstrap/external-secrets/examples/vault-cluster-secret-store.yaml"
environment:
KUBECONFIG: "{{ noble_kubeconfig }}"
when: noble_apply_vault_cluster_secret_store | default(false) | bool
changed_when: true
- name: Argo CD optional root Application (empty app-of-apps) - name: Argo CD optional root Application (empty app-of-apps)
ansible.builtin.debug: ansible.builtin.debug:

BIN
branding/nikflix/logo.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 277 KiB

View File

@@ -1,6 +1,6 @@
# Argo CD — optional applications (non-bootstrap) # Argo CD — optional applications (non-bootstrap)
**Base cluster configuration** (CNI, MetalLB, ingress, cert-manager, storage, observability stack, policy, Vault, etc.) is installed by **`ansible/playbooks/noble.yml`** from **`clusters/noble/bootstrap/`** — not from here. **Base cluster configuration** (CNI, MetalLB, ingress, cert-manager, storage, observability stack, policy, SOPS secrets path, etc.) is installed by **`ansible/playbooks/noble.yml`** from **`clusters/noble/bootstrap/`** — not from here.
**`noble-root`** (`clusters/noble/bootstrap/argocd/root-application.yaml`) points at **`clusters/noble/apps`**. Add **`Application`** manifests (and optional **`AppProject`** definitions) under this directory only for workloads that are additive and do not subsume the Ansible-managed platform. **`noble-root`** (`clusters/noble/bootstrap/argocd/root-application.yaml`) points at **`clusters/noble/apps`**. Add **`Application`** manifests (and optional **`AppProject`** definitions) under this directory only for workloads that are additive and do not subsume the Ansible-managed platform.

View File

@@ -79,12 +79,6 @@ config:
href: https://longhorn.apps.noble.lab.pcenicni.dev href: https://longhorn.apps.noble.lab.pcenicni.dev
siteMonitor: http://longhorn-frontend.longhorn-system.svc.cluster.local:80 siteMonitor: http://longhorn-frontend.longhorn-system.svc.cluster.local:80
description: Storage volumes, nodes, backups description: Storage volumes, nodes, backups
- Vault:
icon: si-vault
href: https://vault.apps.noble.lab.pcenicni.dev
# Unauthenticated health (HEAD/GET) — not the redirecting UI root
siteMonitor: http://vault.vault.svc.cluster.local:8200/v1/sys/health?standbyok=true&sealedcode=204&uninitcode=204
description: Secrets engine UI (after init/unseal)
- Velero: - Velero:
icon: mdi-backup-restore icon: mdi-backup-restore
href: https://velero.io/docs/ href: https://velero.io/docs/

View File

@@ -52,7 +52,7 @@ Use **Settings → Repositories** in the UI, or `argocd repo add` / a `Secret` o
## 4. App-of-apps (optional GitOps only) ## 4. App-of-apps (optional GitOps only)
Bootstrap **platform** workloads (CNI, ingress, cert-manager, Kyverno, observability, Vault, etc.) are installed by Bootstrap **platform** workloads (CNI, ingress, cert-manager, Kyverno, observability, etc.) are installed by
**`ansible/playbooks/noble.yml`** from **`clusters/noble/bootstrap/`** — not by Argo. **`clusters/noble/apps/kustomization.yaml`** is empty by default. **`ansible/playbooks/noble.yml`** from **`clusters/noble/bootstrap/`** — not by Argo. **`clusters/noble/apps/kustomization.yaml`** is empty by default.
1. Edit **`root-application.yaml`**: set **`repoURL`** and **`targetRevision`** to this repository. The **`resources-finalizer.argocd.argoproj.io/background`** finalizer uses Argos path-qualified form so **`kubectl apply`** does not warn about finalizer names. 1. Edit **`root-application.yaml`**: set **`repoURL`** and **`targetRevision`** to this repository. The **`resources-finalizer.argocd.argoproj.io/background`** finalizer uses Argos path-qualified form so **`kubectl apply`** does not warn about finalizer names.

View File

@@ -1,60 +0,0 @@
# External Secrets Operator (noble)
Syncs secrets from external systems into Kubernetes **Secret** objects via **ExternalSecret** / **ClusterExternalSecret** CRDs.
- **Chart:** `external-secrets/external-secrets` **2.2.0** (app **v2.2.0**)
- **Namespace:** `external-secrets`
- **Helm release name:** `external-secrets` (matches the operator **ServiceAccount** name `external-secrets`)
## Install
```bash
helm repo add external-secrets https://charts.external-secrets.io
helm repo update
kubectl apply -f clusters/noble/bootstrap/external-secrets/namespace.yaml
helm upgrade --install external-secrets external-secrets/external-secrets -n external-secrets \
--version 2.2.0 -f clusters/noble/bootstrap/external-secrets/values.yaml --wait
```
Verify:
```bash
kubectl -n external-secrets get deploy,pods
kubectl get crd | grep external-secrets
```
## Vault `ClusterSecretStore` (after Vault is deployed)
The checklist expects a **Vault**-backed store. Install Vault first (`talos/CLUSTER-BUILD.md` Phase E — Vault on Longhorn + auto-unseal), then:
1. Enable **KV v2** secrets engine and **Kubernetes** auth in Vault; create a **role** (e.g. `external-secrets`) that maps the clusters **`external-secrets` / `external-secrets`** service account to a policy that can read the paths you need.
2. Copy **`examples/vault-cluster-secret-store.yaml`**, set **`spec.provider.vault.server`** to your Vault URL. This repos Vault Helm values use **HTTP** on port **8200** (`global.tlsDisable: true`): **`http://vault.vault.svc.cluster.local:8200`**. Use **`https://`** if you enable TLS on the Vault listener.
3. If Vault uses a **private TLS CA**, configure **`caProvider`** or **`caBundle`** on the Vault provider — see [HashiCorp Vault provider](https://external-secrets.io/latest/provider/hashicorp-vault/). Do not commit private CA material to public git unless intended.
4. Apply: **`kubectl apply -f …/vault-cluster-secret-store.yaml`**
5. Confirm the store is ready: **`kubectl describe clustersecretstore vault`**
Example **ExternalSecret** (after the store is healthy):
```yaml
apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
name: demo
namespace: default
spec:
refreshInterval: 1h
secretStoreRef:
name: vault
kind: ClusterSecretStore
target:
name: demo-synced
data:
- secretKey: password
remoteRef:
key: secret/data/myapp
property: password
```
## Upgrades
Pin the chart version in `values.yaml` header comments; run the same **`helm upgrade --install`** with the new **`--version`** after reviewing [release notes](https://github.com/external-secrets/external-secrets/releases).

View File

@@ -1,31 +0,0 @@
# ClusterSecretStore for HashiCorp Vault (KV v2) using Kubernetes auth.
#
# Do not apply until Vault is running, reachable from the cluster, and configured with:
# - Kubernetes auth at mountPath (default: kubernetes)
# - A role (below: external-secrets) bound to this service account:
# name: external-secrets
# namespace: external-secrets
# - A policy allowing read on the KV path used below (e.g. secret/data/* for path "secret")
#
# Adjust server, mountPath, role, and path to match your Vault deployment. If Vault uses TLS
# with a private CA, set provider.vault.caProvider or caBundle (see README).
#
# kubectl apply -f clusters/noble/bootstrap/external-secrets/examples/vault-cluster-secret-store.yaml
---
apiVersion: external-secrets.io/v1
kind: ClusterSecretStore
metadata:
name: vault
spec:
provider:
vault:
server: "http://vault.vault.svc.cluster.local:8200"
path: secret
version: v2
auth:
kubernetes:
mountPath: kubernetes
role: external-secrets
serviceAccountRef:
name: external-secrets
namespace: external-secrets

View File

@@ -1,5 +0,0 @@
# External Secrets Operator — apply before Helm.
apiVersion: v1
kind: Namespace
metadata:
name: external-secrets

View File

@@ -1,10 +0,0 @@
# External Secrets Operator — noble
#
# helm repo add external-secrets https://charts.external-secrets.io
# helm repo update
# kubectl apply -f clusters/noble/bootstrap/external-secrets/namespace.yaml
# helm upgrade --install external-secrets external-secrets/external-secrets -n external-secrets \
# --version 2.2.0 -f clusters/noble/bootstrap/external-secrets/values.yaml --wait
#
# CRDs are installed by the chart (installCRDs: true). Vault ClusterSecretStore: see README + examples/.
commonLabels: {}

View File

@@ -8,13 +8,9 @@ resources:
- kube-prometheus-stack/namespace.yaml - kube-prometheus-stack/namespace.yaml
- loki/namespace.yaml - loki/namespace.yaml
- fluent-bit/namespace.yaml - fluent-bit/namespace.yaml
- sealed-secrets/namespace.yaml - newt/namespace.yaml
- external-secrets/namespace.yaml
- vault/namespace.yaml
- kyverno/namespace.yaml - kyverno/namespace.yaml
- velero/namespace.yaml - velero/namespace.yaml
- velero/longhorn-volumesnapshotclass.yaml - velero/longhorn-volumesnapshotclass.yaml
- headlamp/namespace.yaml - headlamp/namespace.yaml
- grafana-loki-datasource/loki-datasource.yaml - grafana-loki-datasource/loki-datasource.yaml
- vault/unseal-cronjob.yaml
- vault/cilium-network-policy.yaml

View File

@@ -35,7 +35,6 @@ x-kyverno-exclude-infra: &kyverno_exclude_infra
- kube-node-lease - kube-node-lease
- argocd - argocd
- cert-manager - cert-manager
- external-secrets
- headlamp - headlamp
- kyverno - kyverno
- logging - logging
@@ -44,9 +43,7 @@ x-kyverno-exclude-infra: &kyverno_exclude_infra
- metallb-system - metallb-system
- monitoring - monitoring
- newt - newt
- sealed-secrets
- traefik - traefik
- vault
policyExclude: policyExclude:
disallow-capabilities: *kyverno_exclude_infra disallow-capabilities: *kyverno_exclude_infra

View File

@@ -2,26 +2,24 @@
This is the **primary** automation path for **public** hostnames to workloads in this cluster (it **replaces** in-cluster ExternalDNS). [Newt](https://github.com/fosrl/newt) is the on-prem agent that connects your cluster to a **Pangolin** site (WireGuard tunnel). The [Fossorial Helm chart](https://github.com/fosrl/helm-charts) deploys one or more instances. This is the **primary** automation path for **public** hostnames to workloads in this cluster (it **replaces** in-cluster ExternalDNS). [Newt](https://github.com/fosrl/newt) is the on-prem agent that connects your cluster to a **Pangolin** site (WireGuard tunnel). The [Fossorial Helm chart](https://github.com/fosrl/helm-charts) deploys one or more instances.
**Secrets:** Never commit endpoint, Newt ID, or Newt secret. If credentials were pasted into chat or CI logs, **rotate them** in Pangolin and recreate the Kubernetes Secret. **Secrets:** Never commit endpoint, Newt ID, or Newt secret in **plain** YAML. If credentials were pasted into chat or CI logs, **rotate them** in Pangolin and recreate the Kubernetes Secret.
## 1. Create the Secret ## 1. Create the Secret
Keys must match `values.yaml` (`PANGOLIN_ENDPOINT`, `NEWT_ID`, `NEWT_SECRET`). Keys must match `values.yaml` (`PANGOLIN_ENDPOINT`, `NEWT_ID`, `NEWT_SECRET`).
### Option A — Sealed Secret (safe for GitOps) ### Option A — SOPS (safe for GitOps)
With the [Sealed Secrets](https://github.com/bitnami-labs/sealed-secrets) controller installed (`clusters/noble/bootstrap/sealed-secrets/`), generate a `SealedSecret` from your workstation (rotate credentials in Pangolin first if they were exposed): Encrypt a normal **`Secret`** with [Mozilla SOPS](https://github.com/getsops/sops) and **age** (see **`clusters/noble/secrets/README.md`** and **`.sops.yaml`**). The repo includes an encrypted example at **`clusters/noble/secrets/newt-pangolin-auth.secret.yaml`** — edit with `sops` after exporting **`SOPS_AGE_KEY_FILE`** to your **`age-key.txt`**, or create a new file and encrypt it.
```bash ```bash
chmod +x clusters/noble/bootstrap/sealed-secrets/examples/kubeseal-newt-pangolin-auth.sh export SOPS_AGE_KEY_FILE=/absolute/path/to/home-server/age-key.txt
export PANGOLIN_ENDPOINT='https://pangolin.pcenicni.dev' sops clusters/noble/secrets/newt-pangolin-auth.secret.yaml
export NEWT_ID='YOUR_NEWT_ID' # then:
export NEWT_SECRET='YOUR_NEWT_SECRET' sops -d clusters/noble/secrets/newt-pangolin-auth.secret.yaml | kubectl apply -f -
./clusters/noble/bootstrap/sealed-secrets/examples/kubeseal-newt-pangolin-auth.sh > newt-pangolin-auth.sealedsecret.yaml
kubectl apply -f newt-pangolin-auth.sealedsecret.yaml
``` ```
Commit only the `.sealedsecret.yaml` file, not plain `Secret` YAML. **Ansible** (`noble.yml`) applies all **`clusters/noble/secrets/*.yaml`** automatically when **`age-key.txt`** exists at the repo root.
### Option B — Imperative Secret (not in git) ### Option B — Imperative Secret (not in git)

View File

@@ -1,50 +0,0 @@
# Sealed Secrets (noble)
Encrypts `Secret` manifests so they can live in git; the controller decrypts **SealedSecret** resources into **Secret**s in-cluster.
- **Chart:** `sealed-secrets/sealed-secrets` **2.18.4** (app **0.36.1**)
- **Namespace:** `sealed-secrets`
## Install
```bash
helm repo add sealed-secrets https://bitnami-labs.github.io/sealed-secrets
helm repo update
kubectl apply -f clusters/noble/bootstrap/sealed-secrets/namespace.yaml
helm upgrade --install sealed-secrets sealed-secrets/sealed-secrets -n sealed-secrets \
--version 2.18.4 -f clusters/noble/bootstrap/sealed-secrets/values.yaml --wait
```
## Workstation: `kubeseal`
Install a **kubeseal** build compatible with the controller (match **app** minor, e.g. **0.36.x** for **0.36.1**). Examples:
- **Homebrew:** `brew install kubeseal` (check `kubeseal --version` against the charts `image.tag` in `helm show values`).
- **GitHub releases:** [bitnami-labs/sealed-secrets](https://github.com/bitnami-labs/sealed-secrets/releases)
Fetch the clusters public seal cert (once per kube context):
```bash
kubeseal --fetch-cert > /tmp/noble-sealed-secrets.pem
```
Create a sealed secret from a normal secret manifest:
```bash
kubectl create secret generic example --from-literal=foo=bar --dry-run=client -o yaml \
| kubeseal --cert /tmp/noble-sealed-secrets.pem -o yaml > example-sealedsecret.yaml
```
Commit `example-sealedsecret.yaml`; apply it with `kubectl apply -f`. The controller creates the **Secret** in the same namespace as the **SealedSecret**.
**Noble example:** `examples/kubeseal-newt-pangolin-auth.sh` (Newt / Pangolin tunnel credentials).
## Backup the sealing key
If the controllers private key is lost, existing sealed files cannot be decrypted on a new cluster. Back up the key secret after install:
```bash
kubectl get secret -n sealed-secrets -l sealedsecrets.bitnami.com/sealed-secrets-key=active -o yaml > sealed-secrets-key-backup.yaml
```
Store `sealed-secrets-key-backup.yaml` in a safe offline location (not in public git).

View File

@@ -1,19 +0,0 @@
#!/usr/bin/env bash
# Emit a SealedSecret for newt-pangolin-auth (namespace newt).
# Prerequisites: sealed-secrets controller running; kubeseal client (same minor as controller).
# Rotate Pangolin/Newt credentials in the UI first if they were exposed, then set env vars and run:
#
# export PANGOLIN_ENDPOINT='https://pangolin.example.com'
# export NEWT_ID='...'
# export NEWT_SECRET='...'
# ./kubeseal-newt-pangolin-auth.sh > newt-pangolin-auth.sealedsecret.yaml
# kubectl apply -f newt-pangolin-auth.sealedsecret.yaml
#
set -euo pipefail
kubectl apply -f "$(dirname "$0")/../../newt/namespace.yaml" >/dev/null 2>&1 || true
kubectl -n newt create secret generic newt-pangolin-auth \
--dry-run=client \
--from-literal=PANGOLIN_ENDPOINT="${PANGOLIN_ENDPOINT:?}" \
--from-literal=NEWT_ID="${NEWT_ID:?}" \
--from-literal=NEWT_SECRET="${NEWT_SECRET:?}" \
-o yaml | kubeseal -o yaml

View File

@@ -1,5 +0,0 @@
# Sealed Secrets controller — apply before Helm.
apiVersion: v1
kind: Namespace
metadata:
name: sealed-secrets

View File

@@ -1,18 +0,0 @@
# Sealed Secrets — noble (Git-encrypted Secret workflow)
#
# helm repo add sealed-secrets https://bitnami-labs.github.io/sealed-secrets
# helm repo update
# kubectl apply -f clusters/noble/bootstrap/sealed-secrets/namespace.yaml
# helm upgrade --install sealed-secrets sealed-secrets/sealed-secrets -n sealed-secrets \
# --version 2.18.4 -f clusters/noble/bootstrap/sealed-secrets/values.yaml --wait
#
# Client: install kubeseal (same minor as controller — see README).
# Defaults are sufficient for the lab; override here if you need key renewal, resources, etc.
#
# GitOps pattern: create Secrets only via SealedSecret (or External Secrets + Vault).
# Example (Newt): clusters/noble/bootstrap/sealed-secrets/examples/kubeseal-newt-pangolin-auth.sh
# Backup the controller's sealing key: kubectl -n sealed-secrets get secret sealed-secrets-key -o yaml
#
# Talos cluster secrets (bootstrap token, cluster secret, certs) belong in talhelper talsecret /
# SOPS — not Sealed Secrets. See talos/README.md.
commonLabels: {}

View File

@@ -1,162 +0,0 @@
# HashiCorp Vault (noble)
Standalone Vault with **file** storage on a **Longhorn** PVC (`server.dataStorage`). The listener uses **HTTP** (`global.tlsDisable: true`) for in-cluster use; add TLS at the listener when exposing outside the cluster.
- **Chart:** `hashicorp/vault` **0.32.0** (Vault **1.21.2**)
- **Namespace:** `vault`
## Install
```bash
helm repo add hashicorp https://helm.releases.hashicorp.com
helm repo update
kubectl apply -f clusters/noble/bootstrap/vault/namespace.yaml
helm upgrade --install vault hashicorp/vault -n vault \
--version 0.32.0 -f clusters/noble/bootstrap/vault/values.yaml --wait --timeout 15m
```
Verify:
```bash
kubectl -n vault get pods,pvc,svc
kubectl -n vault exec -i sts/vault -- vault status
```
## Cilium network policy (Phase G)
After **Cilium** is up, optionally restrict HTTP access to the Vault server pods (**TCP 8200**) to **`external-secrets`** and same-namespace clients:
```bash
kubectl apply -f clusters/noble/bootstrap/vault/cilium-network-policy.yaml
```
If you add workloads in other namespaces that call Vault, extend **`ingress`** in that manifest.
## Initialize and unseal (first time)
From a workstation with `kubectl` (or `kubectl exec` into any pod with `vault` CLI):
```bash
kubectl -n vault exec -i sts/vault -- vault operator init -key-shares=1 -key-threshold=1
```
**Lab-only:** `-key-shares=1 -key-threshold=1` keeps a single unseal key. For stronger Shamir splits, use more shares and store them safely.
Save the **Unseal Key** and **Root Token** offline. Then unseal once:
```bash
kubectl -n vault exec -i sts/vault -- vault operator unseal
# paste unseal key
```
Or create the Secret used by the optional CronJob and apply it:
```bash
kubectl -n vault create secret generic vault-unseal-key --from-literal=key='YOUR_UNSEAL_KEY'
kubectl apply -f clusters/noble/bootstrap/vault/unseal-cronjob.yaml
```
The CronJob runs every minute and unseals if Vault is sealed and the Secret is present.
## Auto-unseal note
Vault **OSS** auto-unseal uses cloud KMS (AWS, GCP, Azure, OCI), **Transit** (another Vault), etc. There is no first-class “Kubernetes Secret” seal. This repo uses an optional **CronJob** as a **lab** substitute. Production clusters should use a supported seal backend.
## Kubernetes auth (External Secrets / ClusterSecretStore)
**One-shot:** from the repo root, `export KUBECONFIG=talos/kubeconfig` and `export VAULT_TOKEN=…`, then run **`./clusters/noble/bootstrap/vault/configure-kubernetes-auth.sh`** (idempotent). Then **`kubectl apply -f clusters/noble/bootstrap/external-secrets/examples/vault-cluster-secret-store.yaml`** on its own line (shell comments **`# …`** on the same line are parsed as extra `kubectl` args and break `apply`). **`kubectl get clustersecretstore vault`** should show **READY=True** after a few seconds.
Run these **from your workstation** (needs `kubectl`; no local `vault` binary required). Use a **short-lived admin token** or the root token **only in your shell** — do not paste tokens into logs or chat.
**1. Enable the auth method** (skip if already done):
```bash
kubectl -n vault exec -it sts/vault -- sh -c '
export VAULT_ADDR=http://127.0.0.1:8200
export VAULT_TOKEN="YOUR_ROOT_OR_ADMIN_TOKEN"
vault auth enable kubernetes
'
```
**2. Configure `auth/kubernetes`** — the API **issuer** must match the `iss` claim on service account JWTs. With **kube-vip** / a custom API URL, discover it from the cluster (do not assume `kubernetes.default`):
```bash
ISSUER=$(kubectl get --raw /.well-known/openid-configuration | jq -r .issuer)
REVIEWER=$(kubectl -n vault create token vault --duration=8760h)
CA_B64=$(kubectl config view --raw --minify -o jsonpath='{.clusters[0].cluster.certificate-authority-data}')
```
Then apply config **inside** the Vault pod (environment variables are passed in with `env` so quoting stays correct):
```bash
export VAULT_TOKEN="YOUR_ROOT_OR_ADMIN_TOKEN"
export ISSUER REVIEWER CA_B64
kubectl -n vault exec -i sts/vault -- env \
VAULT_ADDR=http://127.0.0.1:8200 \
VAULT_TOKEN="$VAULT_TOKEN" \
CA_B64="$CA_B64" \
REVIEWER="$REVIEWER" \
ISSUER="$ISSUER" \
sh -ec '
echo "$CA_B64" | base64 -d > /tmp/k8s-ca.crt
vault write auth/kubernetes/config \
kubernetes_host="https://kubernetes.default.svc:443" \
kubernetes_ca_cert=@/tmp/k8s-ca.crt \
token_reviewer_jwt="$REVIEWER" \
issuer="$ISSUER"
'
```
**3. KV v2** at path `secret` (skip if already enabled):
```bash
kubectl -n vault exec -it sts/vault -- sh -c '
export VAULT_ADDR=http://127.0.0.1:8200
export VAULT_TOKEN="YOUR_ROOT_OR_ADMIN_TOKEN"
vault secrets enable -path=secret kv-v2
'
```
**4. Policy + role** for the External Secrets operator SA (`external-secrets` / `external-secrets`):
```bash
kubectl -n vault exec -it sts/vault -- sh -c '
export VAULT_ADDR=http://127.0.0.1:8200
export VAULT_TOKEN="YOUR_ROOT_OR_ADMIN_TOKEN"
vault policy write external-secrets - <<EOF
path "secret/data/*" {
capabilities = ["read", "list"]
}
path "secret/metadata/*" {
capabilities = ["read", "list"]
}
EOF
vault write auth/kubernetes/role/external-secrets \
bound_service_account_names=external-secrets \
bound_service_account_namespaces=external-secrets \
policies=external-secrets \
ttl=24h
'
```
**5. Apply** **`clusters/noble/bootstrap/external-secrets/examples/vault-cluster-secret-store.yaml`** if you have not already, then verify:
```bash
kubectl describe clustersecretstore vault
```
See also [Kubernetes auth](https://developer.hashicorp.com/vault/docs/auth/kubernetes#configuration).
## TLS and External Secrets
`values.yaml` disables TLS on the Vault listener. The **`ClusterSecretStore`** example uses **`http://vault.vault.svc.cluster.local:8200`**. If you enable TLS on the listener, switch the URL to **`https://`** and configure **`caBundle`** or **`caProvider`** on the store.
## UI
Port-forward:
```bash
kubectl -n vault port-forward svc/vault-ui 8200:8200
```
Open `http://127.0.0.1:8200` and log in with the root token (rotate for production workflows).

View File

@@ -1,40 +0,0 @@
# CiliumNetworkPolicy — restrict who may reach Vault HTTP listener (8200).
# Apply after Cilium is healthy: kubectl apply -f clusters/noble/bootstrap/vault/cilium-network-policy.yaml
#
# Ingress-only policy: egress from Vault is unchanged (Kubernetes auth needs API + DNS).
# Extend ingress rules if other namespaces must call Vault (e.g. app workloads).
#
# Ref: https://docs.cilium.io/en/stable/security/policy/language/
---
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
name: vault-http-ingress
namespace: vault
spec:
endpointSelector:
matchLabels:
app.kubernetes.io/name: vault
component: server
ingress:
- fromEndpoints:
- matchLabels:
"k8s:io.kubernetes.pod.namespace": external-secrets
toPorts:
- ports:
- port: "8200"
protocol: TCP
- fromEndpoints:
- matchLabels:
"k8s:io.kubernetes.pod.namespace": traefik
toPorts:
- ports:
- port: "8200"
protocol: TCP
- fromEndpoints:
- matchLabels:
"k8s:io.kubernetes.pod.namespace": vault
toPorts:
- ports:
- port: "8200"
protocol: TCP

View File

@@ -1,77 +0,0 @@
#!/usr/bin/env bash
# Configure Vault Kubernetes auth + KV v2 + policy/role for External Secrets Operator.
# Requires: kubectl (cluster access), jq optional (openid issuer); Vault reachable via sts/vault.
#
# Usage (from repo root):
# export KUBECONFIG=talos/kubeconfig # or your path
# export VAULT_TOKEN='…' # root or admin token — never commit
# ./clusters/noble/bootstrap/vault/configure-kubernetes-auth.sh
#
# Then: kubectl apply -f clusters/noble/bootstrap/external-secrets/examples/vault-cluster-secret-store.yaml
# Verify: kubectl describe clustersecretstore vault
set -euo pipefail
: "${VAULT_TOKEN:?Set VAULT_TOKEN to your Vault root or admin token}"
ISSUER=$(kubectl get --raw /.well-known/openid-configuration | jq -r .issuer)
REVIEWER=$(kubectl -n vault create token vault --duration=8760h)
CA_B64=$(kubectl config view --raw --minify -o jsonpath='{.clusters[0].cluster.certificate-authority-data}')
kubectl -n vault exec -i sts/vault -- env \
VAULT_ADDR=http://127.0.0.1:8200 \
VAULT_TOKEN="$VAULT_TOKEN" \
sh -ec '
set -e
vault auth list >/tmp/vauth.txt
grep -q "^kubernetes/" /tmp/vauth.txt || vault auth enable kubernetes
'
kubectl -n vault exec -i sts/vault -- env \
VAULT_ADDR=http://127.0.0.1:8200 \
VAULT_TOKEN="$VAULT_TOKEN" \
CA_B64="$CA_B64" \
REVIEWER="$REVIEWER" \
ISSUER="$ISSUER" \
sh -ec '
echo "$CA_B64" | base64 -d > /tmp/k8s-ca.crt
vault write auth/kubernetes/config \
kubernetes_host="https://kubernetes.default.svc:443" \
kubernetes_ca_cert=@/tmp/k8s-ca.crt \
token_reviewer_jwt="$REVIEWER" \
issuer="$ISSUER"
'
kubectl -n vault exec -i sts/vault -- env \
VAULT_ADDR=http://127.0.0.1:8200 \
VAULT_TOKEN="$VAULT_TOKEN" \
sh -ec '
set -e
vault secrets list >/tmp/vsec.txt
grep -q "^secret/" /tmp/vsec.txt || vault secrets enable -path=secret kv-v2
'
kubectl -n vault exec -i sts/vault -- env \
VAULT_ADDR=http://127.0.0.1:8200 \
VAULT_TOKEN="$VAULT_TOKEN" \
sh -ec '
vault policy write external-secrets - <<EOF
path "secret/data/*" {
capabilities = ["read", "list"]
}
path "secret/metadata/*" {
capabilities = ["read", "list"]
}
EOF
vault write auth/kubernetes/role/external-secrets \
bound_service_account_names=external-secrets \
bound_service_account_namespaces=external-secrets \
policies=external-secrets \
ttl=24h
'
echo "Done. Issuer used: $ISSUER"
echo ""
echo "Next (each command on its own line — do not paste # comments after kubectl):"
echo " kubectl apply -f clusters/noble/bootstrap/external-secrets/examples/vault-cluster-secret-store.yaml"
echo " kubectl get clustersecretstore vault"

View File

@@ -1,5 +0,0 @@
# HashiCorp Vault — apply before Helm.
apiVersion: v1
kind: Namespace
metadata:
name: vault

View File

@@ -1,63 +0,0 @@
# Optional lab auto-unseal: applies after Vault is initialized and Secret `vault-unseal-key` exists.
#
# 1) vault operator init -key-shares=1 -key-threshold=1 (lab only — single key)
# 2) kubectl -n vault create secret generic vault-unseal-key --from-literal=key='YOUR_UNSEAL_KEY'
# 3) kubectl apply -f clusters/noble/bootstrap/vault/unseal-cronjob.yaml
#
# OSS Vault has no Kubernetes/KMS seal; this CronJob runs vault operator unseal when the server is sealed.
# Protect the Secret with RBAC; prefer cloud KMS auto-unseal for real environments.
---
apiVersion: batch/v1
kind: CronJob
metadata:
name: vault-auto-unseal
namespace: vault
spec:
concurrencyPolicy: Forbid
successfulJobsHistoryLimit: 1
failedJobsHistoryLimit: 3
schedule: "*/1 * * * *"
jobTemplate:
spec:
template:
spec:
restartPolicy: OnFailure
securityContext:
runAsNonRoot: true
runAsUser: 100
runAsGroup: 1000
seccompProfile:
type: RuntimeDefault
containers:
- name: unseal
image: hashicorp/vault:1.21.2
imagePullPolicy: IfNotPresent
securityContext:
allowPrivilegeEscalation: false
capabilities:
drop:
- ALL
env:
- name: VAULT_ADDR
value: http://vault.vault.svc:8200
command:
- /bin/sh
- -ec
- |
test -f /secrets/key || exit 0
status="$(vault status -format=json 2>/dev/null || true)"
echo "$status" | grep -q '"initialized":true' || exit 0
echo "$status" | grep -q '"sealed":false' && exit 0
vault operator unseal "$(cat /secrets/key)"
volumeMounts:
- name: unseal
mountPath: /secrets
readOnly: true
volumes:
- name: unseal
secret:
secretName: vault-unseal-key
optional: true
items:
- key: key
path: key

View File

@@ -1,62 +0,0 @@
# HashiCorp Vault — noble (standalone, file storage on Longhorn; TLS disabled on listener for in-cluster HTTP).
#
# helm repo add hashicorp https://helm.releases.hashicorp.com
# helm repo update
# kubectl apply -f clusters/noble/bootstrap/vault/namespace.yaml
# helm upgrade --install vault hashicorp/vault -n vault \
# --version 0.32.0 -f clusters/noble/bootstrap/vault/values.yaml --wait --timeout 15m
#
# Post-install: initialize, store unseal key in Secret, apply optional unseal CronJob — see README.md
#
global:
tlsDisable: true
injector:
enabled: true
server:
enabled: true
dataStorage:
enabled: true
size: 10Gi
storageClass: longhorn
accessMode: ReadWriteOnce
ha:
enabled: false
standalone:
enabled: true
config: |
ui = true
listener "tcp" {
tls_disable = 1
address = "[::]:8200"
cluster_address = "[::]:8201"
}
storage "file" {
path = "/vault/data"
}
# Allow pod Ready before init/unseal so Helm --wait succeeds (see Vault /v1/sys/health docs).
readinessProbe:
enabled: true
path: "/v1/sys/health?uninitcode=204&sealedcode=204&standbyok=true"
port: 8200
# LAN: TLS terminates at Traefik + cert-manager; listener stays HTTP (global.tlsDisable).
ingress:
enabled: true
ingressClassName: traefik
annotations:
cert-manager.io/cluster-issuer: letsencrypt-prod
hosts:
- host: vault.apps.noble.lab.pcenicni.dev
paths: []
tls:
- secretName: vault-apps-noble-tls
hosts:
- vault.apps.noble.lab.pcenicni.dev
ui:
enabled: true

View File

@@ -0,0 +1,38 @@
# SOPS-encrypted cluster secrets (noble)
Secrets that belong in git are stored here as **Mozilla SOPS** files encrypted with [age](https://github.com/FiloSottile/age). The matching **private** key lives in **`age-key.txt`** at the repository root (gitignored — create with `age-keygen -o age-key.txt` and add the public key to **`.sops.yaml`** if you rotate keys).
**Migrating from an older cluster** that ran **Vault**, **Sealed Secrets**, or **External Secrets Operator:** uninstall those Helm releases (`helm uninstall vault -n vault`, etc.), delete their namespaces if empty, and export any secrets you still need into plain **`Secret`** YAML here, then encrypt with **`sops`** before committing.
## Prerequisites
- [sops](https://github.com/getsops/sops) and **age** on the machine that encrypts or applies secrets.
## Edit or create a Secret
```bash
export SOPS_AGE_KEY_FILE=/absolute/path/to/home-server/age-key.txt
# Create a new file from a template, then encrypt:
sops clusters/noble/secrets/example.secret.yaml
# Or edit an existing encrypted file (opens decrypted in $EDITOR):
sops clusters/noble/secrets/newt-pangolin-auth.secret.yaml
```
## Apply to the cluster
```bash
export KUBECONFIG=/absolute/path/to/home-server/talos/kubeconfig
export SOPS_AGE_KEY_FILE=/absolute/path/to/home-server/age-key.txt
sops -d clusters/noble/secrets/newt-pangolin-auth.secret.yaml | kubectl apply -f -
```
**Ansible** (`noble.yml`) runs the same decrypt-and-apply step for every `*.yaml` in this directory when **`age-key.txt`** exists and **`noble_apply_sops_secrets`** is true (see `ansible/group_vars/all.yml`).
## Files
| File | Purpose |
|------|---------|
| `newt-pangolin-auth.secret.yaml` | Pangolin tunnel credentials for [Newt](../../bootstrap/newt/README.md) (`PANGOLIN_ENDPOINT`, `NEWT_ID`, `NEWT_SECRET`). Replace placeholders and re-encrypt before relying on them. |

View File

@@ -0,0 +1,30 @@
apiVersion: ENC[AES256_GCM,data:FaA=,iv:EsqIdZmNS4hfzwCZ0gL7Q5Czaz8Bii3jWFu60lKmgVo=,tag:tfr4yUuTiH4s+ufYW/dpCA==,type:str]
kind: ENC[AES256_GCM,data:ozpTcG9F,iv:Q1EZ896Plhyz2qM4JJRnBf940kbVLSwyIIPUcDGBZFA=,tag:1bWEgI+I4Ni5J70MlohYdA==,type:str]
metadata:
name: ENC[AES256_GCM,data:moXbGuT6ZOGhgVUBNcpHeLZQ,iv:1WDtxT/Et/6lxx1Mj93CQME8o0lhzxnBMkdSqP/n3R0=,tag:v+iqfE8tzCx8ZOMUW7OyEA==,type:str]
namespace: ENC[AES256_GCM,data:33/AMg==,iv:M0GvB/70nHh4MVR1saZy1pGY8IFFzkzGdJl4szHJbCI=,tag:0+1LX/EnkAP0FZ6ARKZNAA==,type:str]
type: ENC[AES256_GCM,data:3io5utU1,iv:QqMDNL/R8SR7TC9mwDdDd3V6VOo+csgeiZCr2AdOZjw=,tag:/KSMy+vNz7Qj/I463eG0LQ==,type:str]
stringData:
PANGOLIN_ENDPOINT: ENC[AES256_GCM,data:a/2QTnGYnNXGxNm8QSVTKC6I+r88J1m1CdMmTA==,iv:L2LvLD7IRX8wdAzALAWQ2ojB2OjWDIcVKrdi/lSvZFY=,tag:ALffRF9bncxA8CExSaRmHA==,type:str]
NEWT_ID: ENC[AES256_GCM,data:Xfe8QvBdX62CciYXYwMfJAzIE/0=,iv:tA+FJ93tsjJ29L3bSxNAEooiKPMc+5pa00EpQ2cJkho=,tag:auiR/zQjnsmyllXbSJf3KA==,type:str]
NEWT_SECRET: ENC[AES256_GCM,data:XY8XZOkZ+GpnjljbvtaH2oGJpDoZ47fN,iv:+J5sb7saqbVwHEyemx3CUSsdKArubRdPCLGbT09sFLM=,tag:zUowv8I1CaWZH+KLYOwKYw==,type:str]
sops:
kms: []
gcp_kms: []
azure_kv: []
hc_vault: []
age:
- recipient: age1juym5p3ez3dkt0dxlznydgfgqvaujfnyk9hpdsssf50hsxeh3p4sjpf3gn
enc: |
-----BEGIN AGE ENCRYPTED FILE-----
YWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSB0RWppdWxZUEYzc2I2TURi
dm1pUzVaNDA4YldsWkFJODl1MWZ6MXFxWnhjCnVtU1VEQnJqbTI5M0hWM2FCaVlS
aXprTm42bTlldUVHMmxpUUJiWEVhcXcKLS0tIGNLVnNtNDdMQ0VVeDV1N29nOW9F
clhLa2tPdWtRMWYzc2YrR0hSQXczTlUK6hYj4HxQvu6Kqn/Ki+cYv9x5nvolyGqQ
N4g9z+t6orT6MYseWPf0uyovC/5iOOC6z/2exVe7/0rYo7ZOFm6dYQ==
-----END AGE ENCRYPTED FILE-----
lastmodified: "2026-03-29T23:37:33Z"
mac: ENC[AES256_GCM,data:uKtdqJhwE4HLCenHH+RG8O2yfVIcGbiXznL9ouAXhDLnQh/ksgeczr2fyyn9hs/JhCozAqRrF8vnYZsIdfG1DQfHjXn6Ro6gzYC0YR+gvFU8Mz9uPdVX3HYjUrzKJ5GhhBami0USZtLdGKOGgFDYmFoDsD/PmMXLUol8qJdW8Uk=,iv:rIfQI17+3vNBB1n//D7Wnl/SLWFjV0pgZDteumlS2f8=,tag:xibCfJdZQS+aB75drmY1VA==,type:str]
pgp: []
unencrypted_suffix: _unencrypted
version: 3.9.3

169
docs/Racks.md Normal file
View File

@@ -0,0 +1,169 @@
# Physical racks — Noble lab (10")
This page is a **logical rack layout** for the **noble** Talos lab: **three 10" (half-width) racks**, how **rack units (U)** are used, and **Ethernet** paths on **`192.168.50.0/24`**. Node names and IPs match [`talos/CLUSTER-BUILD.md`](../talos/CLUSTER-BUILD.md) and [`docs/architecture.md`](architecture.md).
## Legend
| Symbol | Meaning |
|--------|---------|
| `█` / filled cell | Equipment occupying that **1U** |
| `░` | Reserved / future use |
| `·` | Empty |
| `━━` | Copper to LAN switch |
**Rack unit numbering:** **U increases upward** (U1 = bottom of rack, like ANSI/EIA). **Slot** in the diagrams is **top → bottom** reading order for a quick visual scan.
### Three racks at a glance
Read **top → bottom** (first row = top of rack).
| Primary (10") | Storage B (10") | Rack C (10") |
|-----------------|-----------------|--------------|
| Fiber ONT | Mac Mini | *empty* |
| UniFi Fiber Gateway | NAS | *empty* |
| Patch panel | JBOD | *empty* |
| 2.5 GbE ×8 PoE switch | *empty* | *empty* |
| Raspberry Pi cluster | *empty* | *empty* |
| **helium** (Talos) | *empty* | *empty* |
| **neon** (Talos) | *empty* | *empty* |
| **argon** (Talos) | *empty* | *empty* |
| **krypton** (Talos) | *empty* | *empty* |
**Connectivity:** Primary rack gear shares **one L2** (`192.168.50.0/24`). Storage B and Rack C link the same way when cabled (e.g. **Ethernet** to the PoE switch, **VPN** or flat LAN per your design).
---
## Rack A — LAN aggregation (10" × 12U)
Dedicated to **Layer-2 access** and cable home runs. All cluster nodes plug into this switch (or into a downstream switch that uplinks here).
```
TOP OF RACK
┌────────────────────────────────────────┐
│ Slot 1 ········· empty ·············· │ 12U
│ Slot 2 ········· empty ·············· │ 11U
│ Slot 3 ········· empty ·············· │ 10U
│ Slot 4 ········· empty ·············· │ 9U
│ Slot 5 ········· empty ·············· │ 8U
│ Slot 6 ········· empty ·············· │ 7U
│ Slot 7 ░░░░░░░ optional PDU ░░░░░░░░ │ 6U
│ Slot 8 █████ 1U cable manager ██████ │ 5U
│ Slot 9 █████ 1U patch panel █████████ │ 4U
│ Slot10 ███ 8-port managed switch ████ │ 3U ← LAN L2 spine
│ Slot11 ········· empty ·············· │ 2U
│ Slot12 ········· empty ·············· │ 1U
└────────────────────────────────────────┘
BOTTOM
```
**Network role:** Every node NIC → **switch access port** → same **VLAN / flat LAN** as documented; **kube-vip** VIP **`192.168.50.230`**, **MetalLB** **`192.168.50.210``229`**, **Traefik** **`192.168.50.211`** are **logical** on node IPs (no extra hardware).
---
## Rack B — Control planes (10" × 12U)
Three **Talos control-plane** nodes (**scheduling allowed** on CPs per `talconfig.yaml`).
```
TOP OF RACK
┌────────────────────────────────────────┐
│ Slot 1 ········· empty ·············· │ 12U
│ Slot 2 ········· empty ·············· │ 11U
│ Slot 3 ········· empty ·············· │ 10U
│ Slot 4 ········· empty ·············· │ 9U
│ Slot 5 ········· empty ·············· │ 8U
│ Slot 6 ········· empty ·············· │ 7U
│ Slot 7 ········· empty ·············· │ 6U
│ Slot 8 █ neon control-plane .20 ████ │ 5U
│ Slot 9 █ argon control-plane .30 ███ │ 4U
│ Slot10 █ krypton control-plane .40 ██ │ 3U (kube-vip VIP .230)
│ Slot11 ········· empty ·············· │ 2U
│ Slot12 ········· empty ·············· │ 1U
└────────────────────────────────────────┘
BOTTOM
```
---
## Rack C — Worker (10" × 12U)
Single **worker** node; **Longhorn** data disk is **local** to each node (see `talconfig.yaml`); no separate NAS in this diagram.
```
TOP OF RACK
┌────────────────────────────────────────┐
│ Slot 1 ········· empty ·············· │ 12U
│ Slot 2 ········· empty ·············· │ 11U
│ Slot 3 ········· empty ·············· │ 10U
│ Slot 4 ········· empty ·············· │ 9U
│ Slot 5 ········· empty ·············· │ 8U
│ Slot 6 ········· empty ·············· │ 7U
│ Slot 7 ░░░░░░░ spare / future ░░░░░░░░ │ 6U
│ Slot 8 ········· empty ·············· │ 5U
│ Slot 9 ········· empty ·············· │ 4U
│ Slot10 ███ helium worker .10 █████ │ 3U
│ Slot11 ········· empty ·············· │ 2U
│ Slot12 ········· empty ·············· │ 1U
└────────────────────────────────────────┘
BOTTOM
```
---
## Space summary
| System | Rack | Approx. U | IP | Role |
|--------|------|-----------|-----|------|
| LAN switch | A | 1U | — | All nodes on `192.168.50.0/24` |
| Patch / cable mgmt | A | 2× 1U | — | Physical plant |
| **neon** | B | 1U | `192.168.50.20` | control-plane + schedulable |
| **argon** | B | 1U | `192.168.50.30` | control-plane + schedulable |
| **krypton** | B | 1U | `192.168.50.40` | control-plane + schedulable |
| **helium** | C | 1U | `192.168.50.10` | worker |
Adjust **empty vs. future** rows if your chassis are **2U** or on **shelves** — scale the `█` blocks accordingly.
---
## Network connections
All cluster nodes are on **one flat LAN**. **kube-vip** floats **`192.168.50.230:6443`** across the three control-plane hosts on **`ens18`** (see cluster bootstrap docs).
```mermaid
flowchart TB
subgraph RACK_A["Rack A — 10\""]
SW["Managed switch<br/>192.168.50.0/24 L2"]
PP["Patch / cable mgmt"]
SW --- PP
end
subgraph RACK_B["Rack B — 10\""]
N["neon :20"]
A["argon :30"]
K["krypton :40"]
end
subgraph RACK_C["Rack C — 10\""]
H["helium :10"]
end
subgraph LOGICAL["Logical (any node holding VIP)"]
VIP["API VIP 192.168.50.230<br/>kube-vip → apiserver :6443"]
end
WAN["Internet / other LANs"] -.->|"router (out of scope)"| SW
SW <-->|"Ethernet"| N
SW <-->|"Ethernet"| A
SW <-->|"Ethernet"| K
SW <-->|"Ethernet"| H
N --- VIP
A --- VIP
K --- VIP
WK["Workstation / CI<br/>kubectl, browser"] -->|"HTTPS :6443"| VIP
WK -->|"L2 (MetalLB .210.211, any node)"| SW
```
**Ingress path (same LAN):** clients → **`192.168.50.211`** (Traefik) or **`192.168.50.210`** (Argo CD) via **MetalLB** — still **through the same switch** to whichever node advertises the service.
---
## Related docs
- Cluster topology and services: [`architecture.md`](architecture.md)
- Build state and versions: [`../talos/CLUSTER-BUILD.md`](../talos/CLUSTER-BUILD.md)

View File

@@ -8,8 +8,8 @@ This document describes the **noble** Talos lab cluster: node topology, networki
|---------------|---------| |---------------|---------|
| **Subgraph “Cluster”** | Kubernetes cluster boundary (`noble`) | | **Subgraph “Cluster”** | Kubernetes cluster boundary (`noble`) |
| **External / DNS / cloud** | Services outside the data plane (internet, registrar, Pangolin) | | **External / DNS / cloud** | Services outside the data plane (internet, registrar, Pangolin) |
| **Data store** | Durable data (etcd, Longhorn, Loki, Vault storage) | | **Data store** | Durable data (etcd, Longhorn, Loki) |
| **Secrets / policy** | Secret material, Vault, admission policy | | **Secrets / policy** | Secret material (SOPS in git), admission policy |
| **LB / VIP** | Load balancer, MetalLB assignment, or API VIP | | **LB / VIP** | Load balancer, MetalLB assignment, or API VIP |
--- ---
@@ -74,7 +74,7 @@ flowchart TB
## Platform stack (bootstrap → workloads) ## Platform stack (bootstrap → workloads)
Order: **Talos****Cilium** (cluster uses `cni: none` until CNI is installed) → **metrics-server**, **Longhorn**, **MetalLB** + pool manifests, **kube-vip****Traefik**, **cert-manager****Argo CD** (Helm only; optional empty app-of-apps). **Automated install:** `ansible/playbooks/noble.yml` (see `ansible/README.md`). Platform namespaces include `cert-manager`, `traefik`, `metallb-system`, `longhorn-system`, `monitoring`, `loki`, `logging`, `argocd`, `vault`, `external-secrets`, `sealed-secrets`, `kyverno`, `newt`, and others as deployed. Order: **Talos****Cilium** (cluster uses `cni: none` until CNI is installed) → **metrics-server**, **Longhorn**, **MetalLB** + pool manifests, **kube-vip****Traefik**, **cert-manager****Argo CD** (Helm only; optional empty app-of-apps). **Automated install:** `ansible/playbooks/noble.yml` (see `ansible/README.md`). Platform namespaces include `cert-manager`, `traefik`, `metallb-system`, `longhorn-system`, `monitoring`, `loki`, `logging`, `argocd`, `kyverno`, `newt`, and others as deployed.
```mermaid ```mermaid
flowchart TB flowchart TB
@@ -98,7 +98,7 @@ flowchart TB
Argo["Argo CD<br/>(optional app-of-apps; platform via Ansible)"] Argo["Argo CD<br/>(optional app-of-apps; platform via Ansible)"]
end end
subgraph L5["Platform namespaces (examples)"] subgraph L5["Platform namespaces (examples)"]
NS["cert-manager, traefik, metallb-system,<br/>longhorn-system, monitoring, loki, logging,<br/>argocd, vault, external-secrets, sealed-secrets,<br/>kyverno, newt, …"] NS["cert-manager, traefik, metallb-system,<br/>longhorn-system, monitoring, loki, logging,<br/>argocd, kyverno, newt, …"]
end end
Talos --> Cilium --> MS Talos --> Cilium --> MS
Cilium --> LH Cilium --> LH
@@ -149,22 +149,20 @@ flowchart LR
## Secrets and policy ## Secrets and policy
**Sealed Secrets** decrypts `SealedSecret` objects in-cluster. **External Secrets Operator** syncs from **Vault** using **`ClusterSecretStore`** (see [`examples/vault-cluster-secret-store.yaml`](../clusters/noble/bootstrap/external-secrets/examples/vault-cluster-secret-store.yaml)). Trust is **cluster → Vault** (ESO calls Vault; Vault does not initiate cluster trust). **Kyverno** with **kyverno-policies** enforces **PSS baseline** in **Audit**. **Mozilla SOPS** with **age** encrypts plain Kubernetes **`Secret`** manifests under [`clusters/noble/secrets/`](../clusters/noble/secrets/); operators decrypt at apply time (`ansible/playbooks/noble.yml` or `sops -d … | kubectl apply`). The private key is **`age-key.txt`** at the repo root (gitignored). **Kyverno** with **kyverno-policies** enforces **PSS baseline** in **Audit**.
```mermaid ```mermaid
flowchart LR flowchart LR
subgraph Git["Git repo"] subgraph Git["Git repo"]
SSman["SealedSecret manifests<br/>(optional)"] SM["SOPS-encrypted Secret YAML<br/>clusters/noble/secrets/"]
end
subgraph ops["Apply path"]
SOPS["sops -d + kubectl apply<br/>(or Ansible noble.yml)"]
end end
subgraph cluster["Cluster"] subgraph cluster["Cluster"]
SSC["Sealed Secrets controller<br/>sealed-secrets"]
ESO["External Secrets Operator<br/>external-secrets"]
V["Vault<br/>vault namespace<br/>HTTP listener"]
K["Kyverno + kyverno-policies<br/>PSS baseline Audit"] K["Kyverno + kyverno-policies<br/>PSS baseline Audit"]
end end
SSman -->|"encrypted"| SSC -->|"decrypt to Secret"| workloads["Workload Secrets"] SM --> SOPS -->|"plain Secret"| workloads["Workload Secrets"]
ESO -->|"ClusterSecretStore →"| V
ESO -->|"sync ExternalSecret"| workloads
K -.->|"admission / audit<br/>(PSS baseline)"| workloads K -.->|"admission / audit<br/>(PSS baseline)"| workloads
``` ```
@@ -172,7 +170,7 @@ flowchart LR
## Data and storage ## Data and storage
**StorageClass:** **`longhorn`** (default). Talos mounts **user volume** data at **`/var/mnt/longhorn`** (bind paths for Longhorn). Stateful consumers include **Vault**, **kube-prometheus-stack** PVCs, and **Loki**. **StorageClass:** **`longhorn`** (default). Talos mounts **user volume** data at **`/var/mnt/longhorn`** (bind paths for Longhorn). Stateful consumers include **kube-prometheus-stack** PVCs and **Loki**.
```mermaid ```mermaid
flowchart TB flowchart TB
@@ -183,12 +181,10 @@ flowchart TB
SC["StorageClass: longhorn (default)"] SC["StorageClass: longhorn (default)"]
end end
subgraph consumers["Stateful / durable consumers"] subgraph consumers["Stateful / durable consumers"]
V["Vault PVC data-vault-0"]
PGL["kube-prometheus-stack PVCs"] PGL["kube-prometheus-stack PVCs"]
L["Loki PVC"] L["Loki PVC"]
end end
UD --> SC UD --> SC
SC --> V
SC --> PGL SC --> PGL
SC --> L SC --> L
``` ```
@@ -210,7 +206,7 @@ See [`talos/CLUSTER-BUILD.md`](../talos/CLUSTER-BUILD.md) for the authoritative
| Argo CD | 9.4.17 / app v3.3.6 | | Argo CD | 9.4.17 / app v3.3.6 |
| kube-prometheus-stack | 82.15.1 | | kube-prometheus-stack | 82.15.1 |
| Loki / Fluent Bit | 6.55.0 / 0.56.0 | | Loki / Fluent Bit | 6.55.0 / 0.56.0 |
| Sealed Secrets / ESO / Vault | 2.18.4 / 2.2.0 / 0.32.0 | | SOPS (client tooling) | see `clusters/noble/secrets/README.md` |
| Kyverno | 3.7.1 / policies 3.7.1 | | Kyverno | 3.7.1 / policies 3.7.1 |
| Newt | 1.2.0 / app 1.10.1 | | Newt | 1.2.0 / app 1.10.1 |
@@ -218,7 +214,7 @@ See [`talos/CLUSTER-BUILD.md`](../talos/CLUSTER-BUILD.md) for the authoritative
## Narrative ## Narrative
The **noble** environment is a **Talos** lab cluster on **`192.168.50.0/24`** with **three control plane nodes and one worker**, schedulable workloads on control planes enabled, and the Kubernetes API exposed through **kube-vip** at **`192.168.50.230`**. **Cilium** provides the CNI after Talos bootstrap with **`cni: none`**; **MetalLB** advertises **`192.168.50.210``192.168.50.229`**, pinning **Argo CD** to **`192.168.50.210`** and **Traefik** to **`192.168.50.211`** for **`*.apps.noble.lab.pcenicni.dev`**. **cert-manager** issues certificates for Traefik Ingresses; **GitOps** is **Ansible-driven Helm** for the platform (**`clusters/noble/bootstrap/`**) plus optional **Argo CD** app-of-apps (**`clusters/noble/apps/`**, **`clusters/noble/bootstrap/argocd/`**). **Observability** uses **kube-prometheus-stack** in **`monitoring`**, **Loki** and **Fluent Bit** with Grafana wired via a **ConfigMap** datasource, with **Longhorn** PVCs for Prometheus, Grafana, Alertmanager, Loki, and **Vault**. **Secrets** combine **Sealed Secrets** for git-encrypted material, **Vault** with **External Secrets** for dynamic sync, and **Kyverno** enforces **Pod Security Standards baseline** in **Audit**. **Public** access uses **Newt** to **Pangolin** with **CNAME** and Integration API steps as documented—not generic in-cluster public DNS. The **noble** environment is a **Talos** lab cluster on **`192.168.50.0/24`** with **three control plane nodes and one worker**, schedulable workloads on control planes enabled, and the Kubernetes API exposed through **kube-vip** at **`192.168.50.230`**. **Cilium** provides the CNI after Talos bootstrap with **`cni: none`**; **MetalLB** advertises **`192.168.50.210``192.168.50.229`**, pinning **Argo CD** to **`192.168.50.210`** and **Traefik** to **`192.168.50.211`** for **`*.apps.noble.lab.pcenicni.dev`**. **cert-manager** issues certificates for Traefik Ingresses; **GitOps** is **Ansible-driven Helm** for the platform (**`clusters/noble/bootstrap/`**) plus optional **Argo CD** app-of-apps (**`clusters/noble/apps/`**, **`clusters/noble/bootstrap/argocd/`**). **Observability** uses **kube-prometheus-stack** in **`monitoring`**, **Loki** and **Fluent Bit** with Grafana wired via a **ConfigMap** datasource, with **Longhorn** PVCs for Prometheus, Grafana, Alertmanager, and Loki. **Secrets** in git use **SOPS** + **age** under **`clusters/noble/secrets/`**; **Kyverno** enforces **Pod Security Standards baseline** in **Audit**. **Public** access uses **Newt** to **Pangolin** with **CNAME** and Integration API steps as documented—not generic in-cluster public DNS.
--- ---

100
docs/homelab-network.md Normal file
View File

@@ -0,0 +1,100 @@
# Homelab network inventory
Single place for **VLANs**, **static addressing**, and **hosts** beside the **noble** Talos cluster. **Proxmox** is the **hypervisor** for the VMs below; **all of those VMs are intended to run on `192.168.1.0/24`** (same broadcast domain as Pi-hole and typical home clients). **Noble** (Talos) stays on **`192.168.50.0/24`** per [`architecture.md`](architecture.md) and [`talos/CLUSTER-BUILD.md`](../talos/CLUSTER-BUILD.md) until you change that design.
## VLANs (logical)
| Network | Role |
|---------|------|
| **`192.168.1.0/24`** | **Homelab / Proxmox LAN****Proxmox host(s)**, **all Proxmox VMs**, **Pi-hole**, **Mac mini**, and other servers that share this VLAN. |
| **`192.168.50.0/24`** | **Noble Talos** cluster — physical nodes, **kube-vip**, **MetalLB**, Traefik; **not** the Proxmox VM subnet. |
| **`192.168.60.0/24`** | **DMZ / WAN-facing****NPM**, **WebDAV**, **other services** that need WAN access. |
| **`192.168.40.0/24`** | **Home Assistant** and IoT devices — isolated; record subnet and HA IP in DHCP/router. |
**Routing / DNS:** Clients and VMs on **`192.168.1.0/24`** reach **noble** services on **`192.168.50.0/24`** via **L3** (router/firewall). **NFS** from OMV (`192.168.1.105`) to **noble** pods uses the **OMV data IP** as the NFS server address from the clusters perspective.
Firewall rules between VLANs are **out of scope** here; document them where you keep runbooks.
---
## `192.168.50.0/24` — reservations (noble only)
Do not assign **unrelated** static services on **this** VLAN without checking overlap with MetalLB and kube-vip.
| Use | Addresses |
|-----|-----------|
| Talos nodes | `.10``.40` (see [`talos/talconfig.yaml`](../talos/talconfig.yaml)) |
| MetalLB L2 pool | `.210``.229` |
| Traefik (ingress) | `.211` (typical) |
| Argo CD | `.210` (typical) |
| Kubernetes API (kube-vip) | **`.230`** — **must not** be a VM |
---
## Proxmox VMs (`192.168.1.0/24`)
All run on **Proxmox**; addresses below use **`192.168.1.0/24`** (same host octet as your earlier `.50.x` / `.60.x` plan, moved into the homelab VLAN). Adjust if your router uses a different numbering scheme.
Most are **Docker hosts** with multiple apps; treat the **IP** as the **host**, not individual containers.
| VM ID | Name | IP | Notes |
|-------|------|-----|--------|
| 666 | nginxproxymanager | `192.168.1.20` | NPM (edge / WAN-facing role — firewall as you design). |
| 777 | nginxproxymanager-Lan | `192.168.1.60` | NPM on **internal** homelab LAN. |
| 100 | Openmediavault | `192.168.1.105` | **NFS** exports for *arr / media paths. |
| 110 | Monitor | `192.168.1.110` | Uptime Kuma, Peekaping, Tracearr → cluster candidates. |
| 120 | arr | `192.168.1.120` | *arr stack; media via **NFS** from OMV — see [migration](#arr-stack-nfs-and-kubernetes). |
| 130 | Automate | `192.168.1.130` | Low use — **candidate to remove** or consolidate. |
| 140 | general-purpose | `192.168.1.140` | IT tools, Mealie, Open WebUI, SparkyFitness, … |
| 150 | Media-server | `192.168.1.150` | Jellyfin (test, **NFS** media), ebook server. |
| 160 | s3 | `192.168.1.170` | Object storage; **merge** into **central S3** on noble per [`shared-data-services.md`](shared-data-services.md) when ready. |
| 190 | Auth | `192.168.1.190` | **Authentik****noble (K8s)** for HA. |
| 300 | gitea | `192.168.1.203` | On **`.1`**, no overlap with noble **MetalLB `.210``.229`** on **`.50`**. |
| 310 | gitea-nsfw | `192.168.1.204` | |
| 500 | AMP | `192.168.1.47` | |
### Workload detail (what runs where)
**Auth (190)****Authentik** is the main service; moving it to **Kubernetes (noble)** gives you **HA**, rolling upgrades, and backups via your cluster patterns (PVCs, Velero, etc.). Plan **OIDC redirect URLs** and **outposts** (if used) when the **ingress hostname** and paths to **`.50`** services change.
**Monitor (110)****Uptime Kuma**, **Peekaping**, and **Tracearr** are a good fit for the cluster: small state (SQLite or small DBs), **Ingress** via Traefik, and **Longhorn** or a small DB PVC. Migrate **one app at a time** and keep the old VM until DNS and alerts are verified.
**arr (120)****Lidarr, Sonarr, Radarr**, and related *arr* apps; libraries and download paths point at **NFS** from **Openmediavault (100)** at **`192.168.1.105`**. The hard part is **keeping paths, permissions (UID/GID), and download client** wiring while pods move.
**Automate (130)** — Tools are **barely used**; **decommission**, merge into **general-purpose (140)**, or replace with a **CronJob** / one-shot on the cluster only if something still needs scheduling.
**general-purpose (140)** — “Daily driver” stack: **IT tools**, **Mealie**, **Open WebUI**, **SparkyFitness**, and similar. **Candidates for gradual moves** to noble; group by **data sensitivity** and **persistence** (Postgres vs SQLite) when you pick order.
**Media-server (150)****Jellyfin** (testing) with libraries on **NFS**; **ebook** server. Treat **Jellyfin** like *arr* for storage: same NFS export and **transcoding** needs (CPU on worker nodes or GPU if you add it). Ebook stack depends on what you run (e.g. Kavita, Audiobookshelf) — note **metadata paths** before moving.
### Arr stack, NFS, and Kubernetes
You do **not** have to move NFS into the cluster: **Openmediavault** on **`192.168.1.105`** can stay the **NFS server** while the *arr* apps run as **Deployments** with **ReadWriteMany** volumes. Noble nodes on **`192.168.50.0/24`** mount NFS using **that IP** (ensure **firewall** allows **NFS** from node IPs to OMV).
1. **Keep OMV as the single source of exports** — same **export path** (e.g. `/export/media`) from the clusters perspective as from the current VM.
2. **Mount NFS in Kubernetes** — use a **CSI NFS driver** (e.g. **nfs-subdir-external-provisioner** or **csi-driver-nfs**) so each app gets a **PVC** backed by a **subdirectory** of the export, **or** one shared RWX PVC for a common tree if your layout needs it.
3. **Match POSIX ownership** — set **supplemental groups** or **fsGroup** / **runAsUser** on the pods so Sonarr/Radarr see the same **UID/GID** as todays Docker setup; fix **squash** settings on OMV if you use `root_squash`.
4. **Config and DB** — back up each apps **config volume** (or SQLite files), redeploy with the same **environment**; point **download clients** and **NFS media roots** to the **same logical paths** inside the container.
5. **Low-risk path** — run **one** *arr* app on the cluster while the rest stay on **VM 120** until imports and downloads behave; then cut DNS/NPM streams over.
If you prefer **no** NFS from pods, the alternative is **large ReadWriteOnce** disks on Longhorn and **sync** from OMV — usually **more** moving parts than **RWX NFS** for this workload class.
---
## Other hosts
| Host | IP | VLAN / network | Notes |
|------|-----|----------------|--------|
| **Pi-hole** | `192.168.1.127` | `192.168.1.0/24` | DNS; same VLAN as Proxmox VMs. |
| **Home Assistant** | *TBD* | **IoT VLAN** | Add reservation when fixed. |
| **Mac mini** | `192.168.1.155` | `192.168.1.0/24` | Align with **Storage B** in [`Racks.md`](Racks.md) if the same machine. |
---
## Related docs
- **Shared Postgres + S3 (centralized):** [`shared-data-services.md`](shared-data-services.md)
- **VM → noble migration plan:** [`migration-vm-to-noble.md`](migration-vm-to-noble.md)
- Noble cluster topology and ingress: [`architecture.md`](architecture.md)
- Physical racks (Primary / Storage B / Rack C): [`Racks.md`](Racks.md)
- Cluster checklist: [`../talos/CLUSTER-BUILD.md`](../talos/CLUSTER-BUILD.md)

View File

@@ -0,0 +1,121 @@
# Migration plan: Proxmox VMs → noble (Kubernetes)
This document is the **default playbook** for moving workloads from **Proxmox VMs** on **`192.168.1.0/24`** into the **noble** Talos cluster on **`192.168.50.0/24`**. Source inventory and per-VM notes: [`homelab-network.md`](homelab-network.md). Cluster facts: [`architecture.md`](architecture.md), [`talos/CLUSTER-BUILD.md`](../talos/CLUSTER-BUILD.md).
---
## 1. Scope and principles
| Principle | Detail |
|-----------|--------|
| **One service at a time** | Run the new workload on **noble** while the **VM** stays up; cut over **DNS / NPM** only after checks pass. |
| **Same container image** | Prefer the **same** upstream image and major version as Docker on the VM to reduce surprises. |
| **Data moves with a plan** | **Backup** VM volumes or export DB dumps **before** the first deploy to the cluster. |
| **Ingress on noble** | Internal apps use **Traefik** + **`*.apps.noble.lab.pcenicni.dev`** (or your chosen hostnames) and **MetalLB** (e.g. **`192.168.50.211`**) per [`architecture.md`](architecture.md). |
| **Cross-VLAN** | Clients on **`.1`** reach services on **`.50`** via **routing**; **firewall** must allow **NFS** from **Talos node IPs** to **OMV `192.168.1.105`** when pods mount NFS. |
**Not everything must move.** Keep **Openmediavault** (and optionally **NPM**) on VMs if you prefer; the cluster consumes **NFS** and **HTTP** from them.
---
## 2. Prerequisites (before wave 1)
1. **Cluster healthy**`kubectl get nodes`; [`talos/CLUSTER-BUILD.md`](../talos/CLUSTER-BUILD.md) checklist through ingress and cert-manager as needed.
2. **Ingress + TLS****Traefik** + **cert-manager** working; you can hit a **test Ingress** on the MetalLB IP.
3. **GitOps / deploy path** — Decide per app: **Helm** under `clusters/noble/apps/`, **Argo CD**, or **Ansible**-applied manifests (match how you manage the rest of noble).
4. **Secrets** — Plan **Kubernetes Secrets**; for git-stored material, align with **SOPS** (`clusters/noble/secrets/`, `.sops.yaml`).
5. **Storage****Longhorn** default for **ReadWriteOnce** state; for **NFS** (*arr*, Jellyfin), install a **CSI NFS** driver and test a **small RWX PVC** before migrating data-heavy apps.
6. **Shared data tier (recommended)** — Deploy **centralized PostgreSQL** and **S3-compatible storage** on noble so apps do not each ship their own DB/object store; see [`shared-data-services.md`](shared-data-services.md).
7. **Firewall** — Rules: **workstation → `192.168.50.230:6443`**; **nodes → OMV NFS ports**; **clients → `192.168.50.211`** (or split-horizon DNS) as you design.
8. **DNS** — Split-horizon or Pi-hole records for **`*.apps.noble.lab.pcenicni.dev`** → **Traefik** IP **`192.168.50.211`** for LAN clients.
---
## 3. Standard migration procedure (repeat per app)
Use this checklist for **each** application (or small group, e.g. one Helm release).
| Step | Action |
|------|--------|
| **A. Discover** | Document **image:tag**, **ports**, **volumes** (host paths), **env vars**, **depends_on** (DB, Redis, NFS path). Export **docker inspect** / **compose** from the VM. |
| **B. Backup** | Snapshot **Proxmox VM** or backup **volume** / **SQLite** / **DB dump** to offline storage. |
| **C. Namespace** | Create a **dedicated namespace** (e.g. `monitoring-tools`, `authentik`) or use your house standard. |
| **D. Deploy** | Add **Deployment** (or **StatefulSet**), **Service**, **Ingress** (class **traefik**), **PVCs**; wire **secrets** from **Secrets** (not literals in git). |
| **E. Storage** | **Longhorn** PVC for local state; **NFS CSI** PVC for shared media/config paths that must match the VM (see [`homelab-network.md`](homelab-network.md) *arr* section). Prefer **shared Postgres** / **shared S3** per [`shared-data-services.md`](shared-data-services.md) instead of new embedded databases. Match **UID/GID** with `securityContext`. |
| **F. Smoke test** | `kubectl port-forward` or temporary **Ingress** hostname; log in, run one critical workflow (login, playback, sync). |
| **G. DNS cutover** | Point **internal DNS** or **NPM** upstream from the **VM IP** to the **new hostname** (Traefik) or **MetalLB IP** + Host header. |
| **H. Observe** | 2472 hours: logs, alerts, **Uptime Kuma** (once migrated), backups. |
| **I. Decommission** | Stop the **container** on the VM (not the whole VM until the **whole** VM is empty). |
| **J. VM off** | When **no** services remain on that VM, **power off** and archive or delete the VM. |
**Rollback:** Re-enable the VM service, revert **DNS/NPM** to the old IP, delete or scale the cluster deployment to zero.
---
## 4. Recommended migration order (phases)
Order balances **risk**, **dependencies**, and **learning curve**.
| Phase | Target | Rationale |
|-------|--------|-----------|
| **0 — Optional** | **Automate (130)** | Low use: **retire** or replace with **CronJobs**; skip if nothing valuable runs. |
| **0b — Platform** | **Shared Postgres + S3** on noble | Run **before** or alongside early waves so new deploys use **one DSN** and **one object endpoint**; retire **VM 160** when empty. See [`shared-data-services.md`](shared-data-services.md). |
| **1 — Observability** | **Monitor (110)** — Uptime Kuma, Peekaping, Tracearr | Small state, validates **Ingress**, **PVCs**, and **alert paths** before auth and media. |
| **2 — Git** | **gitea (300)**, **gitea-nsfw (310)** | Point at **shared Postgres** + **S3** for attachments; move **repos** with **PVC** + backup restore if needed. |
| **3 — Object / misc** | **s3 (160)**, **AMP (500)** | **Migrate data** into **central** S3 on cluster, then **decommission** duplicate MinIO on VM **160** if applicable. |
| **4 — Auth** | **Auth (190)****Authentik** | Use **shared Postgres**; update **all OIDC clients** (Gitea, apps, NPM) with **new issuer URLs**; schedule a **maintenance window**. |
| **5 — Daily apps** | **general-purpose (140)** | Move **one app per release** (Mealie, Open WebUI, …); each app gets its **own database** (and bucket if needed) on the **shared** tiers — not a new Postgres pod per app. |
| **6 — Media / *arr*** | **arr (120)**, **Media-server (150)** | **NFS** from **OMV**, download clients, **transcoding** — migrate **one *arr*** then Jellyfin/ebook; see NFS bullets in [`homelab-network.md`](homelab-network.md). |
| **7 — Edge** | **NPM (666/777)** | Often **last**: either keep on Proxmox or replace with **Traefik** + **IngressRoutes** / **Gateway API**; many people keep a **dedicated** reverse proxy VM until parity is proven. |
**Openmediavault (100)** — Typically **stays** as **NFS** (and maybe backup target) for the cluster; no need to “migrate” the whole NAS into Kubernetes.
---
## 5. Ingress and reverse proxy
| Approach | When to use |
|----------|-------------|
| **Traefik Ingress** on noble | Default for **internal** HTTPS apps; **cert-manager** for public names you control. |
| **NPM (VM)** as front door | Point **proxy host****Traefik MetalLB IP** or **service name** if you add internal DNS; reduces double-proxy if you **terminate TLS** in one place only. |
| **Newt / Pangolin** | Public reachability per [`clusters/noble/bootstrap/newt/README.md`](../clusters/noble/bootstrap/newt/README.md); not automatic ExternalDNS. |
Avoid **two** TLS terminations for the same hostname unless you intend **SSL passthrough** end-to-end.
---
## 6. Authentik-specific (Auth VM → cluster)
1. **Backup** Authentik **PostgreSQL** (or embedded DB) and **media** volume from the VM.
2. Deploy **Helm** (official chart) with **same** Authentik version if possible.
3. **Restore** DB into **shared cluster Postgres** (recommended) or chart-managed DB — see [`shared-data-services.md`](shared-data-services.md).
4. Update **issuer URL** in every **OIDC/OAuth** client (Gitea, Grafana, etc.).
5. Re-test **outposts** (if any) and **redirect URIs** from both **`.1`** and **`.50`** client perspectives.
6. **Cut over DNS**; then **decommission** VM **190**.
---
## 7. *arr* and Jellyfin-specific
Follow the **numbered list** under **“Arr stack, NFS, and Kubernetes”** in [`homelab-network.md`](homelab-network.md). In short: **OMV stays**; **CSI NFS** + **RWX**; **match permissions**; migrate **one app** first; verify **download client** can reach the new pod **IP/DNS** from your download host.
---
## 8. Validation checklist (per wave)
- Pods **Ready**, **Ingress** returns **200** / login page.
- **TLS** valid for chosen hostname.
- **Persistent data** present (new uploads, DB writes survive pod restart).
- **Backups** (Velero or app-level) defined for the new location.
- **Monitoring** / alerts updated (targets, not old VM IP).
- **Documentation** in [`homelab-network.md`](homelab-network.md) updated (VM retired or marked migrated).
---
## Related docs
- **Shared Postgres + S3:** [`shared-data-services.md`](shared-data-services.md)
- VM inventory and NFS notes: [`homelab-network.md`](homelab-network.md)
- Noble topology, MetalLB, Traefik: [`architecture.md`](architecture.md)
- Bootstrap and versions: [`talos/CLUSTER-BUILD.md`](../talos/CLUSTER-BUILD.md)
- Apps layout: [`clusters/noble/apps/README.md`](../clusters/noble/apps/README.md)

View File

@@ -0,0 +1,90 @@
# Centralized PostgreSQL and S3-compatible storage
Goal: **one shared PostgreSQL** and **one S3-compatible object store** on **noble**, instead of every app bundling its own database or MinIO. Apps keep **logical isolation** via **per-app databases** / **users** and **per-app buckets** (or prefixes), not separate clusters.
See also: [`migration-vm-to-noble.md`](migration-vm-to-noble.md), [`homelab-network.md`](homelab-network.md) (VM **160** `s3` today), [`talos/CLUSTER-BUILD.md`](../talos/CLUSTER-BUILD.md) (Velero + S3).
---
## 1. Why centralize
| Benefit | Detail |
|--------|--------|
| **Operations** | One backup/restore story, one upgrade cadence, one place to tune **IOPS** and **retention**. |
| **Security** | **Least privilege**: each app gets its own **DB user** and **S3 credentials** scoped to one database or bucket. |
| **Resources** | Fewer duplicate **Postgres** or **MinIO** sidecars; better use of **Longhorn** or dedicated PVCs for the shared tiers. |
**Tradeoff:** Shared tiers are **blast-radius** targets — use **backups**, **PITR** where you care, and **NetworkPolicies** so only expected namespaces talk to Postgres/S3.
---
## 2. PostgreSQL — recommended pattern
1. **Run Postgres on noble** — Operators such as **CloudNativePG**, **Zalando Postgres operator**, or a well-maintained **Helm** chart with **replicas** + **persistent volumes** (Longhorn).
2. **One cluster instance, many databases** — For each app: `CREATE DATABASE appname;` and a **dedicated role** with `CONNECT` on that database only (not superuser).
3. **Connection from apps** — Use a **Kubernetes Service** (e.g. `postgres-platform.platform.svc.cluster.local:5432`) and pass **credentials** via **Secrets** (ideally **SOPS**-encrypted in git).
4. **Migrations** — Run app **migration** jobs or init containers against the **same** DSN after DB exists.
**Migrating off SQLite / embedded Postgres**
- **SQLite → Postgres:** export/import per app (native tools, or **pgloader** where appropriate).
- **Docker Postgres volume:** `pg_dumpall` or per-DB `pg_dump` → restore into a **new** database on the shared server; **freeze writes** during cutover.
---
## 3. S3-compatible object storage — recommended pattern
1. **Run one S3 API on noble****MinIO** (common), **Garage**, or **SeaweedFS** S3 layer — with **PVC(s)** or host path for data; **erasure coding** / replicas if the chart supports it and you want durability across nodes.
2. **Buckets per concern** — e.g. `gitea-attachments`, `velero`, `loki-archive` — not one global bucket unless you enforce **prefix** IAM policies.
3. **Credentials****IAM-style** users limited to **one bucket** (or prefix); **Secrets** reference **access key** / **secret**; never commit keys in plain text.
4. **Endpoint for pods** — In-cluster: `http://minio.platform.svc.cluster.local:9000` (or TLS inside mesh). Apps use **virtual-hosted** or **path-style** per SDK defaults.
### NFS as backing store for S3 on noble
**Yes.** You can run MinIO (or another S3-compatible server) with its **data directory** on a **ReadWriteMany** volume that is **NFS** — for example the same **Openmediavault** export you already use, mounted via your **NFS CSI** driver (see [`homelab-network.md`](homelab-network.md)).
| Consideration | Detail |
|---------------|--------|
| **Works for homelab** | MinIO stores objects as files under a path; **POSIX** on NFS is enough for many setups. |
| **Performance** | NFS adds **latency** and shared bandwidth; fine for moderate use, less ideal for heavy multi-tenant throughput. |
| **Availability** | The **NFS server** (OMV) becomes part of the availability story for object data — plan **backups** and **OMV** health like any dependency. |
| **Locking / semantics** | Prefer **NFSv4.x**; avoid mixing **NFS** and expectations of **local SSD** (e.g. very chatty small writes). If you see odd behavior, **Longhorn** (block) on a node is the usual next step. |
| **Layering** | You are stacking **S3 API → file layout → NFS → disk**; that is normal for a lab, just **monitor** space and exports on OMV. |
**Summary:** NFS-backed PVC for MinIO is **valid** on noble; use **Longhorn** (or local disk) when you need **better IOPS** or want object data **inside** the clusters storage domain without depending on OMV for that tier.
**Migrating off VM 160 (`s3`) or per-app MinIO**
- **MinIO → MinIO:** `mc mirror` between aliases, or **replication** if you configure it.
- **Same API:** Any tool speaking **S3** can **sync** buckets before you point apps at the new endpoint.
**Velero** — Point the **backup location** at the **central** bucket (see cluster Velero docs); avoid a second ad-hoc object store for backups if one cluster bucket is enough.
---
## 4. Ordering relative to app migrations
| When | What |
|------|------|
| **Early** | Stand up **Postgres** + **S3** with **empty** DBs/buckets; test with **one** non-critical app (e.g. a throwaway deployment). |
| **Before auth / Git** | **Gitea** and **Authentik** benefit from **managed Postgres** early — plan **DSN** and **bucket** for attachments **before** cutover. |
| **Ongoing** | New apps **must not** ship embedded **Postgres/MinIO** unless the workload truly requires it (e.g. vendor appliance). |
---
## 5. Checklist (platform team)
- [ ] Postgres **Service** DNS name and **TLS** (optional in-cluster) documented.
- [ ] S3 **endpoint**, **region** string (can be `us-east-1` for MinIO), **TLS** for Ingress if clients are outside the cluster.
- [ ] **Backup:** scheduled **logical dumps** (Postgres) and **bucket replication** or **object versioning** where needed.
- [ ] **SOPS** / **External Secrets** pattern for **rotation** without editing app manifests by hand.
- [ ] **homelab-network.md** updated when **VM 160** is retired or repurposed.
---
## Related docs
- VM → cluster migration: [`migration-vm-to-noble.md`](migration-vm-to-noble.md)
- Inventory (s3 VM): [`homelab-network.md`](homelab-network.md)
- Longhorn / storage runbook: [`../talos/runbooks/longhorn.md`](../talos/runbooks/longhorn.md)
- Velero (S3 backup target): [`../clusters/noble/bootstrap/velero/`](../clusters/noble/bootstrap/velero/) (if present)

View File

@@ -7,7 +7,7 @@
services: services:
tracearr: tracearr:
image: ghcr.io/connorgallopo/tracearr:supervised-nightly image: ghcr.io/connorgallopo/tracearr:latest
shm_size: 256mb # Required for PostgreSQL shared memory shm_size: 256mb # Required for PostgreSQL shared memory
ports: ports:
- "${PORT:-3000}:3000" - "${PORT:-3000}:3000"

View File

@@ -4,7 +4,7 @@ This document is the **exported TODO** for the **noble** Talos cluster (4 nodes)
## Current state (2026-03-28) ## Current state (2026-03-28)
Lab stack is **up** on-cluster through **Phase D****F** and **Phase G** (Vault **CiliumNetworkPolicy**, **`talos/runbooks/`**). **Next focus:** optional **Alertmanager** receivers (Slack/PagerDuty); tighten **RBAC** (Headlamp / cluster-admin); **Cilium** policies for other namespaces as needed; enable **Mend Renovate** for PRs; Pangolin/sample Ingress; **Velero** backup/restore drill after S3 credentials are set (**`noble_velero_install`**). Lab stack is **up** on-cluster through **Phase D****F** and **Phase G** (**`talos/runbooks/`**, **SOPS**-encrypted secrets in **`clusters/noble/secrets/`**). **Next focus:** optional **Alertmanager** receivers (Slack/PagerDuty); tighten **RBAC** (Headlamp / cluster-admin); **Cilium** policies for other namespaces as needed; enable **Mend Renovate** for PRs; Pangolin/sample Ingress; **Velero** backup/restore drill after S3 credentials are set (**`noble_velero_install`**).
- **Talos** v1.12.6 (target) / **Kubernetes** as bundled — four nodes **Ready** unless upgrading; **`talosctl health`**; **`talos/kubeconfig`** is **local only** (gitignored — never commit; regenerate with `talosctl kubeconfig` per `talos/README.md`). **Image Factory (nocloud installer):** `factory.talos.dev/nocloud-installer/249d9135de54962744e917cfe654117000cba369f9152fbab9d055a00aa3664f:v1.12.6` - **Talos** v1.12.6 (target) / **Kubernetes** as bundled — four nodes **Ready** unless upgrading; **`talosctl health`**; **`talos/kubeconfig`** is **local only** (gitignored — never commit; regenerate with `talosctl kubeconfig` per `talos/README.md`). **Image Factory (nocloud installer):** `factory.talos.dev/nocloud-installer/249d9135de54962744e917cfe654117000cba369f9152fbab9d055a00aa3664f:v1.12.6`
- **Cilium** Helm **1.16.6** / app **1.16.6** (`clusters/noble/bootstrap/cilium/`, phase 1 values). - **Cilium** Helm **1.16.6** / app **1.16.6** (`clusters/noble/bootstrap/cilium/`, phase 1 values).
@@ -15,13 +15,11 @@ Lab stack is **up** on-cluster through **Phase D****F** and **Phase G** (Vaul
- **Longhorn** Helm **1.11.1** / app **v1.11.1**`clusters/noble/bootstrap/longhorn/` (PSA **privileged** namespace, `defaultDataPath` `/var/mnt/longhorn`, `preUpgradeChecker` enabled); **StorageClass** `longhorn` (default); **`nodes.longhorn.io`** all **Ready**; test **PVC** `Bound` on `longhorn`. - **Longhorn** Helm **1.11.1** / app **v1.11.1**`clusters/noble/bootstrap/longhorn/` (PSA **privileged** namespace, `defaultDataPath` `/var/mnt/longhorn`, `preUpgradeChecker` enabled); **StorageClass** `longhorn` (default); **`nodes.longhorn.io`** all **Ready**; test **PVC** `Bound` on `longhorn`.
- **Traefik** Helm **39.0.6** / app **v3.6.11**`clusters/noble/bootstrap/traefik/`; **`Service`** **`LoadBalancer`** **`EXTERNAL-IP` `192.168.50.211`**; **`IngressClass`** **`traefik`** (default). Point **`*.apps.noble.lab.pcenicni.dev`** at **`192.168.50.211`**. MetalLB pool verification was done before replacing the temporary nginx test with Traefik. - **Traefik** Helm **39.0.6** / app **v3.6.11**`clusters/noble/bootstrap/traefik/`; **`Service`** **`LoadBalancer`** **`EXTERNAL-IP` `192.168.50.211`**; **`IngressClass`** **`traefik`** (default). Point **`*.apps.noble.lab.pcenicni.dev`** at **`192.168.50.211`**. MetalLB pool verification was done before replacing the temporary nginx test with Traefik.
- **cert-manager** Helm **v1.20.0** / app **v1.20.0**`clusters/noble/bootstrap/cert-manager/`; **`ClusterIssuer`** **`letsencrypt-staging`** and **`letsencrypt-prod`** (**DNS-01** via **Cloudflare** for **`pcenicni.dev`**, Secret **`cloudflare-dns-api-token`** in **`cert-manager`**); ACME email **`certificates@noble.lab.pcenicni.dev`** (edit in manifests if you want a different mailbox). - **cert-manager** Helm **v1.20.0** / app **v1.20.0**`clusters/noble/bootstrap/cert-manager/`; **`ClusterIssuer`** **`letsencrypt-staging`** and **`letsencrypt-prod`** (**DNS-01** via **Cloudflare** for **`pcenicni.dev`**, Secret **`cloudflare-dns-api-token`** in **`cert-manager`**); ACME email **`certificates@noble.lab.pcenicni.dev`** (edit in manifests if you want a different mailbox).
- **Newt** Helm **1.2.0** / app **1.10.1**`clusters/noble/bootstrap/newt/` (**fossorial/newt**); Pangolin site tunnel — **`newt-pangolin-auth`** Secret (**`PANGOLIN_ENDPOINT`**, **`NEWT_ID`**, **`NEWT_SECRET`**). Prefer a **SealedSecret** in git (`kubeseal` — see `clusters/noble/bootstrap/sealed-secrets/examples/`) after rotating credentials if they were exposed. **Public DNS** is **not** automated with ExternalDNS: **CNAME** records at your DNS host per Pangolins domain instructions, plus **Integration API** for HTTP resources/targets — see **`clusters/noble/bootstrap/newt/README.md`**. LAN access to Traefik can still use **`*.apps.noble.lab.pcenicni.dev`** → **`192.168.50.211`** (split horizon / local resolver). - **Newt** Helm **1.2.0** / app **1.10.1**`clusters/noble/bootstrap/newt/` (**fossorial/newt**); Pangolin site tunnel — **`newt-pangolin-auth`** Secret (**`PANGOLIN_ENDPOINT`**, **`NEWT_ID`**, **`NEWT_SECRET`**). Store credentials in git with **SOPS** (`clusters/noble/secrets/newt-pangolin-auth.secret.yaml`, **`age-key.txt`**, **`.sops.yaml`**) — see **`clusters/noble/secrets/README.md`**. **Public DNS** is **not** automated with ExternalDNS: **CNAME** records at your DNS host per Pangolins domain instructions, plus **Integration API** for HTTP resources/targets — see **`clusters/noble/bootstrap/newt/README.md`**. LAN access to Traefik can still use **`*.apps.noble.lab.pcenicni.dev`** → **`192.168.50.211`** (split horizon / local resolver).
- **Argo CD** Helm **9.4.17** / app **v3.3.6**`clusters/noble/bootstrap/argocd/`; **`argocd-server`** **`LoadBalancer`** **`192.168.50.210`**; app-of-apps root syncs **`clusters/noble/apps/`** (edit **`root-application.yaml`** `repoURL` before applying). - **Argo CD** Helm **9.4.17** / app **v3.3.6**`clusters/noble/bootstrap/argocd/`; **`argocd-server`** **`LoadBalancer`** **`192.168.50.210`**; app-of-apps root syncs **`clusters/noble/apps/`** (edit **`root-application.yaml`** `repoURL` before applying).
- **kube-prometheus-stack** — Helm chart **82.15.1**`clusters/noble/bootstrap/kube-prometheus-stack/` (**namespace** `monitoring`, PSA **privileged****node-exporter** needs host mounts); **Longhorn** PVCs for Prometheus, Grafana, Alertmanager; **node-exporter** DaemonSet **4/4**. **Grafana Ingress:** **`https://grafana.apps.noble.lab.pcenicni.dev`** (Traefik **`ingressClassName: traefik`**, **`cert-manager.io/cluster-issuer: letsencrypt-prod`**). **Loki** datasource in Grafana: ConfigMap **`clusters/noble/bootstrap/grafana-loki-datasource/loki-datasource.yaml`** (sidecar label **`grafana_datasource: "1"`**) — not via **`grafana.additionalDataSources`** in the chart. **`helm upgrade --install` with `--wait` is silent until done** — use **`--timeout 30m`**; Grafana admin: Secret **`kube-prometheus-grafana`**, keys **`admin-user`** / **`admin-password`**. - **kube-prometheus-stack** — Helm chart **82.15.1**`clusters/noble/bootstrap/kube-prometheus-stack/` (**namespace** `monitoring`, PSA **privileged****node-exporter** needs host mounts); **Longhorn** PVCs for Prometheus, Grafana, Alertmanager; **node-exporter** DaemonSet **4/4**. **Grafana Ingress:** **`https://grafana.apps.noble.lab.pcenicni.dev`** (Traefik **`ingressClassName: traefik`**, **`cert-manager.io/cluster-issuer: letsencrypt-prod`**). **Loki** datasource in Grafana: ConfigMap **`clusters/noble/bootstrap/grafana-loki-datasource/loki-datasource.yaml`** (sidecar label **`grafana_datasource: "1"`**) — not via **`grafana.additionalDataSources`** in the chart. **`helm upgrade --install` with `--wait` is silent until done** — use **`--timeout 30m`**; Grafana admin: Secret **`kube-prometheus-grafana`**, keys **`admin-user`** / **`admin-password`**.
- **Loki** + **Fluent Bit****`grafana/loki` 6.55.0** SingleBinary + **filesystem** on **Longhorn** (`clusters/noble/bootstrap/loki/`); **`loki.auth_enabled: false`**; **`chunksCache.enabled: false`** (no memcached chunk cache). **`fluent/fluent-bit` 0.56.0** → **`loki-gateway.loki.svc:80`** (`clusters/noble/bootstrap/fluent-bit/`); **`logging`** PSA **privileged**. **Grafana Explore:** **`kubectl apply -f clusters/noble/bootstrap/grafana-loki-datasource/loki-datasource.yaml`** then **Explore → Loki** (e.g. `{job="fluent-bit"}`). - **Loki** + **Fluent Bit****`grafana/loki` 6.55.0** SingleBinary + **filesystem** on **Longhorn** (`clusters/noble/bootstrap/loki/`); **`loki.auth_enabled: false`**; **`chunksCache.enabled: false`** (no memcached chunk cache). **`fluent/fluent-bit` 0.56.0** → **`loki-gateway.loki.svc:80`** (`clusters/noble/bootstrap/fluent-bit/`); **`logging`** PSA **privileged**. **Grafana Explore:** **`kubectl apply -f clusters/noble/bootstrap/grafana-loki-datasource/loki-datasource.yaml`** then **Explore → Loki** (e.g. `{job="fluent-bit"}`).
- **Sealed Secrets** Helm **2.18.4** / app **0.36.1**`clusters/noble/bootstrap/sealed-secrets/` (namespace **`sealed-secrets`**); **`kubeseal`** on client should match controller minor (**README**); back up **`sealed-secrets-key`** (see README). - **SOPS** — cluster **`Secret`** manifests under **`clusters/noble/secrets/`** encrypted with **age** (see **`.sops.yaml`**, **`age-key.txt`** gitignored); **`noble.yml`** decrypt-applies when the private key is present.
- **External Secrets Operator** Helm **2.2.0** / app **v2.2.0**`clusters/noble/bootstrap/external-secrets/`; Vault **`ClusterSecretStore`** in **`examples/vault-cluster-secret-store.yaml`** (**`http://`** to match Vault listener — apply after Vault **Kubernetes auth**).
- **Vault** Helm **0.32.0** / app **1.21.2**`clusters/noble/bootstrap/vault/` — standalone **file** storage, **Longhorn** PVC; **HTTP** listener (`global.tlsDisable`); optional **CronJob** lab unseal **`unseal-cronjob.yaml`**; **not** initialized in git — run **`vault operator init`** per **`README.md`**.
- **Velero** Helm **12.0.0** / app **v1.18.0**`clusters/noble/bootstrap/velero/` (**Ansible** **`noble_velero`**, not Argo); **S3-compatible** backup location + **CSI** snapshots (**`EnableCSI`**); enable with **`noble_velero_install`** per **`velero/README.md`**. - **Velero** Helm **12.0.0** / app **v1.18.0**`clusters/noble/bootstrap/velero/` (**Ansible** **`noble_velero`**, not Argo); **S3-compatible** backup location + **CSI** snapshots (**`EnableCSI`**); enable with **`noble_velero_install`** per **`velero/README.md`**.
- **Still open:** **Renovate** — install **[Mend Renovate](https://github.com/apps/renovate)** (or self-host) so PRs run; optional **Alertmanager** notification channels; optional **sample Ingress + cert + Pangolin** end-to-end; **Argo CD SSO**. - **Still open:** **Renovate** — install **[Mend Renovate](https://github.com/apps/renovate)** (or self-host) so PRs run; optional **Alertmanager** notification channels; optional **sample Ingress + cert + Pangolin** end-to-end; **Argo CD SSO**.
@@ -64,9 +62,6 @@ Lab stack is **up** on-cluster through **Phase D****F** and **Phase G** (Vaul
- kube-prometheus-stack: **82.15.1** (Helm chart `prometheus-community/kube-prometheus-stack`; app **v0.89.x** bundle) - kube-prometheus-stack: **82.15.1** (Helm chart `prometheus-community/kube-prometheus-stack`; app **v0.89.x** bundle)
- Loki: **6.55.0** (Helm chart `grafana/loki`; app **3.6.7**) - Loki: **6.55.0** (Helm chart `grafana/loki`; app **3.6.7**)
- Fluent Bit: **0.56.0** (Helm chart `fluent/fluent-bit`; app **4.2.3**) - Fluent Bit: **0.56.0** (Helm chart `fluent/fluent-bit`; app **4.2.3**)
- Sealed Secrets: **2.18.4** (Helm chart `sealed-secrets/sealed-secrets`; app **0.36.1**)
- External Secrets Operator: **2.2.0** (Helm chart `external-secrets/external-secrets`; app **v2.2.0**)
- Vault: **0.32.0** (Helm chart `hashicorp/vault`; app **1.21.2**)
- Kyverno: **3.7.1** (Helm chart `kyverno/kyverno`; app **v1.17.1**); **kyverno-policies** **3.7.1****baseline** PSS, **Audit** (`clusters/noble/bootstrap/kyverno/`) - Kyverno: **3.7.1** (Helm chart `kyverno/kyverno`; app **v1.17.1**); **kyverno-policies** **3.7.1****baseline** PSS, **Audit** (`clusters/noble/bootstrap/kyverno/`)
- Headlamp: **0.40.1** (Helm chart `headlamp/headlamp`; app matches chart — see [Artifact Hub](https://artifacthub.io/packages/helm/headlamp/headlamp)) - Headlamp: **0.40.1** (Helm chart `headlamp/headlamp`; app matches chart — see [Artifact Hub](https://artifacthub.io/packages/helm/headlamp/headlamp))
- Velero: **12.0.0** (Helm chart `vmware-tanzu/velero`; app **v1.18.0**) — **`clusters/noble/bootstrap/velero/`**; AWS plugin **v1.14.0**; Ansible **`noble_velero`** - Velero: **12.0.0** (Helm chart `vmware-tanzu/velero`; app **v1.18.0**) — **`clusters/noble/bootstrap/velero/`**; AWS plugin **v1.14.0**; Ansible **`noble_velero`**
@@ -77,7 +72,7 @@ Lab stack is **up** on-cluster through **Phase D****F** and **Phase G** (Vaul
| Artifact | Path | | Artifact | Path |
|----------|------| |----------|------|
| This checklist | `talos/CLUSTER-BUILD.md` | | This checklist | `talos/CLUSTER-BUILD.md` |
| Operational runbooks (API VIP, etcd, Longhorn, Vault) | `talos/runbooks/` | | Operational runbooks (API VIP, etcd, Longhorn, SOPS) | `talos/runbooks/` |
| Talos quick start + networking + kubeconfig | `talos/README.md` | | Talos quick start + networking + kubeconfig | `talos/README.md` |
| talhelper source (active) | `talos/talconfig.yaml` — may be **wipe-phase** (no Longhorn volume) during disk recovery | | talhelper source (active) | `talos/talconfig.yaml` — may be **wipe-phase** (no Longhorn volume) during disk recovery |
| Longhorn volume restore | `talos/talconfig.with-longhorn.yaml` — copy to `talconfig.yaml` after GPT wipe (see `talos/README.md` §5) | | Longhorn volume restore | `talos/talconfig.with-longhorn.yaml` — copy to `talconfig.yaml` after GPT wipe (see `talos/README.md` §5) |
@@ -96,13 +91,11 @@ Lab stack is **up** on-cluster through **Phase D****F** and **Phase G** (Vaul
| Grafana Loki datasource (ConfigMap; no chart change) | `clusters/noble/bootstrap/grafana-loki-datasource/loki-datasource.yaml` | | Grafana Loki datasource (ConfigMap; no chart change) | `clusters/noble/bootstrap/grafana-loki-datasource/loki-datasource.yaml` |
| Loki (Helm values) | `clusters/noble/bootstrap/loki/``values.yaml`, `namespace.yaml` | | Loki (Helm values) | `clusters/noble/bootstrap/loki/``values.yaml`, `namespace.yaml` |
| Fluent Bit → Loki (Helm values) | `clusters/noble/bootstrap/fluent-bit/``values.yaml`, `namespace.yaml` | | Fluent Bit → Loki (Helm values) | `clusters/noble/bootstrap/fluent-bit/``values.yaml`, `namespace.yaml` |
| Sealed Secrets (Helm) | `clusters/noble/bootstrap/sealed-secrets/``values.yaml`, `namespace.yaml`, `README.md` | | SOPS-encrypted cluster Secrets | `clusters/noble/secrets/``README.md`, `*.secret.yaml`; **`.sops.yaml`**, **`age-key.txt`** (gitignored) at repo root |
| External Secrets Operator (Helm + Vault store example) | `clusters/noble/bootstrap/external-secrets/``values.yaml`, `namespace.yaml`, `README.md`, `examples/vault-cluster-secret-store.yaml` |
| Vault (Helm + optional unseal CronJob) | `clusters/noble/bootstrap/vault/``values.yaml`, `namespace.yaml`, `unseal-cronjob.yaml`, `cilium-network-policy.yaml`, `configure-kubernetes-auth.sh`, `README.md` |
| Kyverno + PSS baseline policies | `clusters/noble/bootstrap/kyverno/``values.yaml`, `policies-values.yaml`, `namespace.yaml`, `README.md` | | Kyverno + PSS baseline policies | `clusters/noble/bootstrap/kyverno/``values.yaml`, `policies-values.yaml`, `namespace.yaml`, `README.md` |
| Headlamp (Helm + Ingress) | `clusters/noble/bootstrap/headlamp/``values.yaml`, `namespace.yaml`, `README.md` | | Headlamp (Helm + Ingress) | `clusters/noble/bootstrap/headlamp/``values.yaml`, `namespace.yaml`, `README.md` |
| Velero (Helm + S3 BSL; CSI snapshots) | `clusters/noble/bootstrap/velero/``values.yaml`, `namespace.yaml`, `README.md`; **`ansible/roles/noble_velero`** | | Velero (Helm + S3 BSL; CSI snapshots) | `clusters/noble/bootstrap/velero/``values.yaml`, `namespace.yaml`, `README.md`; **`ansible/roles/noble_velero`** |
| Renovate (repo config + optional self-hosted Helm) | **`renovate.json`** at repo root; optional self-hosted chart under **`clusters/noble/apps/`** (Argo) + token Secret (**Sealed Secrets** / **ESO** after **Phase E**) | | Renovate (repo config + optional self-hosted Helm) | **`renovate.json`** at repo root; optional self-hosted chart under **`clusters/noble/apps/`** (Argo) + token Secret (SOPS under **`clusters/noble/secrets/`** or imperative **`kubectl create secret`**) |
**Git vs cluster:** manifests and `talconfig` live in git; **`talhelper genconfig -o out`**, bootstrap, Helm, and `kubectl` run on your LAN. See **`talos/README.md`** for workstation reachability (lab LAN/VPN), **`talosctl kubeconfig`** vs Kubernetes `server:` (VIP vs node IP), and **`--insecure`** only in maintenance. **Git vs cluster:** manifests and `talconfig` live in git; **`talhelper genconfig -o out`**, bootstrap, Helm, and `kubectl` run on your LAN. See **`talos/README.md`** for workstation reachability (lab LAN/VPN), **`talosctl kubeconfig`** vs Kubernetes `server:` (VIP vs node IP), and **`--insecure`** only in maintenance.
@@ -114,10 +107,9 @@ Lab stack is **up** on-cluster through **Phase D****F** and **Phase G** (Vaul
4. **CSI Volume snapshots:** **`kubernetes-csi/external-snapshotter`** CRDs + **`snapshot-controller`** (`clusters/noble/bootstrap/csi-snapshot-controller/`) before relying on **Longhorn** / **Velero** volume snapshots. 4. **CSI Volume snapshots:** **`kubernetes-csi/external-snapshotter`** CRDs + **`snapshot-controller`** (`clusters/noble/bootstrap/csi-snapshot-controller/`) before relying on **Longhorn** / **Velero** volume snapshots.
5. **Longhorn:** Talos user volume + extensions in `talconfig.with-longhorn.yaml` (when restored); Helm **`defaultDataPath`** in `clusters/noble/bootstrap/longhorn/values.yaml`. 5. **Longhorn:** Talos user volume + extensions in `talconfig.with-longhorn.yaml` (when restored); Helm **`defaultDataPath`** in `clusters/noble/bootstrap/longhorn/values.yaml`.
6. **Loki → Fluent Bit → Grafana datasource:** deploy **Loki** (`loki-gateway` Service) before **Fluent Bit**; apply **`clusters/noble/bootstrap/grafana-loki-datasource/loki-datasource.yaml`** after **Loki** (sidecar picks up the ConfigMap — no kube-prometheus values change for Loki). 6. **Loki → Fluent Bit → Grafana datasource:** deploy **Loki** (`loki-gateway` Service) before **Fluent Bit**; apply **`clusters/noble/bootstrap/grafana-loki-datasource/loki-datasource.yaml`** after **Loki** (sidecar picks up the ConfigMap — no kube-prometheus values change for Loki).
7. **Vault:** **Longhorn** default **StorageClass** before **`clusters/noble/bootstrap/vault/`** Helm (PVC **`data-vault-0`**); **External Secrets** **`ClusterSecretStore`** after Vault is initialized, unsealed, and **Kubernetes auth** is configured. 7. **Headlamp:** **Traefik** + **cert-manager** (**`letsencrypt-prod`**) before exposing **`headlamp.apps.noble.lab.pcenicni.dev`**; treat as **cluster-admin** UI — protect with network policy / SSO when hardening (**Phase G**).
8. **Headlamp:** **Traefik** + **cert-manager** (**`letsencrypt-prod`**) before exposing **`headlamp.apps.noble.lab.pcenicni.dev`**; treat as **cluster-admin** UI — protect with network policy / SSO when hardening (**Phase G**). 8. **Renovate:** **Git remote** + platform access (**hosted app** needs org/repo install; **self-hosted** needs **`RENOVATE_TOKEN`** and chart **`renovate.config`**). If the bot runs **in-cluster**, store the token with **SOPS** or an imperative Secret — no ingress required for the bot itself.
9. **Renovate:** **Git remote** + platform access (**hosted app** needs org/repo install; **self-hosted** needs **`RENOVATE_TOKEN`** and chart **`renovate.config`**). If the bot runs **in-cluster**, add the token **after** **Sealed Secrets** / **Vault** (**Phase E**) — no ingress required for the bot itself. 9. **Velero:** **S3-compatible** endpoint + bucket + **`velero/velero-cloud-credentials`** before **`ansible/playbooks/noble.yml`** with **`noble_velero_install: true`**; for **CSI** volume snapshots, label a **VolumeSnapshotClass** per **`clusters/noble/bootstrap/velero/README.md`** (e.g. Longhorn).
10. **Velero:** **S3-compatible** endpoint + bucket + **`velero/velero-cloud-credentials`** before **`ansible/playbooks/noble.yml`** with **`noble_velero_install: true`**; for **CSI** volume snapshots, label a **VolumeSnapshotClass** per **`clusters/noble/bootstrap/velero/README.md`** (e.g. Longhorn).
## Prerequisites (before phases) ## Prerequisites (before phases)
@@ -160,7 +152,7 @@ Lab stack is **up** on-cluster through **Phase D****F** and **Phase G** (Vaul
- [x] **Argo CD** bootstrap — `clusters/noble/bootstrap/argocd/` (`helm upgrade --install argocd …`) — also covered by **`ansible/playbooks/noble.yml`** (role **`noble_argocd`**) - [x] **Argo CD** bootstrap — `clusters/noble/bootstrap/argocd/` (`helm upgrade --install argocd …`) — also covered by **`ansible/playbooks/noble.yml`** (role **`noble_argocd`**)
- [x] Argo CD server **LoadBalancer****`192.168.50.210`** (see `values.yaml`) - [x] Argo CD server **LoadBalancer****`192.168.50.210`** (see `values.yaml`)
- [x] **App-of-apps** — optional; **`clusters/noble/apps/kustomization.yaml`** is **empty** (core stack is **Ansible**-managed from **`clusters/noble/bootstrap/`**, not Argo). Set **`repoURL`** in **`root-application.yaml`** and add **`Application`** manifests only for optional GitOps workloads — see **`clusters/noble/apps/README.md`** - [x] **App-of-apps** — optional; **`clusters/noble/apps/kustomization.yaml`** is **empty** (core stack is **Ansible**-managed from **`clusters/noble/bootstrap/`**, not Argo). Set **`repoURL`** in **`root-application.yaml`** and add **`Application`** manifests only for optional GitOps workloads — see **`clusters/noble/apps/README.md`**
- [x] **Renovate****`renovate.json`** at repo root ([Renovate](https://docs.renovatebot.com/) — **Kubernetes** manager for **`clusters/noble/**/*.yaml`** image pins; grouped minor/patch PRs). **Activate PRs:** install **[Mend Renovate](https://github.com/apps/renovate)** on the Git repo (**Option A**), or **Option B:** self-hosted chart per [Helm charts](https://docs.renovatebot.com/helm-charts/) + token from **Sealed Secrets** / **ESO**. Helm **chart** versions pinned only in comments still need manual bumps or extra **regex** `customManagers` — extend **`renovate.json`** as needed. - [x] **Renovate****`renovate.json`** at repo root ([Renovate](https://docs.renovatebot.com/) — **Kubernetes** manager for **`clusters/noble/**/*.yaml`** image pins; grouped minor/patch PRs). **Activate PRs:** install **[Mend Renovate](https://github.com/apps/renovate)** on the Git repo (**Option A**), or **Option B:** self-hosted chart per [Helm charts](https://docs.renovatebot.com/helm-charts/) + token from **SOPS** or a one-off Secret. Helm **chart** versions pinned only in comments still need manual bumps or extra **regex** `customManagers` — extend **`renovate.json`** as needed.
- [ ] SSO — later - [ ] SSO — later
## Phase D — Observability ## Phase D — Observability
@@ -171,9 +163,7 @@ Lab stack is **up** on-cluster through **Phase D****F** and **Phase G** (Vaul
## Phase E — Secrets ## Phase E — Secrets
- [x] **Sealed Secrets** (optional Git workflow) — `clusters/noble/bootstrap/sealed-secrets/` (Helm **2.18.4**); **`kubeseal`** + key backup per **`README.md`** - [x] **SOPS** — encrypt **`Secret`** YAML under **`clusters/noble/secrets/`** with **age** (see **`.sops.yaml`**, **`clusters/noble/secrets/README.md`**); keep **`age-key.txt`** private (gitignored). **`ansible/playbooks/noble.yml`** decrypt-applies **`*.yaml`** when **`age-key.txt`** exists.
- [x] **Vault** in-cluster on Longhorn + **auto-unseal**`clusters/noble/bootstrap/vault/` (Helm **0.32.0**); **Longhorn** PVC; **OSS** “auto-unseal” = optional **`unseal-cronjob.yaml`** + Secret (**README**); **`configure-kubernetes-auth.sh`** for ESO (**Kubernetes auth** + KV + role)
- [x] **External Secrets Operator** + Vault `ClusterSecretStore` — operator **`clusters/noble/bootstrap/external-secrets/`** (Helm **2.2.0**); apply **`examples/vault-cluster-secret-store.yaml`** after Vault (**`README.md`**)
## Phase F — Policy + backups ## Phase F — Policy + backups
@@ -182,8 +172,7 @@ Lab stack is **up** on-cluster through **Phase D****F** and **Phase G** (Vaul
## Phase G — Hardening ## Phase G — Hardening
- [x] **Cilium** Vault **`CiliumNetworkPolicy`** (`clusters/noble/bootstrap/vault/cilium-network-policy.yaml`) — HTTP **8200** from **`external-secrets`** + **`vault`**; extend for other clients as needed - [x] **Runbooks****`talos/runbooks/`** (API VIP / kube-vip, etcdTalos, Longhorn, SOPS)
- [x] **Runbooks****`talos/runbooks/`** (API VIP / kube-vip, etcdTalos, Longhorn, Vault)
- [x] **RBAC****Headlamp** **`ClusterRoleBinding`** uses built-in **`edit`** (not **`cluster-admin`**); **Argo CD** **`policy.default: role:readonly`** with **`g, admin, role:admin`** — see **`clusters/noble/bootstrap/headlamp/values.yaml`**, **`clusters/noble/bootstrap/argocd/values.yaml`**, **`talos/runbooks/rbac.md`** - [x] **RBAC****Headlamp** **`ClusterRoleBinding`** uses built-in **`edit`** (not **`cluster-admin`**); **Argo CD** **`policy.default: role:readonly`** with **`g, admin, role:admin`** — see **`clusters/noble/bootstrap/headlamp/values.yaml`**, **`clusters/noble/bootstrap/argocd/values.yaml`**, **`talos/runbooks/rbac.md`**
- [ ] **Alertmanager** — add **`slack_configs`**, **`pagerduty_configs`**, or other receivers under **`kube-prometheus-stack`** `alertmanager.config` (chart defaults use **`null`** receiver) - [ ] **Alertmanager** — add **`slack_configs`**, **`pagerduty_configs`**, or other receivers under **`kube-prometheus-stack`** `alertmanager.config` (chart defaults use **`null`** receiver)
@@ -201,12 +190,10 @@ Lab stack is **up** on-cluster through **Phase D****F** and **Phase G** (Vaul
- [x] **`logging`** — **Fluent Bit** DaemonSet **Running** on all nodes (logs → **Loki**) - [x] **`logging`** — **Fluent Bit** DaemonSet **Running** on all nodes (logs → **Loki**)
- [x] **Grafana****Loki** datasource from **`grafana-loki-datasource`** ConfigMap (**Explore** works after apply + sidecar sync) - [x] **Grafana****Loki** datasource from **`grafana-loki-datasource`** ConfigMap (**Explore** works after apply + sidecar sync)
- [x] **Headlamp** — Deployment **Running** in **`headlamp`**; UI at **`https://headlamp.apps.noble.lab.pcenicni.dev`** (TLS via **`letsencrypt-prod`**) - [x] **Headlamp** — Deployment **Running** in **`headlamp`**; UI at **`https://headlamp.apps.noble.lab.pcenicni.dev`** (TLS via **`letsencrypt-prod`**)
- [x] **`sealed-secrets`** — controller **Deployment** **Running** in **`sealed-secrets`** (install + **`kubeseal`** per **`apps/sealed-secrets/README.md`**) - [x] **SOPS secrets****`clusters/noble/secrets/*.yaml`** encrypted in git; **`noble.yml`** applies decrypted manifests when **`age-key.txt`** is present
- [x] **`external-secrets`** — controller + webhook + cert-controller **Running** in **`external-secrets`**; apply **`ClusterSecretStore`** after Vault **Kubernetes auth**
- [x] **`vault`** — **StatefulSet** **Running**, **`data-vault-0`** PVC **Bound** on **longhorn**; **`vault operator init`** + unseal per **`apps/vault/README.md`**
- [x] **`kyverno`** — admission / background / cleanup / reports controllers **Running** in **`kyverno`**; **ClusterPolicies** for **PSS baseline** **Ready** (**Audit**) - [x] **`kyverno`** — admission / background / cleanup / reports controllers **Running** in **`kyverno`**; **ClusterPolicies** for **PSS baseline** **Ready** (**Audit**)
- [ ] **`velero`** — when enabled: Deployment **Running** in **`velero`**; **`BackupStorageLocation`** / **`VolumeSnapshotLocation`** **Available**; test backup per **`velero/README.md`** - [ ] **`velero`** — when enabled: Deployment **Running** in **`velero`**; **`BackupStorageLocation`** / **`VolumeSnapshotLocation`** **Available**; test backup per **`velero/README.md`**
- [x] **Phase G (partial)** Vault **`CiliumNetworkPolicy`**; **`talos/runbooks/`** (incl. **RBAC**); **Headlamp**/**Argo CD** RBAC tightened — **Alertmanager** receivers still optional - [x] **Phase G (partial)****`talos/runbooks/`** (incl. **RBAC**); **Headlamp**/**Argo CD** RBAC tightened — **Alertmanager** receivers still optional
--- ---

View File

@@ -1,7 +1,7 @@
# Talos — noble lab # Talos — noble lab
- **Cluster build checklist (exported TODO):** [CLUSTER-BUILD.md](./CLUSTER-BUILD.md) - **Cluster build checklist (exported TODO):** [CLUSTER-BUILD.md](./CLUSTER-BUILD.md)
- **Operational runbooks (API VIP, etcd, Longhorn, Vault):** [runbooks/README.md](./runbooks/README.md) - **Operational runbooks (API VIP, etcd, Longhorn, SOPS):** [runbooks/README.md](./runbooks/README.md)
## Versions ## Versions

View File

@@ -7,5 +7,5 @@ Short recovery / triage notes for the **noble** Talos cluster. Deep procedures l
| Kubernetes API VIP (kube-vip) | [`api-vip-kube-vip.md`](./api-vip-kube-vip.md) | | Kubernetes API VIP (kube-vip) | [`api-vip-kube-vip.md`](./api-vip-kube-vip.md) |
| etcd / Talos control plane | [`etcd-talos.md`](./etcd-talos.md) | | etcd / Talos control plane | [`etcd-talos.md`](./etcd-talos.md) |
| Longhorn storage | [`longhorn.md`](./longhorn.md) | | Longhorn storage | [`longhorn.md`](./longhorn.md) |
| Vault (unseal, auth, ESO) | [`vault.md`](./vault.md) | | SOPS (secrets in git) | [`sops.md`](./sops.md) |
| RBAC (Headlamp, Argo CD) | [`rbac.md`](./rbac.md) | | RBAC (Headlamp, Argo CD) | [`rbac.md`](./rbac.md) |

13
talos/runbooks/sops.md Normal file
View File

@@ -0,0 +1,13 @@
# Runbook: SOPS secrets (git-encrypted)
**Symptoms:** `sops -d` fails; `kubectl apply` after Ansible shows no secret; `noble.yml` skips apply.
**Checklist**
1. **Private key:** `age-key.txt` at the repository root (gitignored). Create with `age-keygen -o age-key.txt` and add the **public** key to `.sops.yaml` (see `clusters/noble/secrets/README.md`).
2. **Environment:** `export SOPS_AGE_KEY_FILE=/absolute/path/to/home-server/age-key.txt` when editing or applying by hand.
3. **Edit encrypted file:** `sops clusters/noble/secrets/<name>.secret.yaml`
4. **Apply one file:** `sops -d clusters/noble/secrets/<name>.secret.yaml | kubectl apply -f -`
5. **Ansible:** `noble_apply_sops_secrets` is true by default; the platform role applies all `*.yaml` when `age-key.txt` exists.
**References:** [`clusters/noble/secrets/README.md`](../../clusters/noble/secrets/README.md), [Mozilla SOPS](https://github.com/getsops/sops).

View File

@@ -1,15 +0,0 @@
# Runbook: Vault (in-cluster)
**Symptoms:** External Secrets **not syncing**, `ClusterSecretStore` **InvalidProviderConfig**, Vault UI/API **503 sealed**, pods **CrashLoop** on auth.
**Checks**
1. `kubectl -n vault exec -i sts/vault -- vault status`**Sealed** / **Initialized**.
2. Unseal key Secret + optional CronJob: [`clusters/noble/bootstrap/vault/README.md`](../../clusters/noble/bootstrap/vault/README.md), `unseal-cronjob.yaml`.
3. Kubernetes auth for ESO: [`clusters/noble/bootstrap/vault/configure-kubernetes-auth.sh`](../../clusters/noble/bootstrap/vault/configure-kubernetes-auth.sh) and `kubectl describe clustersecretstore vault`.
4. **Cilium** policy: if Vault is unreachable from `external-secrets`, check [`clusters/noble/bootstrap/vault/cilium-network-policy.yaml`](../../clusters/noble/bootstrap/vault/cilium-network-policy.yaml) and extend `ingress` for new client namespaces.
**Common fixes**
- Sealed: `vault operator unseal` or fix auto-unseal CronJob + `vault-unseal-key` Secret.
- **403/invalid role** on ESO: re-run Kubernetes auth setup (issuer/CA/reviewer JWT) per README.