Update Ansible configuration to integrate SOPS for managing secrets. Enhance README.md with SOPS usage instructions and prerequisites. Remove External Secrets Operator references and related configurations from the bootstrap process, streamlining the deployment. Adjust playbooks and roles to apply SOPS-encrypted secrets automatically, improving security and clarity in secret management.
docs/shared-data-services.md

# Centralized PostgreSQL and S3-compatible storage

Goal: **one shared PostgreSQL** and **one S3-compatible object store** on **noble**, instead of every app bundling its own database or MinIO. Apps keep **logical isolation** via **per-app databases** / **users** and **per-app buckets** (or prefixes), not separate clusters.

See also: [`migration-vm-to-noble.md`](migration-vm-to-noble.md), [`homelab-network.md`](homelab-network.md) (VM **160** `s3` today), [`talos/CLUSTER-BUILD.md`](../talos/CLUSTER-BUILD.md) (Velero + S3).

---

## 1. Why centralize

| Benefit | Detail |
|--------|--------|
| **Operations** | One backup/restore story, one upgrade cadence, one place to tune **IOPS** and **retention**. |
| **Security** | **Least privilege**: each app gets its own **DB user** and **S3 credentials** scoped to one database or bucket. |
| **Resources** | Fewer duplicate **Postgres** or **MinIO** sidecars; better use of **Longhorn** or dedicated PVCs for the shared tiers. |

**Tradeoff:** Shared tiers are **blast-radius** targets — use **backups**, **PITR** where you care, and **NetworkPolicies** so only expected namespaces talk to Postgres/S3.
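
The NetworkPolicy side of this tradeoff can be sketched as below. This is illustrative only: the `platform` namespace, the `app: postgres-platform` pod label, and the single allowed namespace are assumptions, not values from this repo.

```yaml
# Sketch: only pods in an allow-listed namespace may reach the shared Postgres.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: postgres-ingress-allowlist
  namespace: platform            # namespace running the shared Postgres (assumed)
spec:
  podSelector:
    matchLabels:
      app: postgres-platform     # label on the Postgres pods (assumed)
  policyTypes: ["Ingress"]
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: gitea   # one allowed namespace (example)
      ports:
        - protocol: TCP
          port: 5432
```

Add one `from` entry per namespace that legitimately needs the database; everything else is denied once any policy selects the pods.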

---

## 2. PostgreSQL — recommended pattern

1. **Run Postgres on noble** — Operators such as **CloudNativePG**, **Zalando Postgres operator**, or a well-maintained **Helm** chart with **replicas** + **persistent volumes** (Longhorn).
2. **One cluster instance, many databases** — For each app: `CREATE DATABASE appname;` and a **dedicated role** with `CONNECT` on that database only (not superuser).
3. **Connection from apps** — Use a **Kubernetes Service** (e.g. `postgres-platform.platform.svc.cluster.local:5432`) and pass **credentials** via **Secrets** (ideally **SOPS**-encrypted in git).
4. **Migrations** — Run app **migration** jobs or init containers against the **same** DSN after DB exists.
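
Steps 1–2 can be sketched in SQL, run once per app as an admin role; `appname` and `app_user` are placeholders:

```sql
-- Dedicated role per app: LOGIN only, no superuser, no CREATEDB.
CREATE ROLE app_user LOGIN PASSWORD 'change-me';
CREATE DATABASE appname OWNER app_user;
-- Lock the database down so only its own role (plus admins) can connect:
REVOKE CONNECT ON DATABASE appname FROM PUBLIC;
GRANT CONNECT ON DATABASE appname TO app_user;
```

The `REVOKE ... FROM PUBLIC` matters: by default every role can connect to every database, so per-app isolation is only real once the default grant is removed.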

**Migrating off SQLite / embedded Postgres**

- **SQLite → Postgres:** export/import per app (native tools, or **pgloader** where appropriate).
- **Docker Postgres volume:** `pg_dumpall` or per-DB `pg_dump` → restore into a **new** database on the shared server; **freeze writes** during cutover.
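
The dump/restore cutover in the second bullet might look like the following; hostnames, role, and database name are placeholders for illustration:

```shell
# 1) Freeze writes (scale the app to zero), then dump one database from the old server:
pg_dump --format=custom --host=old-vm --username=postgres appname > appname.dump

# 2) Restore into the pre-created database on the shared server
#    (--no-owner so objects end up owned by the connecting role):
pg_restore --host=postgres-platform.platform.svc.cluster.local \
  --username=app_user --dbname=appname --no-owner appname.dump

# 3) Point the app's DSN at the new host, scale it back up, verify, then retire the old DB.
```

The custom format (`--format=custom`) is worth preferring over plain SQL here because `pg_restore` can then reorder, parallelize, or partially restore.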

---

## 3. S3-compatible object storage — recommended pattern

1. **Run one S3 API on noble** — **MinIO** (common), **Garage**, or **SeaweedFS** S3 layer — with **PVC(s)** or host path for data; **erasure coding** / replicas if the chart supports it and you want durability across nodes.
2. **Buckets per concern** — e.g. `gitea-attachments`, `velero`, `loki-archive` — not one global bucket unless you enforce **prefix** IAM policies.
3. **Credentials** — **IAM-style** users limited to **one bucket** (or prefix); **Secrets** reference **access key** / **secret**; never commit keys in plain text.
4. **Endpoint for pods** — In-cluster: `http://minio.platform.svc.cluster.local:9000` (or TLS inside mesh). Apps use **virtual-hosted** or **path-style** per SDK defaults.
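
For MinIO, steps 2–3 could look like this with the `mc` client. Alias, bucket, user, and key names are placeholders; note that older `mc` releases spell the policy commands `mc admin policy add` / `mc admin policy set` instead of `create` / `attach`:

```shell
# Admin alias for the shared MinIO (placeholder credentials):
mc alias set central http://minio.platform.svc.cluster.local:9000 ADMIN_KEY ADMIN_SECRET
mc mb central/gitea-attachments

# IAM policy scoped to exactly one bucket:
cat > gitea-attachments-rw.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": ["s3:*"],
    "Resource": ["arn:aws:s3:::gitea-attachments", "arn:aws:s3:::gitea-attachments/*"]
  }]
}
EOF

# Per-app user that can touch only that bucket:
mc admin user add central gitea-app GITEA_SECRET_KEY
mc admin policy create central gitea-attachments-rw gitea-attachments-rw.json
mc admin policy attach central gitea-attachments-rw --user gitea-app
```

The app's Secret then carries only `gitea-app` / `GITEA_SECRET_KEY`, never the admin pair.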

### NFS as backing store for S3 on noble

**Yes.** You can run MinIO (or another S3-compatible server) with its **data directory** on a **ReadWriteMany** volume that is **NFS** — for example the same **Openmediavault** export you already use, mounted via your **NFS CSI** driver (see [`homelab-network.md`](homelab-network.md)).
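
A minimal PVC sketch for this layout, assuming an NFS CSI StorageClass named `nfs-csi` (the real class name and size depend on your install):

```yaml
# MinIO data directory on a ReadWriteMany NFS-backed volume (sketch).
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: minio-data
  namespace: platform          # assumed namespace for the shared MinIO
spec:
  accessModes: ["ReadWriteMany"]
  storageClassName: nfs-csi    # assumed class backed by the OMV export
  resources:
    requests:
      storage: 500Gi
```

The MinIO Deployment/StatefulSet then mounts this claim at its data path; swapping the class to Longhorn later only changes this manifest, not the apps.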

| Consideration | Detail |
|---------------|--------|
| **Works for homelab** | MinIO stores objects as files under a path; **POSIX** on NFS is enough for many setups. |
| **Performance** | NFS adds **latency** and shared bandwidth; fine for moderate use, less ideal for heavy multi-tenant throughput. |
| **Availability** | The **NFS server** (OMV) becomes part of the availability story for object data — plan **backups** and **OMV** health like any dependency. |
| **Locking / semantics** | Prefer **NFSv4.x**; avoid mixing **NFS** and expectations of **local SSD** (e.g. very chatty small writes). If you see odd behavior, **Longhorn** (block) on a node is the usual next step. |
| **Layering** | You are stacking **S3 API → file layout → NFS → disk**; that is normal for a lab, just **monitor** space and exports on OMV. |

**Summary:** NFS-backed PVC for MinIO is **valid** on noble; use **Longhorn** (or local disk) when you need **better IOPS** or want object data **inside** the cluster’s storage domain without depending on OMV for that tier.

**Migrating off VM 160 (`s3`) or per-app MinIO**

- **MinIO → MinIO:** `mc mirror` between aliases, or **replication** if you configure it.
- **Same API:** Any tool speaking **S3** can **sync** buckets before you point apps at the new endpoint.
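
The `mc mirror` path could look like this; endpoints and keys are placeholders:

```shell
# Aliases for the old VM-hosted store and the new shared one:
mc alias set old http://s3.old-vm:9000 OLD_KEY OLD_SECRET
mc alias set new http://minio.platform.svc.cluster.local:9000 NEW_KEY NEW_SECRET

# Copy one bucket, preserving attributes; repeat per bucket:
mc mirror --preserve old/gitea-attachments new/gitea-attachments
```

Run the mirror once ahead of time, then again (or with `--watch`) during the write freeze so the final delta is small before pointing apps at the new endpoint.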

**Velero** — Point the **backup location** at the **central** bucket (see cluster Velero docs); avoid a second ad-hoc object store for backups if one cluster bucket is enough.

---

## 4. Ordering relative to app migrations

| When | What |
|------|------|
| **Early** | Stand up **Postgres** + **S3** with **empty** DBs/buckets; test with **one** non-critical app (e.g. a throwaway deployment). |
| **Before auth / Git** | **Gitea** and **Authentik** benefit from **managed Postgres** early — plan **DSN** and **bucket** for attachments **before** cutover. |
| **Ongoing** | New apps **must not** ship embedded **Postgres/MinIO** unless the workload truly requires it (e.g. vendor appliance). |

---

## 5. Checklist (platform team)

- [ ] Postgres **Service** DNS name and **TLS** (optional in-cluster) documented.
- [ ] S3 **endpoint**, **region** string (can be `us-east-1` for MinIO), **TLS** for Ingress if clients are outside the cluster.
- [ ] **Backup:** scheduled **logical dumps** (Postgres) and **bucket replication** or **object versioning** where needed.
- [ ] **SOPS** / **External Secrets** pattern for **rotation** without editing app manifests by hand.
- [ ] **homelab-network.md** updated when **VM 160** is retired or repurposed.
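
For the SOPS checklist item, a typical flow looks like the following; the age recipient and file names are placeholders:

```shell
# Encrypt a Secret manifest before committing it to git:
sops --encrypt --age age1PLACEHOLDER_RECIPIENT secret.yaml > secret.enc.yaml

# After changing recipients in .sops.yaml (e.g. rotating a key),
# re-wrap the data key without touching the plaintext:
sops updatekeys secret.enc.yaml
```

Apps never see the encrypted file; a decryption step (operator, CI, or `sops --decrypt` in the apply pipeline) turns it back into a plain Secret at deploy time.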

---

## Related docs

- VM → cluster migration: [`migration-vm-to-noble.md`](migration-vm-to-noble.md)
- Inventory (s3 VM): [`homelab-network.md`](homelab-network.md)
- Longhorn / storage runbook: [`../talos/runbooks/longhorn.md`](../talos/runbooks/longhorn.md)
- Velero (S3 backup target): [`../clusters/noble/bootstrap/velero/`](../clusters/noble/bootstrap/velero/) (if present)