home-server/docs/migration-vm-to-noble.md

Migration plan: Proxmox VMs → noble (Kubernetes)

This document is the default playbook for moving workloads from Proxmox VMs on 192.168.1.0/24 into the noble Talos cluster on 192.168.50.0/24. Source inventory and per-VM notes: homelab-network.md. Cluster facts: architecture.md, talos/CLUSTER-BUILD.md.


1. Scope and principles

| Principle | Detail |
| --- | --- |
| One service at a time | Run the new workload on noble while the VM stays up; cut over DNS / NPM only after checks pass. |
| Same container image | Prefer the same upstream image and major version as Docker on the VM to reduce surprises. |
| Data moves with a plan | Back up VM volumes or export DB dumps before the first deploy to the cluster. |
| Ingress on noble | Internal apps use Traefik + `*.apps.noble.lab.pcenicni.dev` (or your chosen hostnames) and MetalLB (e.g. 192.168.50.211) per architecture.md. |
| Cross-VLAN | Clients on .1 reach services on .50 via routing; the firewall must allow NFS from the Talos node IPs to OMV 192.168.1.105 when pods mount NFS. |

Not everything must move. Keep Openmediavault (and optionally NPM) on VMs if you prefer; the cluster consumes NFS and HTTP from them.


2. Prerequisites (before wave 1)

  1. Cluster healthy — `kubectl get nodes` shows all nodes Ready; work through the talos/CLUSTER-BUILD.md checklist through ingress and cert-manager as needed.
  2. Ingress + TLS — Traefik + cert-manager working; you can hit a test Ingress on the MetalLB IP.
  3. GitOps / deploy path — Decide per app: Helm under clusters/noble/apps/, Argo CD, or Ansible-applied manifests (match how you manage the rest of noble).
  4. Secrets — Plan Kubernetes Secrets; for git-stored material, align with SOPS (clusters/noble/secrets/, .sops.yaml).
  5. Storage — Longhorn is the default for ReadWriteOnce state; for NFS (arr, Jellyfin), install a CSI NFS driver and test a small RWX PVC before migrating data-heavy apps.
  6. Shared data tier (recommended) — Deploy centralized PostgreSQL and S3-compatible storage on noble so apps do not each ship their own DB/object store; see shared-data-services.md.
  7. Firewall — Rules: workstation → 192.168.50.230:6443; nodes → OMV NFS ports; clients → 192.168.50.211 (or split-horizon DNS) as you design.
  8. DNS — Split-horizon or Pi-hole records pointing `*.apps.noble.lab.pcenicni.dev` at the Traefik IP 192.168.50.211 for LAN clients.
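The prerequisites above can be verified with a short pre-flight pass. A sketch, run against a live cluster (the StorageClass name `nfs-csi` is an assumption; use whatever your CSI driver installs):

```shell
# Pre-flight checks before wave 1, run from the workstation with kubeconfig set.
kubectl get nodes -o wide          # all nodes Ready
kubectl -n cert-manager get pods   # cert-manager up
kubectl get ingressclass           # 'traefik' class present

# Test a small RWX PVC against the NFS CSI driver before data-heavy apps.
# 'nfs-csi' is an assumed StorageClass name; substitute yours.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-smoke-test
spec:
  accessModes: ["ReadWriteMany"]
  storageClassName: nfs-csi
  resources:
    requests:
      storage: 1Gi
EOF
kubectl get pvc nfs-smoke-test     # should reach Bound; then clean up
kubectl delete pvc nfs-smoke-test
</imports>
```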

3. Standard migration procedure (repeat per app)

Use this checklist for each application (or small group, e.g. one Helm release).

| Step | Action |
| --- | --- |
| A. Discover | Document image:tag, ports, volumes (host paths), env vars, depends_on (DB, Redis, NFS path). Export `docker inspect` / compose files from the VM. |
| B. Backup | Snapshot the Proxmox VM or back up the volume / SQLite file / DB dump to offline storage. |
| C. Namespace | Create a dedicated namespace (e.g. monitoring-tools, authentik) or use your house standard. |
| D. Deploy | Add Deployment (or StatefulSet), Service, Ingress (class traefik), and PVCs; wire credentials from Secrets (not literals in git). |
| E. Storage | Longhorn PVC for local state; NFS CSI PVC for shared media/config paths that must match the VM (see the homelab-network.md arr section). Prefer shared Postgres / shared S3 per shared-data-services.md instead of new embedded databases. Match UID/GID with `securityContext`. |
| F. Smoke test | `kubectl port-forward` or a temporary Ingress hostname; log in and run one critical workflow (login, playback, sync). |
| G. DNS cutover | Point internal DNS or the NPM upstream from the VM IP to the new hostname (Traefik) or MetalLB IP + Host header. |
| H. Observe | 24–72 hours: logs, alerts, Uptime Kuma (once migrated), backups. |
| I. Decommission | Stop the container on the VM (keep the VM running until every service on it has moved). |
| J. VM off | When no services remain on that VM, power it off and archive or delete the VM. |
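Steps A and B can be sketched as follows; the container name `mealie` and its paths are placeholders, substitute the real app:

```shell
# On the source VM: capture the running config (step A) and back up data (step B).
docker inspect mealie > mealie.inspect.json                          # image, env, mounts, ports
docker compose -f /opt/mealie/docker-compose.yml config > mealie.compose.yml

# Pause writes briefly, archive the volume, then resume.
docker stop mealie
tar czf mealie-data-$(date +%F).tgz /opt/mealie/data
docker start mealie

# For Postgres-backed apps, prefer a logical dump over copying live data files:
# docker exec <db-container> pg_dump -U mealie mealie > mealie-$(date +%F).sql
```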

Rollback: Re-enable the VM service, revert DNS/NPM to the old IP, delete or scale the cluster deployment to zero.
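The rollback can be as small as this (namespace and names are illustrative):

```shell
# Roll back a cut-over app: stop it on the cluster, restart it on the VM.
kubectl -n mealie scale deployment mealie --replicas=0   # placeholder names
# On the VM:
docker start mealie
# Finally, point the DNS record / NPM proxy host back at the VM IP.
```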


4. Migration order (waves)

Order balances risk, dependencies, and learning curve.

| Phase | Target | Rationale |
| --- | --- | --- |
| 0 — Optional | Automate (130) | Low use: retire or replace with CronJobs; skip if nothing valuable runs. |
| 0b — Platform | Shared Postgres + S3 on noble | Run before or alongside early waves so new deploys use one DSN and one object endpoint; retire VM 160 when empty. See shared-data-services.md. |
| 1 — Observability | Monitor (110) — Uptime Kuma, Peekaping, Tracearr | Small state; validates Ingress, PVCs, and alert paths before auth and media. |
| 2 — Git | gitea (300), gitea-nsfw (310) | Point at shared Postgres + S3 for attachments; move repos with PVC + backup restore if needed. |
| 3 — Object / misc | s3 (160), AMP (500) | Migrate data into central S3 on the cluster, then decommission the duplicate MinIO on VM 160 if applicable. |
| 4 — Auth | Auth (190) — Authentik | Use shared Postgres; update all OIDC clients (Gitea, apps, NPM) with new issuer URLs; schedule a maintenance window. |
| 5 — Daily apps | general-purpose (140) | Move one app per release (Mealie, Open WebUI, …); each app gets its own database (and bucket if needed) on the shared tiers, not a new Postgres pod per app. |
| 6 — Media / arr | arr (120), Media-server (150) | NFS from OMV, download clients, transcoding; migrate one arr app, then Jellyfin/ebook; see the NFS bullets in homelab-network.md. |
| 7 — Edge | NPM (666/777) | Often last: either keep on Proxmox or replace with Traefik + IngressRoutes / Gateway API; many people keep a dedicated reverse-proxy VM until parity is proven. |

Openmediavault (100) — Typically stays as NFS (and maybe backup target) for the cluster; no need to “migrate” the whole NAS into Kubernetes.


5. Ingress and reverse proxy

| Approach | When to use |
| --- | --- |
| Traefik Ingress on noble | Default for internal HTTPS apps; cert-manager for public names you control. |
| NPM (VM) as front door | Point the proxy host at the Traefik MetalLB IP (or a service name if you add internal DNS); reduces double-proxying if you terminate TLS in one place only. |
| Newt / Pangolin | Public reachability per clusters/noble/bootstrap/newt/README.md; does not provide automatic ExternalDNS. |

Avoid two TLS terminations for the same hostname unless you intend SSL passthrough end-to-end.
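A minimal internal Ingress sketch for the default Traefik approach. The hostname follows the `*.apps.noble.lab.pcenicni.dev` pattern from this doc; the ClusterIssuer name `internal-ca` and the service port are assumptions:

```shell
# Minimal Ingress for an internal app behind Traefik with a cert-manager cert.
kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: uptime-kuma
  namespace: monitoring-tools
  annotations:
    cert-manager.io/cluster-issuer: internal-ca   # assumed issuer name
spec:
  ingressClassName: traefik
  rules:
    - host: uptime.apps.noble.lab.pcenicni.dev
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: uptime-kuma
                port:
                  number: 3001
  tls:
    - hosts: ["uptime.apps.noble.lab.pcenicni.dev"]
      secretName: uptime-kuma-tls
EOF
```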


6. Authentik-specific (Auth VM → cluster)

  1. Backup Authentik PostgreSQL (or embedded DB) and media volume from the VM.
  2. Deploy via Helm (official chart), pinning the same Authentik version if possible.
  3. Restore DB into shared cluster Postgres (recommended) or chart-managed DB — see shared-data-services.md.
  4. Update issuer URL in every OIDC/OAuth client (Gitea, Grafana, etc.).
  5. Re-test outposts (if any) and redirect URIs from both .1 and .50 client perspectives.
  6. Cut over DNS; then decommission VM 190.
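Step 3 (restore into shared Postgres) can be sketched as below; pod, namespace, and role names are placeholders to match against your shared Postgres deploy:

```shell
# On the Auth VM: dump the existing Authentik database.
docker exec authentik-postgres pg_dump -U authentik authentik > authentik.sql

# Copy the dump into the cluster and restore it into shared Postgres.
kubectl -n databases cp authentik.sql postgres-0:/tmp/authentik.sql
kubectl -n databases exec postgres-0 -- \
  psql -U postgres -c 'CREATE DATABASE authentik OWNER authentik;'
kubectl -n databases exec postgres-0 -- \
  psql -U authentik -d authentik -f /tmp/authentik.sql
```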

7. arr and Jellyfin-specific

Follow the numbered list under “Arr stack, NFS, and Kubernetes” in homelab-network.md. In short: OMV stays; CSI NFS + RWX; match permissions; migrate one app first; verify download client can reach the new pod IP/DNS from your download host.
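A sketch of the RWX mount with matching permissions; the StorageClass `nfs-csi`, namespace, and UID/GID values are assumptions to match against OMV:

```shell
# RWX claim for the shared media export consumed from OMV.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: media
  namespace: arr
spec:
  accessModes: ["ReadWriteMany"]
  storageClassName: nfs-csi   # assumed StorageClass name
  resources:
    requests:
      storage: 100Gi
EOF
# In the Deployment, run as the same UID/GID the files have on OMV, e.g.:
#   securityContext: { runAsUser: 1000, runAsGroup: 100, fsGroup: 100 }
```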


8. Validation checklist (per wave)

  • Pods Ready, Ingress returns 200 / login page.
  • TLS valid for chosen hostname.
  • Persistent data present (new uploads, DB writes survive pod restart).
  • Backups (Velero or app-level) defined for the new location.
  • Monitoring / alerts updated (targets, not old VM IP).
  • Documentation in homelab-network.md updated (VM retired or marked migrated).
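The checklist above can be driven from a LAN client with a few commands (the hostname and deployment names are examples):

```shell
# Nothing unexpectedly pending or crashing:
kubectl get pods -A | grep -v Running

# Ingress answers on the chosen hostname (expect 200 or a login redirect):
curl -sk -o /dev/null -w '%{http_code}\n' https://uptime.apps.noble.lab.pcenicni.dev

# Restart the pod and confirm recent writes survived:
kubectl -n monitoring-tools rollout restart deployment uptime-kuma
```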