home-server/docs/migration-vm-to-noble.md

Migration plan: Proxmox VMs → noble (Kubernetes)

This document is the default playbook for moving workloads from Proxmox VMs on 192.168.1.0/24 into the noble Talos cluster on 192.168.50.0/24. Source inventory and per-VM notes: homelab-network.md. Cluster facts: architecture.md, talos/CLUSTER-BUILD.md.


1. Scope and principles

| Principle | Detail |
| --- | --- |
| One service at a time | Run the new workload on noble while the VM stays up; cut over DNS / NPM only after checks pass. |
| Same container image | Prefer the same upstream image and major version as Docker on the VM to reduce surprises. |
| Data moves with a plan | Back up VM volumes or export DB dumps before the first deploy to the cluster. |
| Ingress on noble | Internal apps use Traefik + `*.apps.noble.lab.pcenicni.dev` (or your chosen hostnames) and MetalLB (e.g. 192.168.50.211) per architecture.md. |
| Cross-VLAN | Clients on .1 reach services on .50 via routing; the firewall must allow NFS from the Talos node IPs to OMV 192.168.1.105 when pods mount NFS. |

Not everything must move. Keep Openmediavault (and optionally NPM) on VMs if you prefer; the cluster consumes NFS and HTTP from them.


2. Prerequisites (before wave 1)

  1. Cluster healthy — `kubectl get nodes` shows all nodes Ready; work through the talos/CLUSTER-BUILD.md checklist through ingress and cert-manager as needed.
  2. Ingress + TLS — Traefik + cert-manager working; you can hit a test Ingress on the MetalLB IP.
  3. GitOps / deploy path — Decide per app: Helm under clusters/noble/apps/, Argo CD, or Ansible-applied manifests (match how you manage the rest of noble).
  4. Secrets — Plan Kubernetes Secrets; for git-stored material, align with SOPS (clusters/noble/secrets/, .sops.yaml).
  5. Storage — Longhorn is the default for ReadWriteOnce state; for NFS (arr, Jellyfin), install a CSI NFS driver and test a small RWX PVC before migrating data-heavy apps.
  6. Shared data tier (recommended) — Deploy centralized PostgreSQL and S3-compatible storage on noble so apps do not each ship their own DB/object store; see shared-data-services.md.
  7. Firewall — Rules: workstation → 192.168.50.230:6443; nodes → OMV NFS ports; clients → 192.168.50.211 (or split-horizon DNS) as you design.
  8. DNS — Split-horizon or Pi-hole records pointing `*.apps.noble.lab.pcenicni.dev` at the Traefik IP 192.168.50.211 for LAN clients.
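The prerequisites above can be verified with a short pre-flight pass. A sketch, run against a live cluster (the StorageClass name `nfs-csi` is an assumption; use whatever your CSI driver installs):

```shell
# Pre-flight checks before wave 1, run from the workstation with kubeconfig set.
kubectl get nodes -o wide          # all nodes Ready
kubectl -n cert-manager get pods   # cert-manager up
kubectl get ingressclass           # 'traefik' class present

# Test a small RWX PVC against the NFS CSI driver before data-heavy apps.
# 'nfs-csi' is an assumed StorageClass name; substitute yours.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-smoke-test
spec:
  accessModes: ["ReadWriteMany"]
  storageClassName: nfs-csi
  resources:
    requests:
      storage: 1Gi
EOF
kubectl get pvc nfs-smoke-test     # should reach Bound; then clean up
kubectl delete pvc nfs-smoke-test
</imports>
```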

3. Standard migration procedure (repeat per app)

Use this checklist for each application (or small group, e.g. one Helm release).

| Step | Action |
| --- | --- |
| A. Discover | Document image:tag, ports, volumes (host paths), env vars, depends_on (DB, Redis, NFS path). Export `docker inspect` / compose files from the VM. |
| B. Backup | Snapshot the Proxmox VM or back up the volume / SQLite file / DB dump to offline storage. |
| C. Namespace | Create a dedicated namespace (e.g. monitoring-tools, authentik) or use your house standard. |
| D. Deploy | Add Deployment (or StatefulSet), Service, Ingress (class traefik), and PVCs; wire credentials from Secrets (not literals in git). |
| E. Storage | Longhorn PVC for local state; NFS CSI PVC for shared media/config paths that must match the VM (see the homelab-network.md arr section). Prefer shared Postgres / shared S3 per shared-data-services.md instead of new embedded databases. Match UID/GID with `securityContext`. |
| F. Smoke test | `kubectl port-forward` or a temporary Ingress hostname; log in and run one critical workflow (login, playback, sync). |
| G. DNS cutover | Point internal DNS or the NPM upstream from the VM IP to the new hostname (Traefik) or MetalLB IP + Host header. |
| H. Observe | 24–72 hours: logs, alerts, Uptime Kuma (once migrated), backups. |
| I. Decommission | Stop the container on the VM (keep the VM running until every service on it has moved). |
| J. VM off | When no services remain on that VM, power it off and archive or delete the VM. |
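Steps A and B can be sketched as follows; the container name `mealie` and its paths are placeholders, substitute the real app:

```shell
# On the source VM: capture the running config (step A) and back up data (step B).
docker inspect mealie > mealie.inspect.json                          # image, env, mounts, ports
docker compose -f /opt/mealie/docker-compose.yml config > mealie.compose.yml

# Pause writes briefly, archive the volume, then resume.
docker stop mealie
tar czf mealie-data-$(date +%F).tgz /opt/mealie/data
docker start mealie

# For Postgres-backed apps, prefer a logical dump over copying live data files:
# docker exec <db-container> pg_dump -U mealie mealie > mealie-$(date +%F).sql
```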

Rollback: Re-enable the VM service, revert DNS/NPM to the old IP, delete or scale the cluster deployment to zero.
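The rollback can be as small as this (namespace and names are illustrative):

```shell
# Roll back a cut-over app: stop it on the cluster, restart it on the VM.
kubectl -n mealie scale deployment mealie --replicas=0   # placeholder names
# On the VM:
docker start mealie
# Finally, point the DNS record / NPM proxy host back at the VM IP.
```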


4. Migration order (waves)

Order balances risk, dependencies, and learning curve.

| Phase | Target | Rationale |
| --- | --- | --- |
| 0 — Optional | Automate (130) | Low use: retire or replace with CronJobs; skip if nothing valuable runs. |
| 0b — Platform | Shared Postgres + S3 on noble | Run before or alongside early waves so new deploys use one DSN and one object endpoint; retire VM 160 when empty. See shared-data-services.md. |
| 1 — Observability | Monitor (110) — Uptime Kuma, Peekaping, Tracearr | Small state; validates Ingress, PVCs, and alert paths before auth and media. |
| 2 — Git | gitea (300), gitea-nsfw (310) | Point at shared Postgres + S3 for attachments; move repos with PVC + backup restore if needed. |
| 3 — Object / misc | s3 (160), AMP (500) | Migrate data into central S3 on the cluster, then decommission the duplicate MinIO on VM 160 if applicable. |
| 4 — Auth | Auth (190) — Authentik | Use shared Postgres; update all OIDC clients (Gitea, apps, NPM) with new issuer URLs; schedule a maintenance window. |
| 5 — Daily apps | general-purpose (140) | Move one app per release (Mealie, Open WebUI, …); each app gets its own database (and bucket if needed) on the shared tiers, not a new Postgres pod per app. |
| 6 — Media / arr | arr (120), Media-server (150) | NFS from OMV, download clients, transcoding; migrate one arr app, then Jellyfin/ebook; see the NFS bullets in homelab-network.md. |
| 7 — Edge | NPM (666/777) | Often last: either keep on Proxmox or replace with Traefik + IngressRoutes / Gateway API; many people keep a dedicated reverse-proxy VM until parity is proven. |

Openmediavault (100) — Typically stays as NFS (and maybe backup target) for the cluster; no need to “migrate” the whole NAS into Kubernetes.


5. Ingress and reverse proxy

| Approach | When to use |
| --- | --- |
| Traefik Ingress on noble | Default for internal HTTPS apps; cert-manager for public names you control. |
| NPM (VM) as front door | Point the proxy host at the Traefik MetalLB IP (or a service name if you add internal DNS); reduces double-proxying if you terminate TLS in one place only. |
| Newt / Pangolin | Public reachability per clusters/noble/bootstrap/newt/README.md; does not provide automatic ExternalDNS. |

Avoid two TLS terminations for the same hostname unless you intend SSL passthrough end-to-end.
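A minimal internal Ingress sketch for the default Traefik approach. The hostname follows the `*.apps.noble.lab.pcenicni.dev` pattern from this doc; the ClusterIssuer name `internal-ca` and the service port are assumptions:

```shell
# Minimal Ingress for an internal app behind Traefik with a cert-manager cert.
kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: uptime-kuma
  namespace: monitoring-tools
  annotations:
    cert-manager.io/cluster-issuer: internal-ca   # assumed issuer name
spec:
  ingressClassName: traefik
  rules:
    - host: uptime.apps.noble.lab.pcenicni.dev
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: uptime-kuma
                port:
                  number: 3001
  tls:
    - hosts: ["uptime.apps.noble.lab.pcenicni.dev"]
      secretName: uptime-kuma-tls
EOF
```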


6. Authentik-specific (Auth VM → cluster)

  1. Backup Authentik PostgreSQL (or embedded DB) and media volume from the VM.
  2. Deploy via Helm (official chart), pinning the same Authentik version if possible.
  3. Restore DB into shared cluster Postgres (recommended) or chart-managed DB — see shared-data-services.md.
  4. Update issuer URL in every OIDC/OAuth client (Gitea, Grafana, etc.).
  5. Re-test outposts (if any) and redirect URIs from both .1 and .50 client perspectives.
  6. Cut over DNS; then decommission VM 190.
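Step 3 (restore into shared Postgres) can be sketched as below; pod, namespace, and role names are placeholders to match against your shared Postgres deploy:

```shell
# On the Auth VM: dump the existing Authentik database.
docker exec authentik-postgres pg_dump -U authentik authentik > authentik.sql

# Copy the dump into the cluster and restore it into shared Postgres.
kubectl -n databases cp authentik.sql postgres-0:/tmp/authentik.sql
kubectl -n databases exec postgres-0 -- \
  psql -U postgres -c 'CREATE DATABASE authentik OWNER authentik;'
kubectl -n databases exec postgres-0 -- \
  psql -U authentik -d authentik -f /tmp/authentik.sql
```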

7. arr and Jellyfin-specific

Follow the numbered list under “Arr stack, NFS, and Kubernetes” in homelab-network.md. In short: OMV stays; CSI NFS + RWX; match permissions; migrate one app first; verify download client can reach the new pod IP/DNS from your download host.
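A sketch of the RWX mount with matching permissions; the StorageClass `nfs-csi`, namespace, and UID/GID values are assumptions to match against OMV:

```shell
# RWX claim for the shared media export consumed from OMV.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: media
  namespace: arr
spec:
  accessModes: ["ReadWriteMany"]
  storageClassName: nfs-csi   # assumed StorageClass name
  resources:
    requests:
      storage: 100Gi
EOF
# In the Deployment, run as the same UID/GID the files have on OMV, e.g.:
#   securityContext: { runAsUser: 1000, runAsGroup: 100, fsGroup: 100 }
```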


8. Validation checklist (per wave)

  • Pods Ready, Ingress returns 200 / login page.
  • TLS valid for chosen hostname.
  • Persistent data present (new uploads, DB writes survive pod restart).
  • Backups (Velero or app-level) defined for the new location.
  • Monitoring / alerts updated (targets, not old VM IP).
  • Documentation in homelab-network.md updated (VM retired or marked migrated).
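The checklist above can be driven from a LAN client with a few commands (the hostname and deployment names are examples):

```shell
# Nothing unexpectedly pending or crashing:
kubectl get pods -A | grep -v Running

# Ingress answers on the chosen hostname (expect 200 or a login redirect):
curl -sk -o /dev/null -w '%{http_code}\n' https://uptime.apps.noble.lab.pcenicni.dev

# Restart the pod and confirm recent writes survived:
kubectl -n monitoring-tools rollout restart deployment uptime-kuma
```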