# Migration plan: Proxmox VMs → noble (Kubernetes)
This document is the default playbook for moving workloads from Proxmox VMs on 192.168.1.0/24 into the noble Talos cluster on 192.168.50.0/24. Source inventory and per-VM notes: homelab-network.md. Cluster facts: architecture.md, talos/CLUSTER-BUILD.md.
## 1. Scope and principles
| Principle | Detail |
|---|---|
| One service at a time | Run the new workload on noble while the VM stays up; cut over DNS / NPM only after checks pass. |
| Same container image | Prefer the same upstream image and major version as Docker on the VM to reduce surprises. |
| Data moves with a plan | Backup VM volumes or export DB dumps before the first deploy to the cluster. |
| Ingress on noble | Internal apps use Traefik + *.apps.noble.lab.pcenicni.dev (or your chosen hostnames) and MetalLB (e.g. 192.168.50.211) per architecture.md. |
| Cross-VLAN | Clients on .1 reach services on .50 via routing; firewall must allow NFS from Talos node IPs to OMV 192.168.1.105 when pods mount NFS. |
Not everything must move. Keep Openmediavault (and optionally NPM) on VMs if you prefer; the cluster consumes NFS and HTTP from them.
## 2. Prerequisites (before wave 1)
- Cluster healthy — `kubectl get nodes`; work the `talos/CLUSTER-BUILD.md` checklist through ingress and cert-manager as needed.
- Ingress + TLS — Traefik + cert-manager working; you can hit a test Ingress on the MetalLB IP.
- GitOps / deploy path — Decide per app: Helm under `clusters/noble/apps/`, Argo CD, or Ansible-applied manifests (match how you manage the rest of noble).
- Secrets — Plan Kubernetes Secrets; for git-stored material, align with SOPS (`clusters/noble/secrets/`, `.sops.yaml`).
- Storage — Longhorn is the default for ReadWriteOnce state; for NFS (arr, Jellyfin), install a CSI NFS driver and test a small RWX PVC before migrating data-heavy apps.
- Shared data tier (recommended) — Deploy centralized PostgreSQL and S3-compatible storage on noble so apps do not each ship their own DB/object store; see `shared-data-services.md`.
- Firewall — Rules: workstation → `192.168.50.230:6443`; nodes → OMV NFS ports; clients → `192.168.50.211` (or split-horizon DNS) as you design.
- DNS — Split-horizon or Pi-hole records for `*.apps.noble.lab.pcenicni.dev` → Traefik IP `192.168.50.211` for LAN clients.
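Before migrating anything data-heavy, the NFS CSI path can be validated with a throwaway RWX claim. A minimal sketch — the storage class name `nfs-csi` is an assumption; use whatever class your CSI NFS driver registers:

```yaml
# Throwaway claim to prove RWX NFS works before any real migration.
# storageClassName "nfs-csi" is an assumption; match your CSI NFS install.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-smoke-test
  namespace: default
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: nfs-csi
  resources:
    requests:
      storage: 1Gi
```

If the claim binds (`kubectl get pvc nfs-smoke-test`), mount it in a busybox pod, write a file, confirm it appears on the OMV export, then delete the PVC.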
## 3. Standard migration procedure (repeat per app)
Use this checklist for each application (or small group, e.g. one Helm release).
| Step | Action |
|---|---|
| A. Discover | Document image:tag, ports, volumes (host paths), env vars, depends_on (DB, Redis, NFS path). Export docker inspect / compose from the VM. |
| B. Backup | Snapshot Proxmox VM or backup volume / SQLite / DB dump to offline storage. |
| C. Namespace | Create a dedicated namespace (e.g. monitoring-tools, authentik) or use your house standard. |
| D. Deploy | Add Deployment (or StatefulSet), Service, Ingress (class traefik), PVCs; wire secrets from Secrets (not literals in git). |
| E. Storage | Longhorn PVC for local state; NFS CSI PVC for shared media/config paths that must match the VM (see homelab-network.md arr section). Prefer shared Postgres / shared S3 per shared-data-services.md instead of new embedded databases. Match UID/GID with securityContext. |
| F. Smoke test | kubectl port-forward or temporary Ingress hostname; log in, run one critical workflow (login, playback, sync). |
| G. DNS cutover | Point internal DNS or NPM upstream from the VM IP to the new hostname (Traefik) or MetalLB IP + Host header. |
| H. Observe | 24–72 hours: logs, alerts, Uptime Kuma (once migrated), backups. |
| I. Decommission | Stop the container on the VM (not the whole VM until the whole VM is empty). |
| J. VM off | When no services remain on that VM, power off and archive or delete the VM. |
Rollback: Re-enable the VM service, revert DNS/NPM to the old IP, delete or scale the cluster deployment to zero.
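Step D usually comes down to the same three objects per app. A skeleton sketch — every name, image, port, and hostname below is a placeholder to replace with the values discovered in step A:

```yaml
# Minimal step-D skeleton: Deployment + Service + Ingress in one namespace.
# All names, images, and hostnames are placeholders (assumptions), not real apps.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app
  namespace: example-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
        - name: example-app
          image: ghcr.io/example/app:1.2.3   # same upstream image/major as on the VM
          ports:
            - containerPort: 8080
          envFrom:
            - secretRef:
                name: example-app-secrets     # wired from a Secret, not literals in git
          volumeMounts:
            - name: data
              mountPath: /config
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: example-app-data       # Longhorn RWO PVC for local state
---
apiVersion: v1
kind: Service
metadata:
  name: example-app
  namespace: example-app
spec:
  selector:
    app: example-app
  ports:
    - port: 80
      targetPort: 8080
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-app
  namespace: example-app
spec:
  ingressClassName: traefik
  rules:
    - host: example.apps.noble.lab.pcenicni.dev
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: example-app
                port:
                  number: 80
```

Keeping all three objects in one file per app makes step I trivial: one `kubectl delete -f` (or scale to zero) rolls the cluster side back.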
## 4. Recommended migration order (phases)
Order balances risk, dependencies, and learning curve.
| Phase | Target | Rationale |
|---|---|---|
| 0 — Optional | Automate (130) | Low use: retire or replace with CronJobs; skip if nothing valuable runs. |
| 0b — Platform | Shared Postgres + S3 on noble | Run before or alongside early waves so new deploys use one DSN and one object endpoint; retire VM 160 when empty. See shared-data-services.md. |
| 1 — Observability | Monitor (110) — Uptime Kuma, Peekaping, Tracearr | Small state, validates Ingress, PVCs, and alert paths before auth and media. |
| 2 — Git | gitea (300), gitea-nsfw (310) | Point at shared Postgres + S3 for attachments; move repos with PVC + backup restore if needed. |
| 3 — Object / misc | s3 (160), AMP (500) | Migrate data into central S3 on cluster, then decommission duplicate MinIO on VM 160 if applicable. |
| 4 — Auth | Auth (190) — Authentik | Use shared Postgres; update all OIDC clients (Gitea, apps, NPM) with new issuer URLs; schedule a maintenance window. |
| 5 — Daily apps | general-purpose (140) | Move one app per release (Mealie, Open WebUI, …); each app gets its own database (and bucket if needed) on the shared tiers — not a new Postgres pod per app. |
| 6 — Media / arr | arr (120), Media-server (150) | NFS from OMV, download clients, transcoding — migrate one arr then Jellyfin/ebook; see NFS bullets in homelab-network.md. |
| 7 — Edge | NPM (666/777) | Often last: either keep on Proxmox or replace with Traefik + IngressRoutes / Gateway API; many people keep a dedicated reverse proxy VM until parity is proven. |
Openmediavault (100) — Typically stays as NFS (and maybe backup target) for the cluster; no need to “migrate” the whole NAS into Kubernetes.
## 5. Ingress and reverse proxy
| Approach | When to use |
|---|---|
| Traefik Ingress on noble | Default for internal HTTPS apps; cert-manager for public names you control. |
| NPM (VM) as front door | Point proxy host → Traefik MetalLB IP or service name if you add internal DNS; reduces double-proxy if you terminate TLS in one place only. |
| Newt / Pangolin | Public reachability per clusters/noble/bootstrap/newt/README.md; not automatic ExternalDNS. |
Avoid two TLS terminations for the same hostname unless you intend SSL passthrough end-to-end.
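When Traefik is the single TLS termination point, a cert-manager annotation on the Ingress is usually all that is needed. A sketch, assuming a ClusterIssuer named `letsencrypt` (the issuer name, hostname, and backend are placeholders):

```yaml
# TLS terminated once, at Traefik, via cert-manager's ingress-shim.
# "letsencrypt" issuer name and the hostname are assumptions for illustration.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-tls
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt
spec:
  ingressClassName: traefik
  tls:
    - hosts:
        - example.apps.noble.lab.pcenicni.dev
      secretName: example-tls-cert
  rules:
    - host: example.apps.noble.lab.pcenicni.dev
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: example-app
                port:
                  number: 80
```

If NPM remains in front, proxy to Traefik over plain HTTP (or use passthrough) so only one hop holds the certificate for that hostname.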
## 6. Authentik-specific (Auth VM → cluster)
- Backup Authentik PostgreSQL (or embedded DB) and media volume from the VM.
- Deploy via Helm (official chart) with the same Authentik version if possible.
- Restore the DB into shared cluster Postgres (recommended) or a chart-managed DB — see `shared-data-services.md`.
- Update the issuer URL in every OIDC/OAuth client (Gitea, Grafana, etc.).
- Re-test outposts (if any) and redirect URIs from both `.1` and `.50` client perspectives.
- Cut over DNS; then decommission VM 190.
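Pointing the chart at the shared Postgres is a values-file change. A sketch of the relevant fragment — the key layout follows the official authentik chart's documented values at time of writing, and the service hostname is an assumption; verify both against the chart version you deploy:

```yaml
# values.yaml fragment: external (shared) Postgres instead of a bundled DB.
# Key names per the official authentik chart; verify against your chart version.
authentik:
  postgresql:
    host: postgres-rw.databases.svc.cluster.local  # assumption: your shared Postgres service
    name: authentik
    user: authentik
    # password supplied via an existing Secret / SOPS, never committed in values
postgresql:
  enabled: false   # do not deploy the chart's bundled database
```

Restore the VM's dump into that shared instance before first start so Authentik comes up with its existing users, providers, and flows.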
## 7. arr and Jellyfin-specific
Follow the numbered list under “Arr stack, NFS, and Kubernetes” in homelab-network.md. In short: OMV stays; use CSI NFS with RWX PVCs; match permissions; migrate one arr app first; verify the download client can reach the new pod IP/DNS from your download host.
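“Match permissions” usually comes down to the pod securityContext mirroring the ownership the VM used on the OMV export. A fragment, assuming the share is owned by `1000:1000` — check `ls -n` on the export for the real values:

```yaml
# Pod-spec fragment: run as the same UID/GID that owns the NFS media paths.
# 1000/1000 is an assumption; read the real ownership off the OMV export.
spec:
  securityContext:
    runAsUser: 1000
    runAsGroup: 1000
    fsGroup: 1000
```

Getting this wrong shows up as the arr app seeing the mount but failing to import or rename; fix the securityContext rather than chmod-ing the share.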
## 8. Validation checklist (per wave)
- Pods Ready, Ingress returns 200 / login page.
- TLS valid for chosen hostname.
- Persistent data present (new uploads, DB writes survive pod restart).
- Backups (Velero or app-level) defined for the new location.
- Monitoring / alerts updated (targets, not old VM IP).
- Documentation in `homelab-network.md` updated (VM retired or marked migrated).
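If Velero is the backup path, the “backups defined for the new location” item can be a per-app Schedule. A sketch — the namespace, cron, and TTL are assumptions to align with your Velero install and retention policy:

```yaml
# Velero Schedule sketch: nightly backup of one migrated app's namespace.
# Namespace, schedule, and ttl are assumptions; match your Velero deployment.
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: example-app-nightly
  namespace: velero
spec:
  schedule: "0 3 * * *"
  template:
    includedNamespaces:
      - example-app
    ttl: 168h0m0s   # keep 7 days
```

One Schedule per migrated namespace keeps restores scoped to a single app, matching the one-service-at-a-time principle above.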
## Related docs
- Shared Postgres + S3: `shared-data-services.md`
- VM inventory and NFS notes: `homelab-network.md`
- Noble topology, MetalLB, Traefik: `architecture.md`
- Bootstrap and versions: `talos/CLUSTER-BUILD.md`
- Apps layout: `clusters/noble/apps/README.md`