# Migration plan: Proxmox VMs → noble (Kubernetes)
This document is the **default playbook** for moving workloads from **Proxmox VMs** on **`192.168.1.0/24`** into the **noble** Talos cluster on **`192.168.50.0/24`**. Source inventory and per-VM notes: [`homelab-network.md`](homelab-network.md). Cluster facts: [`architecture.md`](architecture.md), [`talos/CLUSTER-BUILD.md`](../talos/CLUSTER-BUILD.md).
---
## 1. Scope and principles
| Principle | Detail |
|-----------|--------|
| **One service at a time** | Run the new workload on **noble** while the **VM** stays up; cut over **DNS / NPM** only after checks pass. |
| **Same container image** | Prefer the **same** upstream image and major version as Docker on the VM to reduce surprises. |
| **Data moves with a plan** | **Backup** VM volumes or export DB dumps **before** the first deploy to the cluster. |
| **Ingress on noble** | Internal apps use **Traefik** + **`*.apps.noble.lab.pcenicni.dev`** (or your chosen hostnames) and **MetalLB** (e.g. **`192.168.50.211`**) per [`architecture.md`](architecture.md). |
| **Cross-VLAN** | Clients on **`.1`** reach services on **`.50`** via **routing**; **firewall** must allow **NFS** from **Talos node IPs** to **OMV `192.168.1.105`** when pods mount NFS. |
**Not everything must move.** Keep **Openmediavault** (and optionally **NPM**) on VMs if you prefer; the cluster consumes **NFS** and **HTTP** from them.
---
## 2. Prerequisites (before wave 1)
1. **Cluster healthy** — `kubectl get nodes`; work through the [`talos/CLUSTER-BUILD.md`](../talos/CLUSTER-BUILD.md) checklist through ingress and cert-manager as needed.
2. **Ingress + TLS** — **Traefik** + **cert-manager** working; you can hit a **test Ingress** on the MetalLB IP.
3. **GitOps / deploy path** — Decide per app: **Helm** under `clusters/noble/apps/`, **Argo CD**, or **Ansible**-applied manifests (match how you manage the rest of noble).
4. **Secrets** — Plan **Kubernetes Secrets**; for git-stored material, align with **SOPS** (`clusters/noble/secrets/`, `.sops.yaml`).
5. **Storage** — **Longhorn** default for **ReadWriteOnce** state; for **NFS** (*arr*, Jellyfin), install a **CSI NFS** driver and test a **small RWX PVC** before migrating data-heavy apps.
6. **Shared data tier (recommended)** — Deploy **centralized PostgreSQL** and **S3-compatible storage** on noble so apps do not each ship their own DB/object store; see [`shared-data-services.md`](shared-data-services.md).
7. **Firewall** — Allow **workstation → `192.168.50.230:6443`** (Kubernetes API), **Talos nodes → OMV NFS ports**, and **clients → `192.168.50.211`** (Traefik), or cover the last with split-horizon DNS, per your design.
8. **DNS** — Split-horizon or Pi-hole records for **`*.apps.noble.lab.pcenicni.dev`** → **Traefik** IP **`192.168.50.211`** for LAN clients.
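
Prerequisite 5 is worth proving with a throwaway claim before any data-heavy app moves. A minimal sketch, assuming the upstream `csi-driver-nfs` driver is installed; the storage class name (`nfs-omv`) and export path (`/export/k8s-test`) are placeholders to adapt to your OMV layout:

```yaml
# Hypothetical StorageClass for the csi-driver-nfs provisioner;
# server/share must match a real OMV export.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-omv              # assumption: name of your choosing
provisioner: nfs.csi.k8s.io
parameters:
  server: 192.168.1.105      # OMV, per homelab-network.md
  share: /export/k8s-test    # placeholder export path
reclaimPolicy: Retain
---
# Small RWX claim to prove cross-VLAN NFS works before migrating real data.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-smoke-test
spec:
  accessModes: [ReadWriteMany]
  storageClassName: nfs-omv
  resources:
    requests:
      storage: 1Gi
```

If the PVC binds and a test pod on each node can write to it, the firewall path from the `.50` nodes to OMV is proven (prerequisite 7).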
---
## 3. Standard migration procedure (repeat per app)
Use this checklist for **each** application (or small group, e.g. one Helm release).
| Step | Action |
|------|--------|
| **A. Discover** | Document **image:tag**, **ports**, **volumes** (host paths), **env vars**, **depends_on** (DB, Redis, NFS path). Export **docker inspect** / **compose** from the VM. |
| **B. Backup** | Snapshot **Proxmox VM** or backup **volume** / **SQLite** / **DB dump** to offline storage. |
| **C. Namespace** | Create a **dedicated namespace** (e.g. `monitoring-tools`, `authentik`) or use your house standard. |
| **D. Deploy** | Add **Deployment** (or **StatefulSet**), **Service**, **Ingress** (class **traefik**), **PVCs**; wire credentials from **Secrets** (not literals in git). |
| **E. Storage** | **Longhorn** PVC for local state; **NFS CSI** PVC for shared media/config paths that must match the VM (see [`homelab-network.md`](homelab-network.md) *arr* section). Prefer **shared Postgres** / **shared S3** per [`shared-data-services.md`](shared-data-services.md) instead of new embedded databases. Match **UID/GID** with `securityContext`. |
| **F. Smoke test** | `kubectl port-forward` or temporary **Ingress** hostname; log in, run one critical workflow (login, playback, sync). |
| **G. DNS cutover** | Point **internal DNS** or **NPM** upstream from the **VM IP** to the **new hostname** (Traefik) or **MetalLB IP** + Host header. |
| **H. Observe** | 24–72 hours: logs, alerts, **Uptime Kuma** (once migrated), backups. |
| **I. Decommission** | Stop the **container** on the VM; leave the VM itself running until **every** service on it has been migrated. |
| **J. VM off** | When **no** services remain on that VM, **power off** and archive or delete the VM. |
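
Steps C–E usually reduce to a small set of manifests. A hedged skeleton, where the app name, namespace, image, port, Secret name, and UID/GID are placeholders to replace with the values captured in step A:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app                # placeholder app name
  namespace: example-ns            # dedicated namespace per step C
spec:
  replicas: 1
  selector:
    matchLabels: {app: example-app}
  template:
    metadata:
      labels: {app: example-app}
    spec:
      securityContext:
        runAsUser: 1000            # match the UID/GID the VM volumes used (step E)
        runAsGroup: 1000
        fsGroup: 1000
      containers:
        - name: app
          image: ghcr.io/example/app:1.2.3   # same image:tag as on the VM (step A)
          ports: [{containerPort: 8080}]
          envFrom:
            - secretRef: {name: example-app-env}   # Secrets, not literals in git (step D)
          volumeMounts:
            - {name: data, mountPath: /config}
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: example-app-data   # Longhorn or NFS CSI PVC per step E
```

A Service and Ingress follow the same pattern; `kubectl port-forward` against the Service covers the step F smoke test before any DNS change.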
**Rollback:** Re-enable the VM service, revert **DNS/NPM** to the old IP, delete or scale the cluster deployment to zero.
---
## 4. Recommended migration order (phases)
Order balances **risk**, **dependencies**, and **learning curve**.
| Phase | Target | Rationale |
|-------|--------|-----------|
| **0 — Optional** | **Automate (130)** | Low use: **retire** or replace with **CronJobs**; skip if nothing valuable runs. |
| **0b — Platform** | **Shared Postgres + S3** on noble | Run **before** or alongside early waves so new deploys use **one DSN** and **one object endpoint**; retire **VM 160** when empty. See [`shared-data-services.md`](shared-data-services.md). |
| **1 — Observability** | **Monitor (110)** — Uptime Kuma, Peekaping, Tracearr | Small state, validates **Ingress**, **PVCs**, and **alert paths** before auth and media. |
| **2 — Git** | **gitea (300)**, **gitea-nsfw (310)** | Point at **shared Postgres** + **S3** for attachments; move **repos** with **PVC** + backup restore if needed. |
| **3 — Object / misc** | **s3 (160)**, **AMP (500)** | **Migrate data** into **central** S3 on cluster, then **decommission** duplicate MinIO on VM **160** if applicable. |
| **4 — Auth** | **Auth (190)** — **Authentik** | Use **shared Postgres**; update **all OIDC clients** (Gitea, apps, NPM) with **new issuer URLs**; schedule a **maintenance window**. |
| **5 — Daily apps** | **general-purpose (140)** | Move **one app per release** (Mealie, Open WebUI, …); each app gets its **own database** (and bucket if needed) on the **shared** tiers — not a new Postgres pod per app. |
| **6 — Media / *arr*** | **arr (120)**, **Media-server (150)** | **NFS** from **OMV**, download clients, **transcoding** — migrate **one *arr*** then Jellyfin/ebook; see NFS bullets in [`homelab-network.md`](homelab-network.md). |
| **7 — Edge** | **NPM (666/777)** | Often **last**: either keep on Proxmox or replace with **Traefik** + **IngressRoutes** / **Gateway API**; many people keep a **dedicated** reverse proxy VM until parity is proven. |
**Openmediavault (100)** — Typically **stays** as **NFS** (and maybe backup target) for the cluster; no need to “migrate” the whole NAS into Kubernetes.
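
Because OMV stays put, a pod that needs an existing export can also use a static PersistentVolume rather than dynamic provisioning — useful when the path must match what the VMs mounted. A sketch using the core in-tree NFS volume type; the PV name, size, and export path are placeholders:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: omv-media
spec:
  capacity: {storage: 1Ti}      # informational for NFS; the size is not enforced
  accessModes: [ReadWriteMany]
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: 192.168.1.105       # OMV
    path: /export/media         # placeholder: the same export the VMs mounted
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: omv-media
spec:
  accessModes: [ReadWriteMany]
  storageClassName: ""          # bind to the static PV, not a provisioner
  volumeName: omv-media
  resources:
    requests:
      storage: 1Ti
```

The empty `storageClassName` pins the claim to this exact PV, so the pod sees the same directory tree the VM did.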
---
## 5. Ingress and reverse proxy
| Approach | When to use |
|----------|-------------|
| **Traefik Ingress** on noble | Default for **internal** HTTPS apps; **cert-manager** for public names you control. |
| **NPM (VM)** as front door | Point **proxy host** → **Traefik MetalLB IP** or **service name** if you add internal DNS; reduces double-proxy if you **terminate TLS** in one place only. |
| **Newt / Pangolin** | Public reachability per [`clusters/noble/bootstrap/newt/README.md`](../clusters/noble/bootstrap/newt/README.md); not automatic ExternalDNS. |
Avoid **two** TLS terminations for the same hostname unless you intend **SSL passthrough** end-to-end.
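
With Traefik as the single termination point, the per-app Ingress stays small. A sketch, assuming cert-manager has a ClusterIssuer (the name `letsencrypt` here is an assumption) and the hostname follows the `*.apps.noble.lab.pcenicni.dev` pattern:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-app               # placeholder app name
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt   # assumption: your issuer's name
spec:
  ingressClassName: traefik
  rules:
    - host: example.apps.noble.lab.pcenicni.dev
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: example-app
                port: {number: 8080}
  tls:
    - hosts: [example.apps.noble.lab.pcenicni.dev]
      secretName: example-app-tls   # cert-manager fills this Secret
```

If NPM fronts this hostname, proxy to Traefik over plain HTTP or pass the TLS stream through untouched — never terminate certificates in both places.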
---
## 6. Authentik-specific (Auth VM → cluster)
1. **Backup** Authentik **PostgreSQL** (or embedded DB) and **media** volume from the VM.
2. Deploy via **Helm** (official chart) with the **same** Authentik version if possible.
3. **Restore** the DB into the **shared cluster Postgres** (recommended) or a chart-managed DB — see [`shared-data-services.md`](shared-data-services.md).
4. Update the **issuer URL** in every **OIDC/OAuth** client (Gitea, Grafana, etc.).
5. Re-test **outposts** (if any) and **redirect URIs** from both **`.1`** and **`.50`** client perspectives.
6. **Cut over DNS**; then **decommission** VM **190**.
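
Step 3 can be done with a one-shot Job rather than an interactive shell. A sketch, assuming the dump has been copied onto a PVC named `authentik-restore`; the shared Postgres Service, database name, and credentials Secret (`postgres-rw`, `authentik`, `authentik-db`) are all placeholders for whatever your shared-data-services setup actually exposes:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: authentik-db-restore
spec:
  backoffLimit: 0          # fail loudly; do not retry a half-applied restore
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: restore
          image: postgres:16          # match the shared Postgres major version
          command:
            - sh
            - -c
            - psql -h postgres-rw -U authentik -d authentik -f /restore/authentik.sql
          env:
            - name: PGPASSWORD
              valueFrom:
                secretKeyRef: {name: authentik-db, key: password}  # placeholder Secret
          volumeMounts:
            - {name: dump, mountPath: /restore}
      volumes:
        - name: dump
          persistentVolumeClaim:
            claimName: authentik-restore   # PVC holding the SQL dump
```

Start the Authentik pods only after the Job completes, then proceed to the issuer-URL updates in step 4.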
---
## 7. *arr* and Jellyfin-specific
Follow the **numbered list** under **“Arr stack, NFS, and Kubernetes”** in [`homelab-network.md`](homelab-network.md). In short: **OMV stays**; **CSI NFS** + **RWX**; **match permissions**; migrate **one app** first; verify **download client** can reach the new pod **IP/DNS** from your download host.
---
## 8. Validation checklist (per wave)
- Pods **Ready**, **Ingress** returns **200** / login page.
- **TLS** valid for chosen hostname.
- **Persistent data** present (new uploads, DB writes survive pod restart).
- **Backups** (Velero or app-level) defined for the new location.
- **Monitoring** / alerts updated (targets, not old VM IP).
- **Documentation** in [`homelab-network.md`](homelab-network.md) updated (VM retired or marked migrated).
---
## Related docs
- **Shared Postgres + S3:** [`shared-data-services.md`](shared-data-services.md)
- VM inventory and NFS notes: [`homelab-network.md`](homelab-network.md)
- Noble topology, MetalLB, Traefik: [`architecture.md`](architecture.md)
- Bootstrap and versions: [`talos/CLUSTER-BUILD.md`](../talos/CLUSTER-BUILD.md)
- Apps layout: [`clusters/noble/apps/README.md`](../clusters/noble/apps/README.md)