# Velero (cluster backups)
Ansible-managed core stack — **not** reconciled by Argo CD (`clusters/noble/apps` is optional GitOps only).
## What you get
- **No web UI** — Velero is operated with the **`velero`** CLI and **`kubectl`** (Backup, Schedule, Restore CRDs). Metrics are exposed for Prometheus; there is no first-party dashboard in this chart.
- **vmware-tanzu/velero** Helm chart (**12.0.0** → Velero **1.18.0**) in namespace **`velero`**
- **AWS plugin** init container for **S3-compatible** object storage (`velero/velero-plugin-for-aws:v1.14.0`)
- **CSI snapshots** via Velero's built-in CSI support (`EnableCSI`) and **VolumeSnapshotLocation** `velero.io/csi` (no separate CSI plugin image for Velero ≥ 1.14)
- **Prometheus** scraping: **ServiceMonitor** labeled for **kube-prometheus** (`release: kube-prometheus`)
- **Schedule** **`velero-daily-noble`**: cron **`0 3 * * *`** (daily at 03:00 in the Velero pod's timezone, usually **UTC**), **720h** TTL per backup (~30 days). Edit **`values.yaml`** `schedules` to change time or retention.
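After install, the schedule can be inspected, or triggered ad hoc outside the 03:00 window, with the standard `velero` CLI:

```bash
velero schedule get
velero schedule describe velero-daily-noble
# One-off backup from the schedule's template (the backup name is generated)
velero backup create --from-schedule velero-daily-noble
```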
## Prerequisites
1. **Volume Snapshot APIs** installed cluster-wide — **`clusters/noble/bootstrap/csi-snapshot-controller/`** (Ansible **`noble_csi_snapshot_controller`**, after **Cilium**). Without **`snapshot.storage.k8s.io`** CRDs and **`kube-system/snapshot-controller`**, Velero logs errors like `no matches for kind "VolumeSnapshot"`.
2. **Longhorn** (or another CSI driver) with a **VolumeSnapshotClass** for that driver.
3. For **Longhorn**, this repo applies **`velero/longhorn-volumesnapshotclass.yaml`** (`VolumeSnapshotClass` **`longhorn-velero`**, driver **`driver.longhorn.io`**, Velero label). It is included in **`clusters/noble/bootstrap/kustomization.yaml`** (same apply as other bootstrap YAML). For non-Longhorn drivers, add a class with **`velero.io/csi-volumesnapshot-class: "true"`** (see [Velero CSI](https://velero.io/docs/main/csi/)).
4. **S3-compatible** endpoint (MinIO, VersityGW, AWS, etc.) and a **bucket**.
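A quick preflight for prerequisites 1–3 (the `longhorn-velero` class matches this repo; substitute your own class for other drivers):

```bash
# Prerequisite 1: snapshot CRDs and the controller
kubectl get crd volumesnapshots.snapshot.storage.k8s.io
kubectl -n kube-system get deploy snapshot-controller

# Prerequisites 2-3: a VolumeSnapshotClass carrying the Velero label
kubectl get volumesnapshotclass -L velero.io/csi-volumesnapshot-class
```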
## Credentials Secret
Velero expects **`velero/velero-cloud-credentials`**, key **`cloud`**, in **INI** form for the AWS plugin:
```ini
[default]
aws_access_key_id=<key>
aws_secret_access_key=<secret>
```
Create manually:
```bash
kubectl -n velero create secret generic velero-cloud-credentials \
  --from-literal=cloud="$(printf '[default]\naws_access_key_id=%s\naws_secret_access_key=%s\n' "$KEY" "$SECRET")"
```
Or let **Ansible** create it from **`.env`** (`NOBLE_VELERO_AWS_ACCESS_KEY_ID`, `NOBLE_VELERO_AWS_SECRET_ACCESS_KEY`) or from extra-vars **`noble_velero_aws_access_key_id`** / **`noble_velero_aws_secret_access_key`**.
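Either way, the plugin only reads the `cloud` key; to verify what it will see (this prints the access key, so avoid shared terminals):

```bash
kubectl -n velero get secret velero-cloud-credentials -o jsonpath='{.data.cloud}' | base64 -d
```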
## Apply (Ansible)
1. Copy **`.env.sample`** → **`.env`** at the **repository root** and set at least:
- **`NOBLE_VELERO_S3_BUCKET`** — object bucket name
- **`NOBLE_VELERO_S3_URL`** — S3 API base URL (e.g. `https://minio.lan:9000` or your VersityGW/MinIO endpoint)
- **`NOBLE_VELERO_AWS_ACCESS_KEY_ID`** / **`NOBLE_VELERO_AWS_SECRET_ACCESS_KEY`** — credentials the AWS plugin uses (S3-compatible access key style)
2. Enable the role: set **`noble_velero_install: true`** in **`ansible/group_vars/all.yml`**, **or** pass **`-e noble_velero_install=true`** on the command line.
3. Run from **`ansible/`** (adjust **`KUBECONFIG`** to your cluster admin kubeconfig):
```bash
cd ansible
export KUBECONFIG=/absolute/path/to/home-server/talos/kubeconfig
# Velero only (after helm repos; skips other roles unless their tags match — use full playbook if unsure)
ansible-playbook playbooks/noble.yml --tags repos,velero -e noble_velero_install=true
```
If **`NOBLE_VELERO_S3_BUCKET`** / **`NOBLE_VELERO_S3_URL`** are not in **`.env`**, pass them explicitly:
```bash
ansible-playbook playbooks/noble.yml --tags repos,velero -e noble_velero_install=true \
  -e noble_velero_s3_bucket=my-bucket \
  -e noble_velero_s3_url=https://s3.example.com:9000
```
Full platform run (includes Velero when **`noble_velero_install`** is true in **`group_vars`**):
```bash
ansible-playbook playbooks/noble.yml
```
## Install (Ansible) — details
1. Set **`noble_velero_install: true`** in **`ansible/group_vars/all.yml`** (or pass **`-e noble_velero_install=true`**).
2. Set **`noble_velero_s3_bucket`** and **`noble_velero_s3_url`** via **`.env`** (**`NOBLE_VELERO_S3_*`**) or **`group_vars`** or **`-e`**. Extra-vars override **`.env`**. Optional: **`noble_velero_s3_region`**, **`noble_velero_s3_prefix`**, **`noble_velero_s3_force_path_style`** (defaults match `values.yaml`).
3. Run **`ansible/playbooks/noble.yml`** (Velero runs after **`noble_platform`**).
Example without **`.env`** (all on the CLI):
```bash
cd ansible
ansible-playbook playbooks/noble.yml --tags velero \
  -e noble_velero_install=true \
  -e noble_velero_s3_bucket=noble-velero \
  -e noble_velero_s3_url=https://minio.lan:9000 \
  -e noble_velero_aws_access_key_id="$KEY" \
  -e noble_velero_aws_secret_access_key="$SECRET"
```
The **`clusters/noble/bootstrap/kustomization.yaml`** applies **`velero/namespace.yaml`** with the rest of the bootstrap namespaces (so **`velero`** exists before Helm).
## Install (Helm only)
From repo root:
```bash
kubectl apply -f clusters/noble/bootstrap/velero/namespace.yaml
# Create velero-cloud-credentials (see above), then:
helm repo add vmware-tanzu https://vmware-tanzu.github.io/helm-charts && helm repo update
helm upgrade --install velero vmware-tanzu/velero -n velero --version 12.0.0 \
  -f clusters/noble/bootstrap/velero/values.yaml \
  --set-string 'configuration.backupStorageLocation[0].bucket=YOUR_BUCKET' \
  --set-string 'configuration.backupStorageLocation[0].config.s3Url=https://YOUR-S3-ENDPOINT' \
  --wait
```
Edit **`values.yaml`** defaults (bucket placeholder, `s3Url`) or override with **`--set-string`** as above.
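Once the release converges, the **BackupStorageLocation** should reach phase `Available` (the location name `default` below is the chart's usual default; adjust if your `values.yaml` renames it):

```bash
velero backup-location get
# Or without the velero CLI:
kubectl -n velero get backupstoragelocation default -o jsonpath='{.status.phase}'
```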
## Quick checks
```bash
kubectl -n velero get pods,backupstoragelocation,volumesnapshotlocation
velero backup create test --wait
```
(`velero` CLI: install from [Velero releases](https://github.com/vmware-tanzu/velero/releases).)
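Restores follow the same CRD flow; for example, against the `test` backup above:

```bash
velero backup get
velero restore create --from-backup test --wait
velero restore get
```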