diff --git a/clusters/noble/apps/longhorn/application.yaml b/clusters/noble/apps/longhorn/application.yaml
index cdccda6..08ad80d 100644
--- a/clusters/noble/apps/longhorn/application.yaml
+++ b/clusters/noble/apps/longhorn/application.yaml
@@ -15,6 +15,7 @@ spec:
     chart: longhorn
     targetRevision: "1.11.1"
     helm:
+      skipCrds: false
       valuesObject:
         defaultSettings:
           createDefaultDiskLabeledNodes: false
@@ -23,7 +24,12 @@ spec:
     automated:
       prune: true
       selfHeal: true
+    retry:
+      limit: 5
+      backoff:
+        duration: 20s
+        factor: 2
+        maxDuration: 3m
     syncOptions:
       - CreateNamespace=true
       - PruneLast=true
-      - ServerSideApply=true
diff --git a/talos/README.md b/talos/README.md
index b66d403..6817818 100644
--- a/talos/README.md
+++ b/talos/README.md
@@ -55,6 +55,39 @@ talosctl -n 192.168.50.20 -e 192.168.50.230 health
 kubectl get nodes -o wide
 ```
 
+### `kubectl` errors: `lookup https: no such host` or `https://https/...`
+
+That means the **active** kubeconfig has a broken `cluster.server` URL (often a
+**double** `https://` or **duplicate** `:6443`). The client then tries to resolve
+the hostname `https`, which fails.
+
+Inspect what you are using:
+
+```bash
+kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}{"\n"}'
+```
+
+It must be a **single** valid URL, for example:
+
+- `https://192.168.50.230:6443` (API VIP from `talconfig.yaml`), or
+- `https://kube.noble.lab.pcenicni.dev:6443` (if DNS points at that VIP)
+
+Fix the cluster entry (replace `noble` with your context’s cluster name if
+different):
+
+```bash
+kubectl config set-cluster noble --server=https://192.168.50.230:6443
+```
+
+Or point `kubectl` at this repo’s kubeconfig (known-good server line):
+
+```bash
+export KUBECONFIG="$(pwd)/kubeconfig"
+kubectl cluster-info
+```
+
+Avoid pasting `https://` twice when running `kubectl config set-cluster ... --server=...`.
+
 ## 6) GitOps-pinned Cilium values
 
 The Cilium settings that worked for this Talos cluster are now persisted in:
@@ -134,6 +167,34 @@ Longhorn is deployed from:
 
 Monitoring apps are configured to use `storageClassName: longhorn`, so you can
 persist Prometheus/Alertmanager/Loki data once Longhorn is healthy.
 
+### Argo CD: `longhorn` OutOfSync, Health **Missing**, no `longhorn-role`
+
+**Missing** means nothing has been applied yet, or a sync never completed. The
+Helm chart creates `ClusterRole/longhorn-role` on a successful install.
+
+1. See the failure reason:
+
+```bash
+kubectl describe application longhorn -n argocd
+```
+
+Check **Status → Conditions** and **Status → Operation State** for the error
+(for example Helm render error, CRD apply failure, or repo-server cannot reach
+`https://charts.longhorn.io`).
+
+2. Trigger a sync (Argo CD UI **Sync**, or CLI):
+
+```bash
+argocd app sync longhorn
+```
+
+3. After a good sync, confirm:
+
+```bash
+kubectl get clusterrole longhorn-role
+kubectl get pods -n longhorn-system
+```
+
 ### Extra drive layout (this cluster)
 
 Each node uses:
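A note on the `retry`/`backoff` block added to the Longhorn Application above: Argo CD waits `duration` before the first retry, multiplies the wait by `factor` after each failed attempt, and caps each wait at `maxDuration`. A minimal sketch of the resulting schedule for the values in this diff (20s / 2 / 3m, limit 5) — illustrative only, not an Argo CD tool:

```bash
# Illustrative sketch of the retry schedule from the values in this diff:
# duration=20s, factor=2, maxDuration=3m (180s), limit=5.
d=20; factor=2; max=180
for attempt in 1 2 3 4 5; do
  wait=$(( d < max ? d : max ))   # each wait is capped at maxDuration
  echo "retry ${attempt}: wait ${wait}s"
  d=$(( d * factor ))             # exponential growth between attempts
done
```

With these values the waits come out to 20s, 40s, 80s, 160s, 180s — roughly eight minutes of total backoff before Argo CD marks the sync operation failed.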