Enhance Longhorn application configuration by adding skipCrds option and retry settings to improve deployment resilience and error handling.

Nikholas Pcenicni
2026-03-27 17:47:54 -04:00
parent 76700a7b3f
commit 55833b2593
2 changed files with 68 additions and 1 deletion


```diff
@@ -15,6 +15,7 @@ spec:
     chart: longhorn
     targetRevision: "1.11.1"
     helm:
+      skipCrds: false
       valuesObject:
         defaultSettings:
           createDefaultDiskLabeledNodes: false
@@ -23,7 +24,12 @@ spec:
     automated:
       prune: true
       selfHeal: true
+    retry:
+      limit: 5
+      backoff:
+        duration: 20s
+        factor: 2
+        maxDuration: 3m
     syncOptions:
       - CreateNamespace=true
       - PruneLast=true
      - ServerSideApply=true
```
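The `retry` block follows Argo CD's sync-retry semantics: the first retry waits `duration`, each subsequent wait is multiplied by `factor`, and no single wait exceeds `maxDuration`, for up to `limit` retries. A quick sketch of the schedule these values produce (the loop is illustrative, not part of the chart):

```shell
# Illustrative only: the wait before each of the 5 retries
# implied by duration=20s, factor=2, maxDuration=3m (180s).
wait=20 factor=2 max=180
for attempt in 1 2 3 4 5; do
  echo "retry ${attempt}: wait ${wait}s"
  wait=$(( wait * factor ))
  if [ "$wait" -gt "$max" ]; then wait=$max; fi
done
# prints waits of 20s, 40s, 80s, 160s, 180s
```

So a transient failure (for example a slow CRD apply) gets roughly eight minutes of retries before the sync is marked failed.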


@@ -55,6 +55,39 @@ talosctl -n 192.168.50.20 -e 192.168.50.230 health
kubectl get nodes -o wide
```
### `kubectl` errors: `lookup https: no such host` or `https://https/...`
That means the **active** kubeconfig has a broken `cluster.server` URL (often a
**double** `https://` or **duplicate** `:6443`). Kubernetes then tries to resolve
the hostname `https`, which fails.
Inspect what you are using:
```bash
kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}{"\n"}'
```
It must be a **single** valid URL, for example:
- `https://192.168.50.230:6443` (API VIP from `talconfig.yaml`), or
- `https://kube.noble.lab.pcenicni.dev:6443` (if DNS points at that VIP)
Fix the cluster entry (replace `noble` with your context's cluster name if
different):
```bash
kubectl config set-cluster noble --server=https://192.168.50.230:6443
```
Or point `kubectl` at this repo's kubeconfig (known-good server line):
```bash
export KUBECONFIG="$(pwd)/kubeconfig"
kubectl cluster-info
```
Avoid pasting `https://` twice when running `kubectl config set-cluster ... --server=...`.
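Both failure shapes can be caught with plain string matching. This helper is a sketch (the function name is made up here, not a `kubectl` feature):

```shell
# Hypothetical helper: classify a kubeconfig cluster.server URL.
check_server_url() {
  case "$1" in
    *"https://https"*) echo "broken: doubled https://" ;;
    *":6443:6443"*)    echo "broken: duplicated :6443" ;;
    https://*:6443)    echo "ok" ;;
    *)                 echo "suspicious: $1" ;;
  esac
}
check_server_url "https://192.168.50.230:6443"         # ok
check_server_url "https://https//192.168.50.230:6443"  # broken: doubled https://
# To check the active kubeconfig:
#   check_server_url "$(kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}')"
```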
## 6) GitOps-pinned Cilium values
The Cilium settings that worked for this Talos cluster are now persisted in:
@@ -134,6 +167,34 @@ Longhorn is deployed from:
Monitoring apps are configured to use `storageClassName: longhorn`, so you can
persist Prometheus/Alertmanager/Loki data once Longhorn is healthy.
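For reference, a minimal claim against that class might look like this (the name and namespace are illustrative, not taken from this repo):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: demo-data        # hypothetical
  namespace: monitoring  # hypothetical
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: longhorn
  resources:
    requests:
      storage: 1Gi
```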
### Argo CD: `longhorn` OutOfSync, Health **Missing**, no `longhorn-role`
**Missing** means nothing has been applied yet, or a sync never completed. The
Helm chart creates `ClusterRole/longhorn-role` on a successful install.
1. See the failure reason:
```bash
kubectl describe application longhorn -n argocd
```
Check **Status → Conditions** and **Status → Operation State** for the error
(for example a Helm render error, a CRD apply failure, or the repo-server
being unable to reach `https://charts.longhorn.io`).
2. Trigger a sync (Argo CD UI **Sync**, or CLI):
```bash
argocd app sync longhorn
```
3. After a good sync, confirm:
```bash
kubectl get clusterrole longhorn-role
kubectl get pods -n longhorn-system
```
### Extra drive layout (this cluster)
Each node uses: