Enhance Longhorn application configuration by adding skipCrds option and retry settings to improve deployment resilience and error handling.
talosctl -n 192.168.50.20 -e 192.168.50.230 health
kubectl get nodes -o wide
```

### `kubectl` errors: `lookup https: no such host` or `https://https/...`

That means the **active** kubeconfig has a broken `cluster.server` URL (often a
**double** `https://` or **duplicate** `:6443`). Kubernetes then tries to resolve
the hostname `https`, which fails.

Inspect what you are using:

```bash
kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}{"\n"}'
```

It must be a **single** valid URL, for example:

- `https://192.168.50.230:6443` (API VIP from `talconfig.yaml`), or
- `https://kube.noble.lab.pcenicni.dev:6443` (if DNS points at that VIP)

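Since the two corruptions above follow a fixed shape (a doubled scheme or a doubled port), plain shell pattern matching can flag them. A minimal sketch, assuming the `check_server_url` helper name (it is not part of this repo):

```bash
# Hypothetical helper: flag the two corruptions described above
# (a doubled "https://" or a doubled ":6443") in a server URL.
check_server_url() {
  local url=$1
  case "$url" in
    *https://*https://*) echo "broken: duplicated https://"; return 1 ;;
    *:6443*:6443*)       echo "broken: duplicated :6443";    return 1 ;;
    https://*:6443)      echo "ok: $url";                    return 0 ;;
    *)                   echo "unexpected format: $url";     return 1 ;;
  esac
}

# Feed it the active kubeconfig's server entry, e.g.:
#   check_server_url "$(kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}')"
check_server_url "https://192.168.50.230:6443"   # prints "ok: https://192.168.50.230:6443"
```
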

Fix the cluster entry (replace `noble` with your context’s cluster name if
different):

```bash
kubectl config set-cluster noble --server=https://192.168.50.230:6443
```

Or point `kubectl` at this repo’s kubeconfig (known-good server line):

```bash
export KUBECONFIG="$(pwd)/kubeconfig"
kubectl cluster-info
```

Avoid pasting `https://` twice when running `kubectl config set-cluster ... --server=...`.

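To make the failure mode concrete, the sketch below (with illustrative `bad`/`fixed` variables, not values from this repo) shows a doubled-scheme URL and a bash parameter expansion that recovers the intended one:

```bash
# Illustrative values: the broken URL contains "https://" twice, so kubectl
# tries to resolve the literal hostname "https".
bad="https://https://192.168.50.230:6443"

# Drop everything up to and including the last "https://", then re-add the
# scheme once. A quick patch; fixing the kubeconfig entry properly is better.
fixed="https://${bad##*https://}"
echo "$fixed"   # prints https://192.168.50.230:6443
```
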
## 6) GitOps-pinned Cilium values

The Cilium settings that worked for this Talos cluster are now persisted in:

Longhorn is deployed from:

Monitoring apps are configured to use `storageClassName: longhorn`, so you can
persist Prometheus/Alertmanager/Loki data once Longhorn is healthy.

### Argo CD: `longhorn` OutOfSync, Health **Missing**, no `longhorn-role`

**Missing** means nothing has been applied yet, or a sync never completed. The
Helm chart creates `ClusterRole/longhorn-role` on a successful install.

1. See the failure reason:

```bash
kubectl describe application longhorn -n argocd
```

Check **Status → Conditions** and **Status → Operation State** for the error
(for example a Helm render error, a CRD apply failure, or the repo-server being
unable to reach `https://charts.longhorn.io`).

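The condition `Type` and `Message` usually carry the whole story, so you can filter the dump instead of scanning it. A sketch over fabricated sample output (the real field layout may differ slightly):

```bash
# Fabricated sample of the relevant part of a `kubectl describe application`
# dump; in real use you would pipe the live output instead.
sample=$(cat <<'EOF'
Status:
  Conditions:
    Message:  rpc error: code = Unknown desc = helm template failed
    Type:     ComparisonError
  Sync:
    Status:  OutOfSync
EOF
)

# Real usage (assumption: same field layout as the sample):
#   kubectl describe application longhorn -n argocd | grep -E 'Type:|Message:'
printf '%s\n' "$sample" | grep -E 'Type:|Message:'
```
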

2. Trigger a sync (Argo CD UI **Sync**, or CLI):

```bash
argocd app sync longhorn
```

3. After a good sync, confirm:

```bash
kubectl get clusterrole longhorn-role
kubectl get pods -n longhorn-system
```

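Since CRD installation and controller startup can lag a sync, scripting these checks benefits from a retry loop, in the same spirit as the retry settings added to the application. A sketch, assuming a hypothetical `retry` helper (not part of this repo):

```bash
# Hypothetical retry helper: run a command up to N times with a fixed delay,
# succeeding as soon as the command does.
retry() {
  local attempts=$1 delay=$2
  shift 2
  local i
  for ((i = 1; i <= attempts; i++)); do
    "$@" && return 0
    sleep "$delay"
  done
  return 1
}

# e.g. wait for the ClusterRole that a successful Helm install creates:
#   retry 10 5 kubectl get clusterrole longhorn-role
```
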
### Extra drive layout (this cluster)

Each node uses: