Remove deprecated Argo CD application configurations and related files for noble cluster, including root-application.yaml, kustomization.yaml, and individual application manifests for argocd, cilium, longhorn, kube-vip, and monitoring components. Update kube-vip daemonset.yaml to enhance deployment strategy and environment variables for improved configuration.

Nikholas Pcenicni
2026-03-27 23:02:17 -04:00
parent 4263da65d8
commit d2c53fc553
37 changed files with 778 additions and 1042 deletions

View File

@@ -1,23 +0,0 @@
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: argocd
namespace: argocd
annotations:
argocd.argoproj.io/sync-wave: "-2"
spec:
project: default
destination:
server: https://kubernetes.default.svc
namespace: argocd
source:
repoURL: https://gitea.pcenicni.ca/gsdavidp/home-server.git
targetRevision: HEAD
path: clusters/noble/bootstrap/argocd
syncPolicy:
automated:
prune: true
selfHeal: true
syncOptions:
- CreateNamespace=true

View File

@@ -0,0 +1,34 @@
# Cilium — noble (Talos)
Talos uses **`cluster.network.cni.name: none`**; you must install Cilium (or another CNI) before nodes become **Ready** and before **MetalLB** / most workloads. See `talos/CLUSTER-BUILD.md` ordering.
## 1. Install (phase 1 — required)
Uses **`values.yaml`**: IPAM **kubernetes**, **`k8sServiceHost` / `k8sServicePort`** pointing at **KubePrism** (`127.0.0.1:7445`, Talos default), Talos cgroup paths, **drop `SYS_MODULE`** from agent caps, **`bpf.masquerade: false`** ([Talos Cilium](https://www.talos.dev/latest/kubernetes-guides/network/deploying-cilium/), [KubePrism](https://www.talos.dev/latest/kubernetes-guides/configuration/kubeprism/)). Without this, host-network CNI clients may **`dial tcp <VIP>:6443`** and fail if the VIP path is unhealthy.
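A quick pre-flight check can confirm that a values file really targets KubePrism before it is passed to Helm. The sketch below is only an illustration: the function name `uses_kubeprism` is hypothetical, and it assumes the two keys sit at the top level of the file, as they do in this folder's `values.yaml`. It uses plain `grep`, not a YAML parser.

```bash
# Hypothetical helper (not part of this repo): succeeds when the Helm values
# file given as $1 sets k8sServiceHost to a loopback name and k8sServicePort
# to 7445, the KubePrism endpoint described above.
# Assumption: both keys are top-level and unindented; this is not a YAML parser.
uses_kubeprism() {
  local file="$1"
  grep -Eq '^k8sServiceHost:[[:space:]]*"?(127\.0\.0\.1|localhost)"?[[:space:]]*$' "$file" &&
    grep -Eq '^k8sServicePort:[[:space:]]*"?7445"?[[:space:]]*$' "$file"
}
```

Possible usage: `uses_kubeprism clusters/noble/apps/cilium/values.yaml || echo "values file does not point at KubePrism"`.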
From **repository root**:
```bash
helm repo add cilium https://helm.cilium.io/
helm repo update
helm upgrade --install cilium cilium/cilium \
--namespace kube-system \
--version 1.16.6 \
-f clusters/noble/apps/cilium/values.yaml \
--wait
```
Verify:
```bash
kubectl -n kube-system rollout status ds/cilium
kubectl get nodes
```
When nodes are **Ready**, continue with **MetalLB** (`clusters/noble/apps/metallb/README.md`) and other Phase B items. **kube-vip** for the Kubernetes API VIP is separate (L2 ARP); it can run after the API is reachable.
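If the "nodes are Ready" gate needs to be scripted, one option is to parse the `kubectl get nodes --no-headers` output. This is a minimal sketch, not part of the repo: the function name `all_nodes_ready` is hypothetical, and it deliberately treats a cordoned node (`Ready,SchedulingDisabled`) as not ready, which may be stricter than you want.

```bash
# Hypothetical helper: reads `kubectl get nodes --no-headers` output on stdin
# and succeeds only when at least one node is listed and every node's STATUS
# column is exactly "Ready".
# Assumption: STATUS is the second whitespace-separated column.
all_nodes_ready() {
  awk '
    { seen = 1 }
    $2 != "Ready" { bad = 1 }
    END { if (seen && !bad) exit 0; exit 1 }
  '
}
```

Possible usage: `kubectl get nodes --no-headers | all_nodes_ready && echo "nodes Ready, continue with MetalLB"`.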
## 2. Optional: kube-proxy replacement (phase 2)
To replace **`kube-proxy`** with Cilium entirely, use **`values-kpr.yaml`** and **`cluster.proxy.disabled: true`** in Talos on every node (see comments inside `values-kpr.yaml`). Follow the upstream [Deploy Cilium CNI](https://www.talos.dev/latest/kubernetes-guides/network/deploying-cilium/) section *without kube-proxy*.
Do **not** skip phase 1 unless you already know your cluster matches the “bootstrap window” flow from the Talos docs.

View File

@@ -1,46 +0,0 @@
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: cilium
namespace: argocd
annotations:
argocd.argoproj.io/sync-wave: "0"
spec:
project: default
# Argo SSA vs CLI helm: ignore generated TLS and fields Argo commonly owns so
# RespectIgnoreDifferences can skip fighting Helm on sync.
ignoreDifferences:
- group: ""
kind: Secret
name: hubble-server-certs
namespace: kube-system
jqPathExpressions:
- .data
- group: apps
kind: Deployment
name: cilium-operator
namespace: kube-system
jsonPointers:
- /spec/replicas
- /spec/strategy/rollingUpdate/maxUnavailable
destination:
server: https://kubernetes.default.svc
namespace: kube-system
sources:
- repoURL: https://helm.cilium.io/
chart: cilium
targetRevision: 1.16.6
helm:
valueFiles:
- $values/clusters/noble/apps/cilium/helm-values.yaml
- repoURL: https://gitea.pcenicni.ca/gsdavidp/home-server.git
targetRevision: HEAD
ref: values
syncPolicy:
automated:
prune: true
selfHeal: true
syncOptions:
- CreateNamespace=true
- RespectIgnoreDifferences=true

View File

@@ -1,36 +0,0 @@
# Same settings as the Argo CD Application (keep in sync).
# Used for manual `helm install` before Argo when Talos uses cni: none.
#
# operator.replicas: chart default is 2 with required pod anti-affinity. If fewer
# than two nodes can schedule (e.g. NotReady / taints), `helm --wait` never finishes.
k8sServiceHost: 192.168.50.20
k8sServicePort: 6443
cgroup:
autoMount:
enabled: false
hostRoot: /sys/fs/cgroup
ipam:
operator:
clusterPoolIPv4PodCIDRList:
- 10.244.0.0/16
securityContext:
capabilities:
ciliumAgent:
- CHOWN
- KILL
- NET_ADMIN
- NET_RAW
- IPC_LOCK
- SYS_ADMIN
- SYS_RESOURCE
- DAC_OVERRIDE
- FOWNER
- SETGID
- SETUID
cleanCiliumState:
- NET_ADMIN
- SYS_ADMIN
- SYS_RESOURCE
operator:
replicas: 1

View File

@@ -0,0 +1,49 @@
# Optional phase 2: kube-proxy replacement via Cilium + KubePrism (Talos apid forwards :7445 → :6443).
# Prerequisites:
# 1. Phase 1 Cilium installed and healthy; nodes Ready.
# 2. Add to Talos machine config on ALL nodes:
# cluster:
# proxy:
# disabled: true
# (keep cluster.network.cni.name: none). Regenerate, apply-config, reboot as needed.
# 3. Remove legacy kube-proxy objects if still present:
# kubectl delete ds -n kube-system kube-proxy --ignore-not-found
# kubectl delete cm -n kube-system kube-proxy --ignore-not-found
# 4. helm upgrade cilium ... -f values-kpr.yaml
#
# Ref: https://www.talos.dev/latest/kubernetes-guides/network/deploying-cilium/
ipam:
mode: kubernetes
kubeProxyReplacement: "true"
k8sServiceHost: localhost
k8sServicePort: "7445"
securityContext:
capabilities:
ciliumAgent:
- CHOWN
- KILL
- NET_ADMIN
- NET_RAW
- IPC_LOCK
- SYS_ADMIN
- SYS_RESOURCE
- DAC_OVERRIDE
- FOWNER
- SETGID
- SETUID
cleanCiliumState:
- NET_ADMIN
- SYS_ADMIN
- SYS_RESOURCE
cgroup:
autoMount:
enabled: false
hostRoot: /sys/fs/cgroup
bpf:
masquerade: false

View File

@@ -0,0 +1,44 @@
# Cilium on Talos — phase 1: bring up CNI while kube-proxy still runs.
# See README.md for install order (before MetalLB scheduling) and optional kube-proxy replacement.
#
# Chart: cilium/cilium — pin version in helm command (e.g. 1.16.6).
# Ref: https://www.talos.dev/latest/kubernetes-guides/network/deploying-cilium/
ipam:
mode: kubernetes
kubeProxyReplacement: "false"
# Host-network components cannot use kubernetes.default ClusterIP; Talos KubePrism (enabled by default)
# on 127.0.0.1:7445 proxies to healthy apiservers and avoids flaky dials to cluster.controlPlane.endpoint (VIP).
# Ref: https://www.talos.dev/latest/kubernetes-guides/configuration/kubeprism/
k8sServiceHost: "127.0.0.1"
k8sServicePort: "7445"
securityContext:
capabilities:
ciliumAgent:
- CHOWN
- KILL
- NET_ADMIN
- NET_RAW
- IPC_LOCK
- SYS_ADMIN
- SYS_RESOURCE
- DAC_OVERRIDE
- FOWNER
- SETGID
- SETUID
cleanCiliumState:
- NET_ADMIN
- SYS_ADMIN
- SYS_RESOURCE
cgroup:
autoMount:
enabled: false
hostRoot: /sys/fs/cgroup
# Workaround: Talos host DNS forwarding + bpf masquerade can break CoreDNS; see Talos Cilium guide "Known issues".
bpf:
masquerade: false

View File

@@ -1,23 +0,0 @@
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: kube-vip
namespace: argocd
annotations:
argocd.argoproj.io/sync-wave: "-1"
spec:
project: default
destination:
server: https://kubernetes.default.svc
namespace: kube-system
source:
repoURL: https://gitea.pcenicni.ca/gsdavidp/home-server.git
targetRevision: HEAD
path: clusters/noble/apps/kube-vip
syncPolicy:
automated:
prune: true
selfHeal: true
syncOptions:
- CreateNamespace=true

View File

@@ -3,4 +3,3 @@ kind: Kustomization
resources:
- vip-rbac.yaml
- vip-daemonset.yaml

View File

@@ -4,6 +4,11 @@ metadata:
name: kube-vip-ds
namespace: kube-system
spec:
updateStrategy:
type: RollingUpdate
rollingUpdate:
maxUnavailable: 1
maxSurge: 0
selector:
matchLabels:
app.kubernetes.io/name: kube-vip-ds
@@ -13,6 +18,9 @@ spec:
app.kubernetes.io/name: kube-vip-ds
spec:
hostNetwork: true
dnsPolicy: ClusterFirstWithHostNet
priorityClassName: system-node-critical
terminationGracePeriodSeconds: 90
serviceAccountName: kube-vip
nodeSelector:
node-role.kubernetes.io/control-plane: ""
@@ -32,6 +40,12 @@ spec:
args:
- manager
env:
# Leader election identity must be the Kubernetes node name (hostNetwork
# hostname is not always the same; without this, no leader → no VIP).
- name: vip_nodename
valueFrom:
fieldRef:
fieldPath: spec.nodeName
- name: vip_arp
value: "true"
- name: address
@@ -41,29 +55,29 @@ spec:
# Physical uplink from `talosctl -n <cp-ip> get links` (this cluster: ens18).
- name: vip_interface
value: "ens18"
# Must include "/" — kube-vip does netlink.ParseAddr(address + subnet); "32" breaks (192.168.50.x32).
- name: vip_subnet
value: "32"
value: "/32"
- name: vip_leaderelection
value: "true"
- name: cp_enable
value: "true"
- name: cp_namespace
value: "kube-system"
# Control-plane VIP only until stable: with svc_enable=true the services leader-election
# path calls log.Fatal on many failures / leadership moves → CrashLoopBackOff on all CP nodes.
# Re-enable "true" after pods are 1/1; if they loop again, capture: kubectl logs -n kube-system -l app.kubernetes.io/name=kube-vip-ds --previous --tail=100
- name: svc_enable
value: "true"
# Env is svc_election (not servicesElection); see pkg/kubevip/config_envvar.go
- name: svc_election
value: "true"
value: "false"
- name: vip_leaseduration
value: "5"
value: "15"
- name: vip_renewdeadline
value: "3"
value: "10"
- name: vip_retryperiod
value: "1"
value: "2"
securityContext:
capabilities:
add:
- NET_ADMIN
- NET_RAW
- SYS_TIME

View File

@@ -10,14 +10,20 @@ metadata:
name: kube-vip-role
rules:
- apiGroups: [""]
resources: ["services", "services/status", "nodes", "endpoints"]
resources: ["services", "services/status", "endpoints"]
verbs: ["get", "list", "watch", "update"]
- apiGroups: [""]
resources: ["nodes"]
verbs: ["get", "list", "watch", "update", "patch"]
- apiGroups: [""]
resources: ["events"]
verbs: ["create", "patch", "update"]
- apiGroups: ["coordination.k8s.io"]
resources: ["leases"]
verbs: ["get", "list", "watch", "create", "update", "patch"]
- apiGroups: ["discovery.k8s.io"]
resources: ["endpointslices"]
verbs: ["get", "list", "watch", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
@@ -31,4 +37,3 @@ subjects:
- kind: ServiceAccount
name: kube-vip
namespace: kube-system

View File

@@ -1,10 +0,0 @@
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- argocd/application.yaml
- cilium/application.yaml
- kube-vip/application.yaml
- longhorn/application.yaml
- monitoring-kube-prometheus/application.yaml
- monitoring-loki/application.yaml

View File

@@ -1,35 +0,0 @@
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: longhorn
namespace: argocd
annotations:
argocd.argoproj.io/sync-wave: "1"
spec:
project: default
destination:
server: https://kubernetes.default.svc
namespace: longhorn-system
sources:
- repoURL: https://charts.longhorn.io
chart: longhorn
targetRevision: "1.11.1"
helm:
skipCrds: false
valuesObject:
defaultSettings:
createDefaultDiskLabeledNodes: false
defaultDataPath: /var/mnt/longhorn
syncPolicy:
automated:
prune: true
selfHeal: true
retry:
limit: 5
backoff:
duration: 20s
factor: 2
maxDuration: 3m
syncOptions:
- CreateNamespace=true
- PruneLast=true

View File

@@ -0,0 +1,4 @@
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- namespace.yaml

View File

@@ -0,0 +1,10 @@
# Longhorn Manager uses hostPath + privileged; incompatible with Pod Security "baseline".
# Apply before or after Helm — merges labels onto existing longhorn-system.
apiVersion: v1
kind: Namespace
metadata:
name: longhorn-system
labels:
pod-security.kubernetes.io/enforce: privileged
pod-security.kubernetes.io/audit: privileged
pod-security.kubernetes.io/warn: privileged

View File

@@ -0,0 +1,21 @@
# Longhorn Helm values — use with Talos user volume + kubelet mounts (see talos/talconfig.yaml).
# 1) PSA: `kubectl apply -k clusters/noble/apps/longhorn` (privileged namespace) before or after Helm.
# 2) Talos: bind `/var/lib/longhorn` → `/var/mnt/longhorn` in kubelet extraMounts — chart hostPath is fixed to /var/lib/longhorn.
# Example (run from home-server repo root so -f path resolves):
# kubectl apply -k clusters/noble/apps/longhorn
# helm repo add longhorn https://charts.longhorn.io && helm repo update
# helm upgrade --install longhorn longhorn/longhorn -n longhorn-system --create-namespace \
# -f clusters/noble/apps/longhorn/values.yaml
# "helm upgrade --install" needs two arguments: RELEASE_NAME and CHART (e.g. longhorn longhorn/longhorn).
#
# If you already installed Longhorn without this file: fix Default Settings in the UI or edit each
# node's disk path to /var/mnt/longhorn; wrong path → "wrong format" (root fs / overlay).
defaultSettings:
defaultDataPath: /var/mnt/longhorn
# Default 30% reserved often makes small data disks look "full" to the scheduler.
storageReservedPercentageForDefaultDisk: "10"
# Pre-upgrade Job waits for healthy managers; disable while fixing Talos image (iscsi-tools) / kubelet binds, then re-enable.
preUpgradeChecker:
jobEnabled: false

View File

@@ -0,0 +1,52 @@
# MetalLB (layer 2) — noble
**Prerequisite (Talos + `cni: none`):** install **Cilium** (or your CNI) **before** MetalLB.
Until the CNI is up, nodes stay **`NotReady`** and carry taints such as **`node.kubernetes.io/network-unavailable`** (and **`not-ready`**). The scheduler then reports **`0/N nodes are available: N node(s) had untolerated taint(s)`** and MetalLB stays **`Pending`** — its chart does not tolerate those taints, by design. **Install Cilium first** (`talos/CLUSTER-BUILD.md` Phase B); when nodes are **`Ready`**, reinstall or rollout MetalLB if needed.
**Order:** namespace (Pod Security) → **Helm** (CRDs + controller) → **kustomize** (pool + L2).
If `kubectl apply -k` fails with **`no matches for kind "IPAddressPool"`** / **`ensure CRDs are installed first`**, Helm is not installed yet.
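To catch the missing-CRD case before running `kubectl apply -k`, the CRD list can be checked first. The sketch below assumes the MetalLB chart registers the CRDs `ipaddresspools.metallb.io` and `l2advertisements.metallb.io`; the function name `metallb_crds_present` is hypothetical.

```bash
# Hypothetical helper: reads `kubectl get crd -o name` output on stdin and
# succeeds only when both CRDs used by this folder's kustomization
# (IPAddressPool and L2Advertisement) are listed.
metallb_crds_present() {
  local names
  names="$(cat)"
  printf '%s\n' "$names" | grep -q '/ipaddresspools\.metallb\.io$' &&
    printf '%s\n' "$names" | grep -q '/l2advertisements\.metallb\.io$'
}
```

Possible usage: `kubectl get crd -o name | metallb_crds_present || echo "install the MetalLB Helm chart first"`.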
**Pod Security warnings** (`would violate PodSecurity "restricted"`): MetalLB's speaker/FRR use `hostNetwork`, `NET_ADMIN`, etc. That is expected unless `metallb-system` is labeled **privileged**. Apply `namespace.yaml` **before** Helm so the namespace is created with the right labels (omit `--create-namespace` on Helm), or patch an existing namespace:
```bash
kubectl apply -f clusters/noble/apps/metallb/namespace.yaml
```
If you already ran Helm with `--create-namespace`, either `kubectl apply -f namespace.yaml` (merges labels) or:
```bash
kubectl label namespace metallb-system \
pod-security.kubernetes.io/enforce=privileged \
pod-security.kubernetes.io/audit=privileged \
pod-security.kubernetes.io/warn=privileged --overwrite
```
Then restart MetalLB pods if they were failing (`kubectl get pods -n metallb-system`; delete stuck pods or `kubectl rollout restart` each `Deployment` / `DaemonSet` in that namespace).
1. Install the MetalLB chart (CRDs + controller). If you applied `namespace.yaml` above, **skip** `--create-namespace`:
```bash
helm repo add metallb https://metallb.github.io/metallb
helm repo update
helm upgrade --install metallb metallb/metallb \
--namespace metallb-system \
--wait
```
2. Apply this folder's pool and L2 advertisement:
```bash
kubectl apply -k clusters/noble/apps/metallb
```
3. Confirm a test `Service` `type: LoadBalancer` receives an address in `192.168.50.210`–`192.168.50.229`.
Reserve **one** IP in that range for Argo CD (e.g. `192.168.50.210`) via `spec.loadBalancerIP` or chart values when you expose the server.
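The range check in step 3 can be scripted. This is a sketch only: the function name `in_noble_pool` is hypothetical, and the range is hard-coded from `ip-address-pool.yaml`, so it must be updated by hand if the pool changes.

```bash
# Hypothetical helper: succeeds when the IPv4 address in $1 lies inside the
# noble-l2 pool 192.168.50.210-192.168.50.229.
# Assumption: the pool is a single contiguous range within 192.168.50.0/24.
in_noble_pool() {
  local ip="$1" last
  case "$ip" in
    192.168.50.*) last="${ip#192.168.50.}" ;;
    *) return 1 ;;
  esac
  # Reject an empty or non-numeric final octet.
  case "$last" in
    ''|*[!0-9]*) return 1 ;;
  esac
  [ "$last" -ge 210 ] && [ "$last" -le 229 ]
}
```

Possible usage, with `<service-name>` as a placeholder for your test Service: `in_noble_pool "$(kubectl get svc <service-name> -o jsonpath='{.status.loadBalancer.ingress[0].ip}')" && echo "address is in the pool"`.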
### `Pending` MetalLB pods
1. `kubectl get nodes` — every node **`Ready`**? If **`NotReady`** or **`NetworkUnavailable`**, finish **CNI** install first.
2. `kubectl describe pod -n metallb-system <pod-name>` — read **Events** at the bottom (`0/N nodes are available: …`).
3. L2 speaker uses the node's uplink; kube-vip in this repo expects **`ens18`** on control planes (`clusters/noble/apps/kube-vip/vip-daemonset.yaml`). If your NIC name differs, change `vip_interface` there.

View File

@@ -0,0 +1,19 @@
# Apply after MetalLB controller is installed (Helm chart or manifest).
# Namespace must match where MetalLB expects pools (commonly metallb-system).
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
name: noble-l2
namespace: metallb-system
spec:
addresses:
- 192.168.50.210-192.168.50.229
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
name: noble-l2
namespace: metallb-system
spec:
ipAddressPools:
- noble-l2

View File

@@ -0,0 +1,4 @@
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- ip-address-pool.yaml

View File

@@ -0,0 +1,11 @@
# Apply before Helm if you do not use --create-namespace, or use this to fix PSA after the fact:
# kubectl apply -f clusters/noble/apps/metallb/namespace.yaml
# MetalLB speaker needs hostNetwork + NET_ADMIN; incompatible with Pod Security "restricted".
apiVersion: v1
kind: Namespace
metadata:
name: metallb-system
labels:
pod-security.kubernetes.io/enforce: privileged
pod-security.kubernetes.io/audit: privileged
pod-security.kubernetes.io/warn: privileged

View File

@@ -0,0 +1,10 @@
# metrics-server — noble (Talos)
# Kubelet serving certs are not validated by default; see Talos docs:
# https://www.talos.dev/latest/kubernetes-guides/configuration/deploy-metrics-server/
#
# helm repo add metrics-server https://kubernetes-sigs.github.io/metrics-server/
# helm upgrade --install metrics-server metrics-server/metrics-server -n kube-system \
# --version 3.13.0 -f clusters/noble/apps/metrics-server/values.yaml --wait
args:
- --kubelet-insecure-tls

View File

@@ -1,64 +0,0 @@
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: monitoring-kube-prometheus
namespace: argocd
annotations:
argocd.argoproj.io/sync-wave: "2"
spec:
project: default
destination:
server: https://kubernetes.default.svc
namespace: monitoring
sources:
- repoURL: https://prometheus-community.github.io/helm-charts
chart: kube-prometheus-stack
targetRevision: "*"
helm:
valuesObject:
prometheus:
prometheusSpec:
retention: 15d
storageSpec:
volumeClaimTemplate:
spec:
storageClassName: longhorn
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 20Gi
alertmanager:
alertmanagerSpec:
retention: 120h
storage:
volumeClaimTemplate:
spec:
storageClassName: longhorn
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 5Gi
kubeEtcd:
enabled: false
kubeScheduler:
enabled: false
kubeControllerManager:
enabled: false
grafana:
defaultDashboardsTimezone: browser
additionalDataSources:
- name: Loki
type: loki
uid: loki
access: proxy
url: http://loki-stack.monitoring.svc.cluster.local:3100
isDefault: false
editable: true
syncPolicy:
automated:
prune: true
selfHeal: true
syncOptions:
- CreateNamespace=true

View File

@@ -1,42 +0,0 @@
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: monitoring-loki
namespace: argocd
annotations:
argocd.argoproj.io/sync-wave: "2"
spec:
project: default
destination:
server: https://kubernetes.default.svc
namespace: monitoring
sources:
- repoURL: https://grafana.github.io/helm-charts
chart: loki-stack
targetRevision: "*"
helm:
valuesObject:
loki:
enabled: true
persistence:
enabled: true
storageClassName: longhorn
size: 20Gi
promtail:
enabled: true
grafana:
enabled: false
prometheus:
enabled: false
filebeat:
enabled: false
fluent-bit:
enabled: false
logstash:
enabled: false
syncPolicy:
automated:
prune: true
selfHeal: true
syncOptions:
- CreateNamespace=true