Module 7: Storage & GitOps — Longhorn & Fleet

Module Overview

Longhorn is a CNCF Incubating distributed block storage system for Kubernetes, providing enterprise-grade persistent storage with synchronous replication, snapshots, backups, and encryption. Fleet is the built-in GitOps engine in Rancher Prime, enabling declarative workload management across hundreds of clusters. Together they form the data persistence and deployment automation backbone of the SUSE Cloud Native stack.

Part 1: Longhorn — Distributed Block Storage

What Is Longhorn?

Longhorn is a CNCF Incubating project (formerly a CNCF Sandbox project): a distributed block storage system built natively for Kubernetes. Unlike traditional SAN/NAS solutions that require external hardware and proprietary protocols, Longhorn runs entirely as a set of containers on your Kubernetes cluster, turning the local disks of each cluster node into a unified, highly available storage pool.

Property	Traditional SAN/NAS	Longhorn
Architecture	External appliance, Fibre Channel/iSCSI	Software-defined, Kubernetes-native
Protocol	FC, iSCSI, NFS, SMB	Longhorn Engine (custom block protocol over TCP)
Replication	Appliance-based (RAID, SAN replication)	Per-volume synchronous replication (2x/3x)
Provisioning	LUNs, volumes via storage admin	CSI driver — `PersistentVolumeClaim` → dynamic volume
Cost	$10K–$100K+ per appliance	Zero additional hardware; runs on cluster nodes
Management	Dedicated admin console	`kubectl`, Longhorn UI, Rancher Prime dashboard

Key Insight

Longhorn is not an external storage appliance retrofitted for Kubernetes. It is built from the ground up for Kubernetes — every volume is a Kubernetes resource, every operation goes through the CSI driver, and the control plane runs as pods on the cluster itself.

Architecture

Figure: Longhorn Distributed Block Storage

Longhorn's architecture is microservice-based, with three core components working together:

Component	Role
Longhorn Manager	Kubernetes operator/controller; watches CRDs (`Volume`, `Engine`, `Replica`, `Backup`, `Snapshot`), schedules replicas, handles failover, exposes API and UI
Longhorn Engine	Dedicated controller per volume — a Linux process that fronts all I/O to a volume, handles synchronous replication writes across replicas, and rebuilds replicas after node recovery
Longhorn Replica	One Linux process per replica copy; stores volume data as a sparse file on a local disk or directory; each replica is an independent copy of the volume's data

Dedicated Storage Controllers

Each Longhorn volume gets its own dedicated Engine controller process. This is a critical architectural difference from shared-storage systems (like Ceph RBD) where a single OSD serves many volumes. The dedicated engine means:

I/O for one volume is isolated from others (no noisy-neighbour at the controller level)
A crash in one engine affects only that volume
Each engine can be tuned independently (block size, queue depth, replica count)

Synchronous Replication

Longhorn replicates every write synchronously across all replicas of a volume:

Application writes to the volume via the CSI driver
The Longhorn Engine receives the write I/O
The engine writes to all replicas in parallel
The engine waits for acknowledgement from every healthy replica; a replica that fails to acknowledge is marked as failed and is rebuilt later
Only then does the write complete to the application

Because every healthy replica holds every acknowledged write, no acknowledged data is lost on node failure: if a node goes down, the remaining replicas have the latest data.

Replication Factor	Healthy Replicas Needed to Serve I/O	Replica Failures Tolerated	Usable Capacity
1x (no HA)	1	0	100%
2x	1	1	50%
3x (recommended)	1	2	33%

A StorageClass configured with 3x synchronous replication keeps a volume available through the loss of a node, and still leaves two copies of the data while the lost replica is rebuilt.
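As a rough illustration of the replication table, the sketch below builds a StorageClass manifest for three replicas and computes the usable share of raw disk. The class name `longhorn-ha` and the 3072 GiB raw figure are assumptions for illustration; `driver.longhorn.io`, `numberOfReplicas`, and `staleReplicaTimeout` are the provisioner and parameter names used in Longhorn's StorageClass examples.

```python
import json


def longhorn_storage_class(name: str, replicas: int) -> dict:
    """Build a StorageClass manifest that asks Longhorn for `replicas` copies."""
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},  # hypothetical class name
        "provisioner": "driver.longhorn.io",
        "allowVolumeExpansion": True,
        "reclaimPolicy": "Delete",
        "parameters": {
            # StorageClass parameters are always strings.
            "numberOfReplicas": str(replicas),
            "staleReplicaTimeout": "2880",  # minutes (48 hours)
        },
    }


def usable_capacity(raw_capacity: float, replicas: int) -> float:
    """Every replica stores a full copy, so usable space is raw / replicas."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return raw_capacity / replicas


storage_class = longhorn_storage_class("longhorn-ha", replicas=3)
print(json.dumps(storage_class, indent=2))
print(usable_capacity(3072, 3))  # GiB usable out of 3072 GiB raw at 3x
```

Because JSON is accepted wherever a YAML manifest is, the printed document can be saved to a file and applied with `kubectl apply -f`.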

Key Features

Snapshots

Longhorn supports instantaneous, space-efficient snapshots using copy-on-write (CoW):

Zero-cost creation — snapshots are instantaneous regardless of volume size
Chain-based — snapshots form a chain; each snapshot stores only the delta from the previous
CRD-managed — Snapshot resource in Kubernetes
Restore — revert to any snapshot in the chain
Delete — snapshots can be removed; data merges into the parent (may be slow for large deltas)

A `Snapshot` resource named `pre-upgrade-backup`, created with `kubectl`, captures the current state of `my-app-volume` before a maintenance operation.
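To make the chain behaviour concrete, here is a toy in-memory model of delta-based snapshots. It is a simplified sketch for intuition only and does not reflect Longhorn's on-disk format; the class and method names are hypothetical.

```python
from typing import Optional


class SnapshotChain:
    """Toy model of a snapshot chain: each layer holds only changed blocks."""

    def __init__(self) -> None:
        self.snapshots = []  # list of (name, delta) pairs, oldest first
        self.live = {}       # blocks written since the last snapshot

    def write(self, block: int, data: bytes) -> None:
        self.live[block] = data

    def snapshot(self, name: str) -> None:
        # Freezing the live layer copies no data, whatever the volume size.
        self.snapshots.append((name, self.live))
        self.live = {}

    def read(self, block: int, at: Optional[str] = None) -> Optional[bytes]:
        """Read a block from the live volume, or as of snapshot `at`."""
        if at is None:
            layers = [delta for _, delta in self.snapshots] + [self.live]
        else:
            names = [name for name, _ in self.snapshots]
            layers = [delta for _, delta in self.snapshots[: names.index(at) + 1]]
        for delta in reversed(layers):  # the newest layer wins
            if block in delta:
                return delta[block]
        return None  # block was never written


volume = SnapshotChain()
volume.write(0, b"v1")
volume.snapshot("pre-upgrade-backup")
volume.write(0, b"v2")
```

Reading block 0 from the live volume returns the new data, while reading it as of `pre-upgrade-backup` still returns the original bytes, which is what makes a revert possible.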

Backups (S3 / NFS)

Longhorn backs up snapshots to external targets — not just within the cluster:

Target Type	Example	Use Case
S3-compatible	AWS S3, MinIO, Ceph RADOS Gateway, DigitalOcean Spaces	Cloud backup, DR across regions
NFS	NFSv4 server, NAS appliance	On-premises backup, air-gapped environments
Azure Blob	Azure Blob Storage	Azure-native workloads
GCP	Google Cloud Storage	GCP-native workloads

A Backup CRD triggers an on-demand volume backup to an S3-compatible target, while a BackupTarget CRD configures the persistent backup destination and polling interval.
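The backup destination is usually written as a single URL. A minimal sketch, assuming the `s3://<bucket>@<region>/<path>` and `nfs://<server>:/<path>` forms shown in the Longhorn documentation; the bucket, region, and server names below are hypothetical.

```python
def s3_backup_target(bucket: str, region: str, path: str = "") -> str:
    """Format an S3 target as s3://<bucket>@<region>/<path>."""
    return f"s3://{bucket}@{region}/{path.lstrip('/')}"


def nfs_backup_target(server: str, export_path: str) -> str:
    """Format an NFS target as nfs://<server>:/<export path>."""
    return f"nfs://{server}:/{export_path.lstrip('/')}"


s3_url = s3_backup_target("longhorn-backups", "us-east-1", "prod/")
nfs_url = nfs_backup_target("nas.example.internal", "/export/longhorn")
print(s3_url)
print(nfs_url)
```

S3 credentials are not part of the URL; they are normally supplied through a Kubernetes Secret referenced by the backup target configuration.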

Disaster Recovery (DR) Volumes

DR volumes allow cross-cluster or cross-region recovery:

A DR volume is a read-only replica of a volume's backup in a different cluster
It continuously pulls the latest backup data from the backup target (S3/NFS)
On failover, the DR volume is activated (promoted to read-write) and used as the primary volume
No data rehydration delay — the data is already present locally from the continuous sync

Activate a DR volume on failover by annotating it to restore to read-write state, promoting the replicated backup to the primary volume.
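The standby-then-activate lifecycle can be pictured with a small state model. This is a conceptual sketch only: `DRVolume`, `sync`, and `activate` are hypothetical names, not Longhorn API calls.

```python
from dataclasses import dataclass, field


@dataclass
class DRVolume:
    """Toy model of a standby volume that follows a backup target."""

    name: str
    standby: bool = True  # read-only while standing by
    restored: list = field(default_factory=list)

    def sync(self, backups_on_target: list) -> list:
        """Pull only the backups not restored yet; return what was pulled."""
        if not self.standby:
            return []  # an activated volume no longer follows the target
        new = [b for b in backups_on_target if b not in self.restored]
        self.restored.extend(new)
        return new

    def activate(self) -> None:
        """Promote the volume to read-write for failover."""
        self.standby = False


dr = DRVolume("orders-db-dr")
first = dr.sync(["backup-1", "backup-2"])
second = dr.sync(["backup-1", "backup-2", "backup-3"])
dr.activate()
```

The second sync transfers only `backup-3`, which is why activation does not have to wait for a full restore.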

RWX Support

Longhorn supports ReadWriteMany (RWX) volumes — multiple Pods can read and write to the same volume simultaneously:

Shared file system — uses an NFSv4 server (share-manager pod) per RWX volume
Underlying block storage — still uses Longhorn's replicated block storage; the NFS layer is a thin front-end
Performance — adequate for most shared workloads; not a replacement for parallel FS (Lustre, GPFS) for HPC

Access Mode	Longhorn Support	Typical Use Cases
RWO (ReadWriteOnce)	Native	Databases, message queues, any single-Pod consumer
RWX (ReadWriteMany)	NFS front-end	Shared file stores, media pipelines, web server content
ROX (ReadOnlyMany)	Via snapshot/restore	Data distribution, ML model serving

Encryption

Longhorn supports volume-level encryption using Linux dm-crypt / LUKS:

At-rest encryption — each replica is encrypted independently
Per-volume keys — each volume can have its own passphrase or key
Key management — keys stored in Kubernetes Secrets (or external KMS in Prime)
Performance impact — minimal (~5–10% overhead with AES-NI hardware)

An encrypted StorageClass enables LUKS-based at-rest encryption via dm-crypt, with per-volume keys managed through Kubernetes Secrets.
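A sketch of what such a StorageClass might look like, built as a Python dictionary. It assumes the `encrypted` parameter and the standard `csi.storage.k8s.io/*-secret-*` keys described in the Longhorn encryption documentation; the class name is hypothetical, and the `${pvc.name}` / `${pvc.namespace}` templates select one Secret per volume.

```python
import json


def encrypted_storage_class(name: str, secret_name: str, secret_namespace: str) -> dict:
    """Build a StorageClass whose volumes are encrypted with keys from a Secret."""
    parameters = {"numberOfReplicas": "3", "encrypted": "true"}
    # The same Secret is referenced at each CSI stage that needs the key.
    for stage in ("provisioner", "node-publish", "node-stage"):
        parameters[f"csi.storage.k8s.io/{stage}-secret-name"] = secret_name
        parameters[f"csi.storage.k8s.io/{stage}-secret-namespace"] = secret_namespace
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},  # hypothetical class name
        "provisioner": "driver.longhorn.io",
        "allowVolumeExpansion": True,
        "parameters": parameters,
    }


# Per-volume keys: the Secret is looked up by the PVC's own name and namespace.
per_volume = encrypted_storage_class(
    "longhorn-crypto-per-volume", "${pvc.name}", "${pvc.namespace}"
)
print(json.dumps(per_volume, indent=2))
```

Passing a fixed Secret name and namespace instead of the templates would give every volume in the class the same key.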

CSI Driver

Longhorn's CSI (Container Storage Interface) driver is fully compliant with the Kubernetes CSI specification:

CSI Feature	Longhorn Support
Dynamic provisioning
Block volume support
Mount option support
Volume expansion (online resize)
Volume cloning
Volume snapshots
Volume group snapshots
Seamless PVC migration

Monitoring

Longhorn provides built-in monitoring via:

Longhorn UI — Web dashboard with volume health, replica status, node disk usage, backup progress
Prometheus metrics endpoint — Exposes volume I/O, replica status, node disk, and engine metrics
Grafana dashboard — Pre-built dashboard (available in the Longhorn repo) for cluster-wide storage observability
Alerting — Critical alerts: volume degraded, replica failure, disk pressure, backup failure

A PrometheusRule alert fires when a Longhorn volume enters degraded state, with configurable severity and notification routing.
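One way such a rule could be expressed, assuming the Prometheus Operator is installed and that the `longhorn_volume_robustness` metric reports `2` for a degraded volume. The rule name, namespace, and default values are illustrative choices, not settings taken from this module.

```python
import json


def degraded_volume_rule(severity: str = "warning", hold_for: str = "5m") -> dict:
    """Build a PrometheusRule that alerts on degraded Longhorn volumes."""
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "PrometheusRule",
        "metadata": {
            "name": "longhorn-volume-alerts",  # hypothetical name
            "namespace": "longhorn-system",
        },
        "spec": {
            "groups": [
                {
                    "name": "longhorn.rules",
                    "rules": [
                        {
                            "alert": "LonghornVolumeDegraded",
                            # Assumed encoding: robustness value 2 means degraded.
                            "expr": "longhorn_volume_robustness == 2",
                            "for": hold_for,
                            "labels": {"severity": severity},
                            "annotations": {
                                "summary": "Longhorn volume {{ $labels.volume }} is degraded"
                            },
                        }
                    ],
                }
            ]
        },
    }


rule = degraded_volume_rule(severity="critical", hold_for="10m")
print(json.dumps(rule, indent=2))
```

The `severity` label is what Alertmanager routing typically matches on, so changing it is how an alert is sent to a different notification channel.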

V2 Data Engine (SPDK)

The V2 Data Engine, introduced as a preview in Longhorn v1.5, is an alternative to the V1 Engine built on an SPDK-based (Storage Performance Development Kit) data path:

Aspect	V1 Engine	V2 Engine (SPDK)
Data path	Linux block layer + iSCSI/tcp	SPDK — kernel-bypass, userspace NVMe/TCP
Latency	~100–200 µs	~10–20 µs (10x improvement)
Throughput	Moderate	2–3x higher IOPS
CPU usage	Low	Slightly higher (poll-mode drivers)
Replication	Engine-internal synchronous	NVMe-oF native replication
Maturity	GA (production proven)	Beta / Experimental (Longhorn v1.5+)

V2 Engine Maturity

The V2 Data Engine is currently experimental / beta in Longhorn v1.6–v1.7. It is not recommended for production use without thorough testing. V1 remains the production-grade default.

Use Cases

Stateful Workloads

Longhorn is designed for any stateful workload on Kubernetes:

Databases — PostgreSQL, MySQL, MongoDB, Redis, Cassandra
Message queues — Kafka, RabbitMQ, NATS
CI/CD runners — GitLab Runner, Jenkins agents with persistent workspace
Content management — WordPress, Drupal with persistent storage
Data analytics — Elasticsearch, Apache Spark, Presto

A PersistentVolumeClaim requests 100 Gi from the HA storage class, and a StatefulSet consumes it via volumeClaimTemplates for automatic per-Pod volume provisioning.
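The per-Pod provisioning described above can be sketched with a StatefulSet manifest built in Python. The workload name, image, mount path, and the `longhorn-ha` class name are assumptions for illustration; the `volumeClaimTemplates` structure and the `<template>-<statefulset>-<ordinal>` PVC naming are standard Kubernetes behaviour.

```python
def stateful_set(name: str, image: str, replicas: int, storage_class: str, size: str) -> dict:
    """Build a StatefulSet that gets one PVC per Pod from a claim template."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            "serviceName": name,
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "volumeMounts": [{"name": "data", "mountPath": "/data"}],
                        }
                    ]
                },
            },
            "volumeClaimTemplates": [
                {
                    "metadata": {"name": "data"},
                    "spec": {
                        "accessModes": ["ReadWriteOnce"],
                        "storageClassName": storage_class,
                        "resources": {"requests": {"storage": size}},
                    },
                }
            ],
        },
    }


def pvc_names(manifest: dict) -> list:
    """List the PVCs Kubernetes will create: <template>-<statefulset>-<ordinal>."""
    template = manifest["spec"]["volumeClaimTemplates"][0]["metadata"]["name"]
    name = manifest["metadata"]["name"]
    return [f"{template}-{name}-{i}" for i in range(manifest["spec"]["replicas"])]


sts = stateful_set("postgres", "postgres:16", 3, "longhorn-ha", "100Gi")
print(pvc_names(sts))
```

Each of the three Pods gets its own 100 Gi volume, and each of those volumes is replicated independently according to the storage class.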

Harvester Backend

Longhorn is the default and recommended storage backend for Harvester, the SUSE virtualization platform:

Each Harvester node contributes local disks to the Longhorn storage pool
VM disk images are Longhorn volumes with 2x or 3x replication
Live migration of VMs relies on Longhorn's multi-node replica availability
VM snapshots are Longhorn snapshots
VM backups to S3/NFS use Longhorn's backup system

Harvester + Longhorn = Hyperconverged

In a Harvester cluster, you do not need separate shared storage. Each node's local disks are pooled by Longhorn, providing enterprise-grade HA storage with zero external dependency. This is the key differentiator from VMware vSAN: Longhorn is open-source, a CNCF Incubating project, and Kubernetes-native.

Edge Storage

Longhorn is well-suited for edge deployments due to:

Minimal hardware requirements — runs on 2–3 node clusters with local SSDs; no SAN fabric needed
Self-healing — automatic replica rebuilding when nodes recover after intermittent connectivity loss
Lightweight footprint — Longhorn Manager ~0.5 vCPU / 1 GB RAM; Engine per volume ~0.1 vCPU
Air-gapped capable — all images can be mirrored to a local Harbor registry
Offline operation — no external control plane dependency once deployed

Edge Best Practice

For 2-node edge clusters, use Longhorn with 2x replication and a third witness node (or configure maintenance mode carefully). For 3+ node clusters, 3x replication is the recommended production configuration.

Part 2: Fleet — GitOps at Scale

What Is Fleet?

Fleet is the built-in GitOps engine of Rancher Prime, designed to manage workloads declaratively across any number of Kubernetes clusters — from a single cluster to thousands. Fleet is CRD-based, purpose-built for multi-cluster operations, and deeply integrated with the Rancher ecosystem.

GitOps Engine	Fleet (Rancher)	ArgoCD	Flux
Origin	Rancher (SUSE)	Intuit (CNCF Graduated)	Weaveworks (CNCF Graduated)
Primary interface	CRDs (`Bundle`, `GitRepo`, `ClusterGroup`)	CRDs + CLI (`Application`, `ApplicationSet`)	CRDs (`GitRepository`, `Kustomization`, `HelmRelease`)
Multi-cluster native	First-class	Via ApplicationSet	Via Kustomization overlays
Multi-source	YAML, Helm, Kustomize	YAML, Helm, Kustomize	YAML, Helm, Kustomize
Helm deployment	Helm-native (uses Helm SDK natively)	`helm template` (in repo-server)	Helm SDK (via helm-controller)
Drift detection	Automatic	Automatic (3m default)	Automatic (poll or webhook)
Drift remediation	Opt-in (`correctDrift` on the `GitRepo`)	Opt-in (`selfHeal` sync policy)	Automatic
UI integration	Rancher Prime UI (deeply embedded)	Web UI (ArgoCD Server)	Web UI (Flux Dashboard, optional)
Rancher integration	Native (built-in)	Manual via ArgoCD app	Manual via Flux app
Target scaling	Thousands of clusters	Hundreds of clusters	Hundreds of clusters
Air-gapped support	Native	Via repo mirroring	Via OCI artifacts

Why Fleet Matters in Rancher Prime

Fleet is not a third-party add-on. Rancher Prime installs and manages Fleet automatically, so it is available from the moment you start the Rancher UI. No extra install, no separate dashboard, no additional RBAC model. This deep integration is the primary differentiator from ArgoCD and Flux in a Rancher-centric environment.

Architecture

Fleet uses a hub-and-spoke model with CRDs as the primary interface:

Figure: Fleet GitOps Architecture

Fleet CRD	Purpose
`GitRepo`	Points to a Git repository branch/path and defines sync configuration
`Bundle`	A set of Kubernetes resources (from Helm, Kustomize, or raw YAML) ready for deployment
`ClusterGroup`	A selector-based group of target clusters (by label, name, or provider)
`BundleDeployment`	The per-cluster instantiation of a Bundle — Fleet creates one for each target cluster
`ImageScan`	Watches a container image repository for new tags so that image references in Git can be updated (experimental); vulnerability scanning itself is handled by Harbor/NeuVector

Multi-Source: YAML, Helm & Kustomize

Fleet can consume Kubernetes manifests from any combination of sources in a single Git repository:

Source Type	Fleet Handling
Raw YAML	Applied directly to the cluster as-is
Helm charts	Rendered using Helm SDK natively; supports `values.yaml`, `--set`, dependency management
Kustomize	Built using `kustomize build`; outputs rendered YAML which is then applied
Mixed	Fleet treats each directory as a separate Bundle; you can reference the same directory from multiple Bundle targets

A GitRepo CRD points Fleet to a production repository, sourcing Helm charts, Kustomize overlays, and raw YAML from different paths, targeting only production-labelled clusters.
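A minimal sketch of such a `GitRepo`, built as a Python dictionary. The repository URL, paths, and the `env: production` label are hypothetical; the `fleet.cattle.io/v1alpha1` API group, the `fleet-default` workspace namespace, and the `repo`, `branch`, `paths`, and `targets` fields follow the Fleet documentation.

```python
import json


def git_repo(name: str, repo_url: str, branch: str, paths: list, cluster_labels: dict) -> dict:
    """Build a Fleet GitRepo that deploys `paths` to clusters matching the labels."""
    return {
        "apiVersion": "fleet.cattle.io/v1alpha1",
        "kind": "GitRepo",
        "metadata": {"name": name, "namespace": "fleet-default"},
        "spec": {
            "repo": repo_url,
            "branch": branch,
            "paths": paths,  # each path becomes its own Bundle
            "targets": [
                {
                    "name": "production",
                    "clusterSelector": {"matchLabels": cluster_labels},
                }
            ],
        },
    }


production_repo = git_repo(
    name="production-apps",
    repo_url="https://git.example.com/platform/production.git",  # hypothetical
    branch="main",
    paths=["helm/ingress", "kustomize/monitoring", "manifests/namespaces"],
    cluster_labels={"env": "production"},
)
print(json.dumps(production_repo, indent=2))
```

Clusters without the `env: production` label are ignored by this target, so staging and development clusters are untouched.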

Helm-Native Deployment

Unlike ArgoCD (which renders charts with `helm template` in its repo-server) and Flux (which fetches charts with source-controller and installs them with helm-controller), Fleet renders Helm charts natively through the Helm SDK:

No sidecar — Helm execution happens in the Fleet controller pod itself
Dependency management — Fleet resolves Helm chart dependencies before deployment
Values override — Chart values can be overridden at the Bundle level, at the target level, or both
Release tracking — Fleet monitors Helm release status on each target cluster

A Bundle CRD deploys the NGINX Ingress Helm chart with per-environment value overrides — 3 replicas with LoadBalancer for production, 1 replica with ClusterIP for staging.
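Layered value overrides of this kind amount to a recursive merge of dictionaries. The sketch below models the idea; it is not Fleet's own merge code, and the value keys (`controller.replicaCount`, `controller.service.type`) are assumed from the ingress-nginx chart for illustration.

```python
import copy


def merge_values(base: dict, override: dict) -> dict:
    """Return `base` with `override` applied; nested dicts merge key by key."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_values(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


base_values = {
    "controller": {
        "replicaCount": 1,
        "service": {"type": "ClusterIP"},
        "metrics": {"enabled": True},
    }
}
per_target = {
    "production": {"controller": {"replicaCount": 3, "service": {"type": "LoadBalancer"}}},
    "staging": {},  # staging keeps the base values
}

rendered = {env: merge_values(base_values, o) for env, o in per_target.items()}
```

Keys that an override does not mention, such as `metrics.enabled` here, carry through unchanged from the base values.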

Drift Detection

Fleet continuously monitors the actual state of resources on each cluster against the desired state defined in Git:

Detection Mechanism	How It Works
Poll-based	Fleet re-polls the Git repository at a configurable interval (default: 15 seconds for Bundle status, 60 seconds for GitRepo)
Watch-based	Fleet watches Kubernetes resources it has deployed; if they are modified outside of Fleet, it detects the drift
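Reconciliation	When drift correction is enabled (`correctDrift` in the `GitRepo` spec), Fleet reapplies the desired state on detecting drift, restoring the Git-defined configuration automatically; otherwise the affected bundle is reported as modified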

Drift Detection in Production

Fleet's drift detection is not just theoretical. With drift correction enabled, it actively reverts unauthorised changes: if an operator runs `kubectl edit deployment` on a Fleet-managed resource, Fleet restores the Git-defined configuration shortly afterwards. This is the fundamental guarantee of GitOps: Git is the single source of truth.
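The detect-then-restore cycle can be modelled in a few lines. This is a conceptual sketch over plain dictionaries, not Fleet's implementation; the function names and the example fields are hypothetical.

```python
def detect_drift(desired: dict, actual: dict) -> dict:
    """Return {field: (desired, actual)} for every field that differs."""
    fields = desired.keys() | actual.keys()
    return {
        f: (desired.get(f), actual.get(f))
        for f in sorted(fields)
        if desired.get(f) != actual.get(f)
    }


def reconcile(desired: dict, actual: dict) -> dict:
    """Report the drift, then overwrite the live state with the desired state."""
    drift = detect_drift(desired, actual)
    actual.clear()
    actual.update(desired)
    return drift


desired_state = {"replicas": 3, "image": "nginx:1.27"}
live_state = {"replicas": 5, "image": "nginx:1.27"}  # someone scaled by hand

reported = reconcile(desired_state, live_state)
print(reported)
```

After the reconcile pass the live state matches Git again, and a second `detect_drift` call reports nothing.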

Multi-Cluster Scale

Fleet is designed for enterprise-scale multi-cluster management:

Thousands of targets — One Fleet controller can manage thousands of downstream clusters
Selective deployment — Use ClusterGroup and label selectors to target specific sets of clusters
Rollout strategies — Canary, rolling, or all-at-once deployments across clusters
Progressive delivery — Deploy to a subset of clusters first, validate, then roll out to all

A Bundle CRD implements a staged canary rollout across cluster groups, delaying full production deployment until canary groups report healthy status.
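Label-based targeting and a health-gated rollout can be sketched as follows. This models the idea only and does not use Fleet's rollout fields; the cluster names, labels, and helper functions are hypothetical.

```python
from typing import Callable


def select_clusters(clusters: dict, match_labels: dict) -> list:
    """Return names of clusters whose labels include every requested pair."""
    return sorted(
        name
        for name, labels in clusters.items()
        if all(labels.get(k) == v for k, v in match_labels.items())
    )


def staged_rollout(
    stages: list,
    deploy: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> list:
    """Deploy stage by stage; stop before the next stage if a cluster is unhealthy."""
    completed = []
    for stage in stages:
        for cluster in stage:
            deploy(cluster)
        if not all(is_healthy(cluster) for cluster in stage):
            break  # later stages are never touched
        completed.append(stage)
    return completed


clusters = {
    "edge-01": {"env": "prod", "tier": "canary"},
    "edge-02": {"env": "prod", "tier": "main"},
    "dev-01": {"env": "dev", "tier": "main"},
}
canary = select_clusters(clusters, {"env": "prod", "tier": "canary"})
main = select_clusters(clusters, {"env": "prod", "tier": "main"})

deployed = []
completed = staged_rollout([canary, main], deployed.append, lambda cluster: True)
```

If the health check fails for the canary group, the loop stops there and the main production group keeps its previous version.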

Fleet vs ArgoCD vs Flux — Detailed Comparison

Figure: GitOps Comparison

Capability	Fleet	ArgoCD	Flux
CNCF Status	Not a CNCF project	Graduated	Graduated
Multi-cluster native	Built-in	ApplicationSet add-on	Cross-cluster via Flux CLI
Rancher integration	Native (built-in)	Manual app install	Manual app install
RBAC model	Inherits Rancher RBAC (projects, namespaces)	ArgoCD-specific RBAC (projects, roles)	Kubernetes RBAC only
UI	Rancher Prime UI (no separate login)	ArgoCD Server UI (separate)	Flux Dashboard (separate, optional)
Drift detection interval	~15 sec (Bundle polling)	3 min default	1–10 min (configurable)
Helm deployment	Helm SDK native (no sidecar)	`helm template` in repo-server	Helm SDK via helm-controller
Helm dependency	Native	Via repo-server	Manual (helm dependency build)
Kustomize
OCI artifacts	(charts in OCI)
Webhook integration	(GitHub, GitLab, Bitbucket)
Image update automation	Via ImageScan + Harbor	Image Updater	Image Automation
Sync waves / phases	Limited	Rich	Rich
Resource hooks (pre/post sync)	Not supported
Cluster registration	Auto-imported by Rancher	Manual (or via Rancher app)	Manual (or via Rancher app)
Air-gapped	Native (via Rancher)	Via mirror	Via OCI proxy
Learning curve	Low (if using Rancher)	Medium	Medium

Dimension	Best Choice
Best for Rancher Prime users	Fleet — no extra install, native RBAC, one UI
Best for standalone GitOps (non-Rancher)	ArgoCD or Flux — mature CNCF ecosystems
Best for Helm-heavy workflows	Flux — richest HelmRelease CRD
Best for multi-cluster at scale	Fleet — purpose-built for thousands of clusters
Best for CI/CD integration	ArgoCD — strongest GitOps tooling ecosystem

SUSE Private Registry (Harbor)

The SUSE Private Registry (Early Access) is a Harbor-powered container image registry that ships as part of the SUSE Cloud Native portfolio. It is designed for customers who need an enterprise-grade, air-gap-capable registry.

Key Capabilities

Feature	Description
OCI-compliant	Stores container images, Helm charts, OCI artifacts (cosign signatures, SBOMs)
Vulnerability scanning	Built-in Trivy scanner (also integrates with NeuVector)
RBAC	Project-level user and robot-account access control
Replication	Pull-through cache or push-based replication for air-gapped environments
Retention policies	Automatically clean up old/tagged images based on rules
Webhook notifications	Trigger downstream pipelines on image push
Immutable tags	Prevent overwriting production tags
Image signing	Integration with cosign for supply chain security
SBOM generation	SPDX-format Software Bill of Materials per image

Air-Gap Architecture

Figure: Air-Gapped Architecture

Air-Gap Strategy

The SUSE Private Registry is the cornerstone of air-gapped deployments. All SUSE Cloud Native components — Rancher Prime, Longhorn, NeuVector, Kubewarden — are available as OCI images that can be mirrored once into a local Harbor instance. After the initial sync, no internet access is required for cluster operations.

Positioning Scripts

Positioning Longhorn: "Your customer says: 'We use AWS EBS for stateful workloads on Kubernetes.' Ask them: 'What happens when you run Kubernetes on-premises? Or at the edge? Or across multiple clouds?' Longhorn is a CNCF Incubating, Kubernetes-native distributed block storage system that works the same way everywhere: on-prem, cloud, edge, or air-gapped. No SAN fabric. No proprietary appliance. No per-terabyte licensing. Just synchronous replication, enterprise snapshots and backups, RWX support, and a CSI driver that any PVC can consume. It runs on the same nodes as your workloads, hyperconverged by design, and it's built into Rancher Prime with one-click deployment."

Positioning Fleet: "Your customer asks: 'We already use ArgoCD, so why would we switch to Fleet?' Here's the answer: If you're already running Rancher Prime, Fleet is not a separate tool you need to install, configure, and maintain. It ships with Rancher Prime and is installed automatically. It uses the same RBAC model as your Rancher projects. It appears in the same UI. It manages Helm, Kustomize, and raw YAML natively, and it's designed from the ground up for deploying to hundreds or thousands of clusters, not just one. Fleet doesn't replace ArgoCD when ArgoCD is the right tool. But if you're managing clusters with Rancher, Fleet is the GitOps engine that's already there, already integrated, and already configured for multi-cluster operations at scale."

Positioning SUSE Private Registry: "Your customer says: 'We need to run Kubernetes in an air-gapped environment, with no internet access at all.' The SUSE Private Registry, powered by Harbor, is the answer. It is an OCI-compliant container registry with built-in vulnerability scanning, RBAC, image signing, and SBOM generation, and it is fully capable of running in an air-gapped datacenter. All SUSE Cloud Native components ship as OCI images that you mirror once into the local registry, and from that point forward, your clusters never need to touch the internet."

Cross-References

Module	Topic	Link
Module 1: Strategy & Platform Overview	Where Longhorn and Fleet fit in the platform stack	The integrated storage + GitOps layer
Module 2: K8s Distributions	The clusters Longhorn and Fleet run on	RKE2 for production storage, K3s for edge GitOps
Module 3: Rancher Prime	The platform Fleet is built into	One-click deploy, native RBAC, Rancher UI integration
Module 4: NeuVector	Security for storage and GitOps pipelines	Image scanning in Harbor, runtime protection for stateful workloads
Module 5: Harvester	Longhorn as the virtualization storage backend	VM disk images as Longhorn volumes, snapshots, backups
Module 6: Edge Computing	Longhorn for edge storage, Fleet for edge GitOps	Offline operation, air-gapped image sync, small footprint
Module 8: Kubewarden	Deploying policies via Fleet GitOps	Policy-as-code delivered across clusters with Fleet
Module 9: Ecosystem & Competitive	Longhorn vs. vSAN, Fleet vs. Argo CD	Competitive storage and GitOps positioning
Module 10: Sales Scenarios by Vertical	Applying Longhorn and Fleet in a customer conversation	Storage for SAP, GitOps for financial services microservices
Quick Reference Card	Cheat sheet	Key numbers, product URLs, CLI commands
Module 11: MultiLinux Management	Managing heterogeneous Linux distributions from Rancher Prime	Unified system management across SLES, RHEL, Ubuntu with Rancher Prime