Harvester HCI End-to-End Setup -- KubeVirt, Longhorn, Rancher Integration, and DR

Harvester HCI KubeVirt Longhorn RKE2 Rancher SUSE Cloud-Native

Standalone Infrastructure | Component: Harvester HCI 1.3+ | Audience: Cloud-Native Infrastructure Engineers, Platform Architects

Harvester is SUSE's open source hyperconverged infrastructure platform, and it occupies a genuinely different position from every other hypervisor in this series. It's not a traditional hypervisor with Kubernetes bolted on. It's a Kubernetes cluster with KubeVirt bolted on. VMs in Harvester are Kubernetes pods running inside KubeVirt. The storage system is Longhorn, a Kubernetes-native distributed block storage project. The networking layer uses Multus CNI for multi-NIC VM support. If you're coming from a traditional virtualization background, the mental model shift is real: you're running VMs on Kubernetes, not Kubernetes clusters on VMs.

That distinction matters because it tells you who Harvester is for. It's for organizations running cloud-native workloads who need to also run VMs in the same infrastructure, managed through the same toolchain, by the same team. Rancher manages both the Harvester cluster and the Kubernetes workload clusters provisioned inside it. You get a single pane of glass across VM workloads and Kubernetes workloads without maintaining two separate platforms.


1. Hardware Requirements and Installation

Harvester runs on bare metal only. Nested virtualization and laptops are not officially supported. The installer checks hardware at boot and will stop with warnings if minimums aren't met, though you can bypass those checks during testing with the harvester.install.skipchecks=true kernel parameter during iPXE boot.

Component | Testing Minimum | Production Minimum | Notes
CPU | 8 cores | 16 cores | x86_64 only. Hardware-assisted virtualization required (Intel VT or AMD-V).
RAM | 32 GB | 64 GB | Harvester reserves overhead for system services. Budget remaining RAM for VMs and Longhorn.
OS disk | 180 GB (multi-disk) | 250 GB single disk, 500 GB production | Harvester uses GPT partitioning for both UEFI and BIOS. MBR is available for BIOS-only systems.
Data disk | Same disk as OS (testing) | Separate dedicated data disk | During installation you designate an OS disk and a data disk. The data disk is where Longhorn stores VM volumes. Keep them separate in production.
Network | 1 NIC | 2+ NICs per node | A single NIC works for testing. Production requires at least 2 NICs for the management bond, plus additional NICs for custom cluster networks.

Use hardware that is YES certified for SUSE Linux Enterprise Server (SLES) 15 SP3 or SP4 for best results. Harvester is built on SLE Micro, and YES-certified hardware carries additional driver and system-board compatibility validation that prevents the class of subtle hardware issues that is hard to diagnose after the fact.

Installation Process

  1. Download the Harvester ISO from the GitHub releases page (github.com/harvester/harvester/releases). Boot from ISO on the first node.
  2. Select "Create a new Harvester cluster" at the installer prompt.
  3. Choose the OS disk and data disk. If you only have one disk, Harvester partitions it for both. In production, always use separate disks.
  4. Configure the hostname, management NIC, IP address (static recommended), subnet, gateway, and DNS. Configure NTP servers. Harvester defaults to the SUSE NTP pool (0.suse.pool.ntp.org through 3.suse.pool.ntp.org) but you should set your own in production environments.
  5. Set the cluster VIP. This is the single IP address used to access the Harvester UI and for additional nodes to join the cluster. It must be a free IP on the management network, not assigned to any node directly.
  6. Set the cluster token. Any string works. Record it. You'll need it when adding nodes.
  7. Set the node password. The default SSH user is rancher.
  8. Wait approximately 10 minutes for Harvester to complete initialization. When the dashboard URL appears on the console screen, the first node is ready.
  9. For each additional node, boot from the same ISO, select "Join an existing Harvester cluster," and enter the cluster VIP and token.
The first node (the one that created the cluster) is the initial management node. When the cluster reaches three nodes, the two nodes that joined after it are automatically promoted, forming a three-node HA management plane. This happens without any manual configuration: you don't designate management nodes; Harvester promotes them by join order. Note that all CPUs in the cluster must have identical specifications for live migration to work. Mixed CPU generations are not supported for live migration.
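The interactive steps above can also be expressed as a Harvester install config for automated iPXE deployments. The sketch below is illustrative only: every value (token, hostname, NIC name, addresses, disk device) is a placeholder, and the key names follow the Harvester configuration reference as I understand it, so verify against your release before use.

```yaml
# Illustrative Harvester create-mode config for automated (iPXE) installation.
# All concrete values here are placeholders.
scheme_version: 1
token: my-cluster-token            # same token every joining node must supply
os:
  hostname: harvester-node-01
  password: change-me
  ssh_authorized_keys:
    - ssh-rsa AAAAB3... your-public-key-here
  ntp_servers:
    - 0.suse.pool.ntp.org
install:
  mode: create                     # use "join" (plus the VIP) on additional nodes
  device: /dev/sda                 # OS disk; use a separate data disk in production
  management_interface:
    interfaces:
      - name: ens3
    method: static
    ip: 192.168.1.11
    subnet_mask: 255.255.255.0
    gateway: 192.168.1.1
  vip: 192.168.1.10                # cluster VIP; a free IP, never a node's own address
  vip_mode: static
```

Joining nodes use the same file shape with `mode: join` and a `server_url` pointing at the VIP, which mirrors step 9 above.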

2. Networking: Cluster Networks, VLANs, and Bonds

Harvester's networking model has three layers that you need to understand before configuring anything. Getting these wrong produces subtle connectivity issues that are hard to diagnose after VMs are running.

The Three Network Layers

  • Management network (mgmt): The built-in cluster network created during installation. Carries Kubernetes control plane traffic, Harvester UI traffic, Longhorn replication traffic by default, and VM traffic when no custom cluster network is configured. Every node must be reachable on this network. It's the backbone of the cluster.
  • Cluster network: A custom, traffic-isolated forwarding path for VM workloads. You create cluster networks to separate VM traffic from management traffic. Each cluster network requires at least two NICs on every node to form a bond. The Longhorn storage network is configured as a separate cluster network when you want to isolate replication traffic from both management and VM traffic.
  • VM network (VLAN network): Logical networks that VMs connect to. Each VM network is associated with a cluster network and a VLAN tag. VMs attach virtual NICs to VM networks, not directly to physical NICs or cluster networks.

Creating a Custom Cluster Network

  1. In the Harvester UI, go to Networking, then Cluster Networks/Configs, and click Create.
  2. Name the cluster network (for example: vlan-network). This name is referenced when you create VM networks on top of it.
  3. Add a Network Config for each node: specify which physical NICs to bond for this cluster network on each node, the bond mode (active-backup or balance-slb), and the uplink MTU. The same cluster network name applies to all nodes, but each node can have different physical NIC assignments if hardware differs.
  4. After the cluster network is created, create VM networks under Networking, then VM Networks. Set the VLAN ID and select the cluster network as the uplink.
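Under the hood, a VM network is a Multus NetworkAttachmentDefinition labeled with its cluster network. A rough sketch of what the UI creates for a VLAN 100 network on the vlan-network cluster network follows; the label key and the bridge-naming convention reflect Harvester's conventions but should be treated as assumptions, and the UI remains the supported path:

```yaml
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: vlan100
  namespace: default
  labels:
    # ties this VM network to the cluster network created above
    network.harvesterhci.io/clusternetwork: vlan-network
spec:
  # bridge CNI config; Harvester names the bridge "<clusternetwork>-br"
  config: '{"cniVersion":"0.3.1","name":"vlan100","type":"bridge","bridge":"vlan-network-br","promiscMode":true,"vlan":100,"ipam":{}}'
```

VMs then attach virtual NICs to `vlan100`, never to the bridge or the physical bond directly.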

Longhorn Storage Network

By default, Longhorn replication traffic (the inter-node block replication that maintains data redundancy) shares the management network. For production deployments with heavy VM workloads, you want Longhorn replication on its own dedicated network to prevent replication traffic from saturating the management or VM network. Configure this as a dedicated cluster network with its own NIC bond, then configure the Longhorn storage network setting to use it.

Stop all VMs before changing the Longhorn storage network configuration. The setting change restarts Longhorn pods across the cluster, which briefly interrupts access to all volumes. If VMs are running when this happens, they'll lose access to their disks. The Harvester documentation is explicit about this requirement. Plan the change as a maintenance window.
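On the CLI, the storage network is a Harvester Setting whose value names the cluster network, a VLAN, and an IP range from which Longhorn's replication interfaces draw addresses. The fragment below is a sketch: the cluster network name, VLAN, and CIDR are placeholders, and the exact JSON value format should be checked against the Harvester settings reference for your release.

```yaml
apiVersion: harvesterhci.io/v1beta1
kind: Setting
metadata:
  name: storage-network
# clusterNetwork must reference an existing cluster network with its own bond;
# range is an otherwise-unused CIDR reserved for replication traffic (placeholders)
value: '{"clusterNetwork":"storage","vlan":100,"range":"192.168.100.0/24"}'
```

Applying this setting triggers the Longhorn pod restarts described above, so it belongs inside the same maintenance window as stopping the VMs.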

3. Longhorn Storage

Longhorn is Harvester's built-in distributed block storage system. Every VM disk is a Longhorn volume replicated across cluster nodes. You don't configure Longhorn's underlying behavior directly. Harvester manages Longhorn on your behalf, and the official documentation explicitly warns against configuring Longhorn settings directly as this can lead to untested situations. Interact with storage through Harvester's StorageClass and volume settings instead.

StorageClasses and Replica Count

The default StorageClass is harvester-longhorn with a replica count of 3. On a three-node cluster, each VM disk gets three copies, one per node. If a node fails, the volume degrades but remains accessible from the remaining two replicas while Longhorn rebuilds the third on another node. This is the right default for a three-node production cluster.

On a single-node cluster, a replica count of 3 is impossible. Longhorn marks volumes as Degraded because it can't place three replicas on one node. Create a custom StorageClass with replica count 1 for single-node deployments and set it as the default. A replica count of 1 on a single node means no redundancy: if that node's disk fails, you lose the data.

On a two-node cluster, set replica count to 2. The example below creates a custom StorageClass for a two-node deployment and registers it as the cluster default. Adjust numberOfReplicas to match your actual node count.

kubectl: Create a custom StorageClass with replica count 2 for a two-node cluster
cat <<'EOF' | kubectl apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-replica-2
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Delete
volumeBindingMode: Immediate
parameters:
  numberOfReplicas: "2"
  staleReplicaTimeout: "30"
  migratable: "true"
EOF

# Remove the default marker from the built-in class so only one default exists
kubectl annotate storageclass harvester-longhorn \
  storageclass.kubernetes.io/is-default-class-

Filesystem Trim Warning

Disable filesystem trim (fstrim) inside guest VMs when those VMs use Longhorn volumes. The Harvester knowledge base documents a confirmed issue: filesystem trim operations on guest VMs cause I/O errors while Longhorn volumes are rebuilding, which makes VMs alternate between running and paused states. The I/O errors don't cause data loss, but they do produce visible instability and unnecessary alarm. Disable the fstrim.timer service in guest VMs, particularly in Kubernetes node VMs, where etcd becoming unavailable cascades into cluster-wide unavailability.
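Disabling the timer at provision time, via the same cloud-init mechanism described in the next section, avoids having to touch each guest by hand. A minimal fragment:

```yaml
#cloud-config
# Disable periodic fstrim so trim never coincides with a Longhorn rebuild
runcmd:
  - systemctl disable --now fstrim.timer
```

Merge this into the VM's existing user data rather than replacing it; cloud-init takes a single #cloud-config document per VM.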


4. VM Management and Cloud-Init Integration

VMs in Harvester are Kubernetes custom resources of type VirtualMachine. You can create and manage them through the Harvester UI or directly with kubectl using KubeVirt manifests. The UI is the right path for most VM operations. kubectl gives you automation and GitOps compatible management.

Creating VMs from Images

  1. Upload a cloud image to Harvester: go to Images and click Create. Provide a URL to a cloud image (Ubuntu cloud, Debian generic cloud, RHEL KVM guest image) or upload from a local file. Harvester downloads the image and stores it as a Longhorn volume.
  2. Create a VM: go to Virtual Machines and click Create. Select the image as the boot volume source, set CPU and memory, attach a VM network, and configure cloud-init in the Cloud Config section.
  3. Cloud-init configuration is entered directly in the VM creation form as YAML. Harvester passes it to the VM at first boot via a virtual CD-ROM device that the cloud-init service inside the guest reads automatically.
Cloud-init user data example for Harvester VM configuration
#cloud-config
hostname: web-server-01
users:
  - name: admin
    sudo: ALL=(ALL) NOPASSWD:ALL
    ssh_authorized_keys:
      - ssh-rsa AAAAB3... your-public-key-here
    shell: /bin/bash
package_update: true
packages:
  - qemu-guest-agent
  - htop
runcmd:
  - systemctl enable --now qemu-guest-agent
  - echo "Setup complete" >> /var/log/cloud-init-complete.log

Install qemu-guest-agent in every VM via cloud-init. Without it, Harvester can't report accurate IP addresses for VMs in the UI, and live migration behavior is less reliable. It's a single package install and there's no reason to skip it.

VM Templates

Harvester supports VM templates for repeatable deployments. A template captures the VM's hardware configuration (CPU, memory, disk size, network) but not the disk contents. When you create a VM from a template, Harvester creates a fresh copy of the base image and applies the template's hardware specification. Templates reduce configuration mistakes on repeated VM deployments, but they don't clone a fully configured VM. Use templates for hardware standardization; use VM cloning or snapshot-based creation for software state preservation.


5. VM Migration and Live Migration

Harvester supports live migration of VMs between nodes with no downtime, as long as the CPUs across all nodes have identical specifications. This is a hard requirement documented by SUSE: mixed CPU generations in the same cluster prevent live migration. If your cluster has nodes with different CPU models, live migration is disabled for VMs.

Live migration uses the management network by default. For clusters with heavy migration traffic (cluster maintenance that evacuates multiple nodes simultaneously), isolate live migration traffic to a dedicated cluster network to prevent management network saturation.

kubectl: Trigger a live migration manually via KubeVirt API
# Harvester VMs are KubeVirt VirtualMachine resources
# List all VMs and their current node
kubectl get vmi -n default -o wide

# Trigger a live migration for a specific VM
cat <<'EOF' | kubectl apply -f -
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstanceMigration
metadata:
  name: migrate-web-server-01
  namespace: default
spec:
  vmiName: web-server-01
EOF

# Watch the migration progress
kubectl get vmim -n default -w

6. Rancher Integration and Kubernetes Workloads

Harvester integrates natively with Rancher. Once you import your Harvester cluster into Rancher, you can provision Kubernetes clusters on top of Harvester VMs directly from the Rancher UI. Rancher's cluster provisioning creates VMs in Harvester, installs RKE2 on them, and manages the Kubernetes cluster lifecycle. You get a single Rancher instance managing both the infrastructure (Harvester) and the workloads (Kubernetes clusters running on it).

Importing Harvester into Rancher

  1. In Rancher, go to Virtualization Management and click Import Existing. Harvester generates a registration command (a kubectl apply of a Rancher agent manifest).
  2. Run the registration command from a machine with kubectl access to the Harvester cluster. The Harvester cluster registers with Rancher over TCP port 443.
  3. After registration, the Harvester cluster appears in Rancher's Virtualization Management page. From here you can manage VMs, images, and cluster networks directly within Rancher without switching to the Harvester UI.
  4. To provision a Kubernetes cluster on Harvester: in Rancher, go to Cluster Management, click Create, and select Harvester as the infrastructure provider. Rancher presents a form to define the VM specs for the Kubernetes node VMs, the number of nodes per role, and the Kubernetes version.
Harvester's embedded Rancher and an external Rancher manager are the same software serving different scopes. Harvester ships with an embedded Rancher instance that manages the Harvester cluster itself. When you import Harvester into an external Rancher, you connect the cluster to a separate Rancher that manages multiple environments. Don't confuse the embedded Rancher (behind the Harvester UI) with an external Rancher manager. They coexist and serve different purposes.

7. Backup and DR in Harvester

Harvester's backup mechanism uses S3-compatible object storage as the backup target. Backups capture the full VM disk state and VM configuration. You configure one backup target per cluster (the S3 endpoint, bucket, access key, and secret key) in the Harvester settings. Once configured, you can take VM backups manually or on a schedule from the VM detail page.
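The same backup target can be set declaratively as a Harvester Setting. The sketch below is illustrative: the endpoint, bucket, region, and credentials are placeholders, and the exact JSON fields should be verified against the Harvester settings reference for your release.

```yaml
apiVersion: harvesterhci.io/v1beta1
kind: Setting
metadata:
  name: backup-target
# type "s3" with endpoint, bucket, and credentials; every value is a placeholder
value: '{"type":"s3","endpoint":"https://s3.example.internal","bucketName":"harvester-backups","bucketRegion":"us-east-1","accessKeyId":"EXAMPLEKEY","secretAccessKey":"EXAMPLESECRET","cert":"","virtualHostedStyle":false}'
```

Keeping this manifest in version control (with the credentials injected at apply time, not committed) makes the DR cluster's configuration reproducible.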

VM Snapshots vs Backups

  • Snapshots: Stored locally in the Longhorn cluster. Fast to create and restore. Consume cluster storage. Not a DR mechanism because they're on the same cluster as the VM. Use snapshots for short-term rollback points before configuration changes.
  • Backups: Stored in the external S3 target. Slower to create and restore than snapshots. Survive cluster failure. The DR mechanism. Use backups for any restore point you need to survive a cluster wide failure.
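Backups created in the UI correspond to VirtualMachineBackup custom resources, so they can also be driven from kubectl or GitOps tooling. A sketch, assuming a VM named web-server-01 in the default namespace; the resource shape follows the harvesterhci.io API as I understand it and should be confirmed against your cluster's CRDs:

```yaml
apiVersion: harvesterhci.io/v1beta1
kind: VirtualMachineBackup
metadata:
  name: web-server-01-backup-1
  namespace: default
spec:
  # "backup" writes to the S3 target; "snapshot" stays local to the cluster
  type: backup
  source:
    apiGroup: kubevirt.io
    kind: VirtualMachine
    name: web-server-01
```

The type field mirrors the snapshot/backup distinction above: only type backup produces a restore point that survives cluster loss.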

VM Restore to a Different Cluster

Harvester supports restoring a VM backup to a different cluster that points to the same S3 backup target. This is how cross-cluster DR works: take backups at the primary site to an S3 bucket, configure the DR Harvester cluster to use the same S3 bucket as its backup target, and restore from backup at the DR site when needed. The restored VM gets a new identity on the DR cluster, but its disk contents match the backup point exactly.


Key Takeaways

  • Harvester runs VMs on Kubernetes using KubeVirt. VMs are Kubernetes custom resources. This is not a traditional hypervisor with Kubernetes added. The mental model is inverted from vSphere or Proxmox. If that model fits your team, Harvester gives you a unified platform for VMs and Kubernetes workloads.
  • Three-node HA is automatic. The first three nodes promote to management nodes by join order. All node CPUs must have identical specifications for live migration to work. Mixed CPU generations disable live migration for the entire cluster.
  • Production networking requires at minimum two NICs per node for the management bond. VM traffic should be on a dedicated cluster network with its own NIC bond, separate from the management network. Longhorn storage replication traffic should be on its own cluster network for best performance.
  • Stop all VMs before changing the Longhorn storage network configuration. The change restarts Longhorn pods, which interrupts disk access. Running VMs lose disk access during the restart window.
  • The default StorageClass harvester-longhorn uses 3 replicas. On a single-node cluster this marks all volumes as Degraded. Create a custom StorageClass with replica count 1 for single-node deployments.
  • Disable fstrim.timer inside guest VMs. Filesystem trim during Longhorn volume rebuild causes I/O errors that flip VMs between running and paused states. Not a data loss risk but a visible instability that cascades into guest Kubernetes cluster failures.
  • Backups to S3 are the DR mechanism. Snapshots are local to the cluster and don't survive a cluster wide failure. Restore from backup to a different cluster pointing at the same S3 bucket for cross cluster DR.

Read more