Proxmox VE 8: Complete Three-Node Cluster Setup End to End
Standalone Infrastructure | Component: Proxmox VE 8.x, PBS | Audience: Enterprise Architects, Senior Sysadmins
Proxmox VE is one of the more approachable hypervisors to get running, and one of the easier ones to get running badly. The installer takes 10 minutes. Building a three node production cluster with Ceph, proper network separation, and high availability configured correctly takes significantly longer and requires decisions that are hard to reverse after the fact. Cluster name, storage pool design, network topology, the subscription repository versus no-subscription: these choices lock in early and affect everything downstream.
This article is end to end on a three node Proxmox VE 8 cluster: hardware and pre-install decisions, installation and initial configuration, cluster formation with Corosync, network design with bridges, bonds, and VLANs, storage architecture with a decision guide between ZFS, Ceph, and external storage, Ceph setup from scratch, VM and LXC container creation with templates and cloud-init, HA manager configuration, and Proxmox Backup Server integration. The baseline assumption is three identical nodes capable of running Ceph.
1. Hardware Requirements and Pre-Install Decisions
Minimum Specifications
The official Proxmox VE minimum requirements from the Proxmox documentation are a 64-bit Intel or AMD CPU with hardware virtualization support (Intel VT or AMD-V), 2 GB RAM for the OS and PVE services plus additional RAM for guests, and at least 8 GB disk for the OS. Those are lab minimums. For production with Ceph, the real floor is higher.
| Component | Lab Minimum | Production Minimum (with Ceph) | Notes |
|---|---|---|---|
| CPU | 4 cores with VT/AMD-V | 10+ cores per node | Reserve one core per Ceph service (MON, MGR, each OSD). Plan for 8+ cores dedicated to Ceph on a node running 6 OSDs. |
| RAM | 8 GB | 64 GB per node | Proxmox: 2 GB. Each OSD: 8 GB recommended per Ceph docs. ZFS ARC: ~1 GB per TB of ZFS storage. Guests: size accordingly. |
| OS disk | 32 GB SSD | Two SSDs in ZFS mirror | Never put the OS on a single disk in production. ZFS mirror adds no CPU overhead and saves you from an OS disk failure taking down a node. |
| Ceph OSD disks | One per node | Three to six per node, same model and size | One OSD per physical disk. No RAID controller between Ceph and the disks. Use SSDs with Power Loss Protection for OSD disks. |
| Network | 1 GbE single NIC | Two 10 GbE NICs per node minimum | Ceph cluster network should be dedicated. Corosync must be on a separate, low latency network. 25 GbE is preferred for NVMe based Ceph. |
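To make the RAM row concrete, here is a rough per-node budget for a hypothetical node running six OSDs, 2 TB of local ZFS, and 40 GB of guest allocations (illustrative numbers, not a recommendation):
# Rough per-node RAM budget (illustrative)
#   PVE base services:                2 GB
#   Ceph OSDs:        6 x 8 GB    =  48 GB
#   ZFS ARC:          2 TB x 1 GB =   2 GB
#   Guest allocations:               40 GB
#   Total:                           92 GB  -> a 128 GB node leaves comfortable headroom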
The RAID Controller Warning
Both ZFS and Ceph require direct access to raw disks. Neither works correctly behind a hardware RAID controller. RAID controllers cache writes, hide disk errors, and manage disks in ways that interfere with ZFS's checksumming and Ceph's OSD management. If your server has a hardware RAID controller, either flash it to HBA mode (IT mode on LSI controllers) or use a pass through HBA instead. This is documented explicitly in both the Proxmox and Ceph official documentation.
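A quick sanity check that the operating system sees raw disks rather than a RAID virtual volume, sketched with standard tools (device names will differ on your hardware):
# Disks should appear individually with their real model and serial numbers,
# not as a single virtual device (names like "PERC", "MegaRAID" or "Virtual Disk" are a red flag)
lsblk -o NAME,SIZE,MODEL,SERIAL,ROTA,TYPE
# SMART data should be readable directly; behind a RAID controller this typically
# fails or needs a vendor-specific passthrough flag
smartctl -i /dev/sda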
Subscription Repository vs No-Subscription
Proxmox VE is free and open source. The enterprise repository requires a paid subscription. The no-subscription repository is free and available to everyone. For production environments, an enterprise subscription is worth the cost: you get the tested stable package feed, a higher quality update cadence, and access to Proxmox support. For labs and test environments, the no-subscription repository is fine. You'll want to configure one or the other immediately after installation, because the default installer points at the enterprise repository and will generate errors on every apt update if you don't have a subscription key.
# Disable the enterprise repo (remove subscription nag)
echo "# disabled - no subscription" > /etc/apt/sources.list.d/pve-enterprise.list
# Disable Ceph enterprise repo if you'll use Ceph without subscription
echo "# disabled - no subscription" > /etc/apt/sources.list.d/ceph.list
# Add the no-subscription PVE repository
echo "deb http://download.proxmox.com/debian/pve bookworm pve-no-subscription" \
    > /etc/apt/sources.list.d/pve-no-sub.list
# Update and upgrade
apt update && apt dist-upgrade -y
2. Installation and Initial Host Configuration
Install Proxmox VE from the ISO on each node independently before forming the cluster. Don't form the cluster first and then configure the OS. The installer handles disk formatting (including ZFS mirror for the OS if you select it), sets the hostname, configures the management IP, and sets the root password.
A few installer decisions that matter:
- Hostname: Set a proper FQDN during installation (pve1.yourdomain.local, not just pve1). Changing the hostname after cluster formation is painful. Pick names you'll live with; a quick verification sketch follows this list.
- OS disk filesystem: Choose ZFS (RAID1) during installation if you have two OS disks. The installer will mirror them automatically. Choose ext4 or XFS if you only have one OS disk and plan to use the other for Ceph OSDs.
- Management IP: Set a static IP during installation. Using DHCP for the management interface causes cluster communication failures when a lease renews to a different address.
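Once the installer finishes, a quick check that the hostname and management IP decisions above landed correctly (a sketch; substitute your own names and addresses):
# FQDN and short name should both be set
hostname --fqdn        # expect pve1.yourdomain.local
hostname               # expect pve1
# The hostname must resolve to the static management IP, not to 127.0.1.1
getent hosts pve1.yourdomain.local
grep pve1 /etc/hosts
# The management address should be static and live on vmbr0
ip -4 addr show vmbr0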
Post-Install Configuration on Each Node
# Update /etc/hosts on every node with all cluster member entries
# This is required for Corosync and cluster communication
# Do this before forming the cluster
cat >> /etc/hosts << 'EOF'
10.0.100.11 pve1.yourdomain.local pve1
10.0.100.12 pve2.yourdomain.local pve2
10.0.100.13 pve3.yourdomain.local pve3
EOF
# Verify hostnames resolve correctly
ping -c 2 pve2
ping -c 2 pve3
# Configure NTP (chrony is default on PVE 8 / Debian 12 Bookworm)
# Corosync is sensitive to time drift - nodes must be synchronized
systemctl status chrony
chronyc tracking
# If chrony isn't running or not synced:
apt install chrony -y
systemctl enable --now chrony
# Disable the enterprise subscription nag in the web UI
# (optional, only affects the browser UI warning)
sed -i.bak "s/if (res === null || res === undefined || !res || res/if (false || res/g" \
    /usr/share/javascript/proxmox-widget-toolkit/proxmoxlib.js
systemctl restart pveproxy
3. Cluster Networking: Bridges, Bonds, VLANs, and OVS
Proxmox VE uses Linux native networking by default. Every interface configuration is stored in /etc/network/interfaces and applied with ifreload -a (from the ifupdown2 package, which PVE installs by default). You don't need a reboot to apply network changes, but you do need to get the configuration right before forming the cluster, because changing the Corosync network after cluster formation requires touching the corosync.conf on a running cluster.
Traffic Types and Network Design
| Traffic Type | Interface | Notes |
|---|---|---|
| Management / Web UI | vmbr0 (bridged to bond0 or single NIC) | This is where the PVE web interface listens on port 8006. Keep on management VLAN. |
| Corosync cluster | Dedicated NIC or VLAN, NOT bridged | Corosync is latency sensitive. Dedicated physical NIC is strongly preferred. Must be under 5ms latency between nodes. |
| Ceph public network | Dedicated NIC or bond | VM-to-Ceph client traffic. 10 GbE minimum. Separate from Ceph cluster network. |
| Ceph cluster network | Dedicated NIC or bond | OSD-to-OSD replication and heartbeat traffic. Keep isolated from all other traffic. 10 GbE minimum, 25 GbE for NVMe OSDs. |
| VM / guest | vmbr1 (trunk or access ports) | VM traffic. Bridge with VLAN awareness enabled for multi-tenant environments. |
| Live Migration | Shares management or Ceph public depending on config | PVE uses the management network for migrations by default. Can be redirected to a dedicated interface via Datacenter options. |
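The migration row above notes that traffic can be moved off the management network. That is a one-line change in /etc/pve/datacenter.cfg; a sketch, assuming you want migrations on the Ceph public network defined below:
# /etc/pve/datacenter.cfg
# Send live migration traffic over 10.0.110.0/24 and keep it encrypted
migration: secure,network=10.0.110.0/24
With the traffic plan settled, the listing below is an example /etc/network/interfaces for the first node (pve1); the other nodes use the same layout with their own addresses.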
auto lo
iface lo inet loopback
# Physical NICs used by bond0 - do not assign IPs directly
# (eno3 and eno4 are configured with static addresses below for Corosync and Ceph)
auto eno1
iface eno1 inet manual
auto eno2
iface eno2 inet manual
# Bond for management + VM traffic (LACP)
auto bond0
iface bond0 inet manual
bond-slaves eno1 eno2
bond-miimon 100
bond-mode 802.3ad
bond-xmit-hash-policy layer2+3
bond-lacp-rate fast
# Management bridge (vmbr0) - hosts PVE web UI and management IP
auto vmbr0
iface vmbr0 inet static
address 10.0.100.11/24
gateway 10.0.100.1
bridge-ports bond0
bridge-stp off
bridge-fd 0
bridge-vlan-aware yes
dns-nameservers 10.0.100.1
# VM traffic bridge (guest traffic carried on VLAN 400 of the bond)
auto vmbr1
iface vmbr1 inet manual
bridge-ports bond0.400
bridge-stp off
bridge-fd 0
bridge-vlan-aware yes
# Corosync cluster network - dedicated NIC, no bridge
auto eno3
iface eno3 inet static
address 10.0.200.11/24
# Ceph public network
auto eno4
iface eno4 inet static
address 10.0.110.11/24
# Ceph cluster network (if you have a 5th NIC; otherwise share with Ceph public)
# auto eno5
# iface eno5 inet static
# address 10.0.120.11/24
Linux Bridge vs OVS
Proxmox supports both Linux native bridges and Open vSwitch (OVS). Linux bridges are the default and the right choice for most deployments. They're simpler, more stable, and don't require additional packages. OVS adds support for OpenFlow, software defined networking features, and more complex VLAN trunk configurations. Use OVS only if you have a specific need for it, such as integration with an SDN controller or VXLAN tunneling between sites. For a standard three node cluster, Linux bridges with VLAN awareness enabled on the bridge cover every production networking scenario.
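As an illustration of why a VLAN-aware Linux bridge covers most needs, putting a guest on a specific VLAN is just a tag on its virtual NIC (the VM ID and VLAN number here are examples):
# Attach VM 101's first NIC to the VLAN-aware bridge and tag its traffic with VLAN 400
qm set 101 --net0 virtio,bridge=vmbr0,tag=400
# Containers take the same tag parameter on their network definition
pct set 200 --net0 name=eth0,bridge=vmbr0,ip=dhcp,tag=400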
Applying Network Changes Without a Reboot
# Validate the config before applying (will report syntax errors)
ifup --no-act -a
# Apply changes without reboot (ifupdown2 - installed by default on PVE 8)
ifreload -a
# Verify interfaces are up and have correct addresses
ip addr show
ip route show
4. Cluster Formation with Corosync
The cluster is created on one node; every other node then joins it. You can't merge two existing clusters, and a node joining an existing cluster must not hold any guests: VM IDs have to be unique across the cluster, and pvecm refuses the join if the new node already has VMs or containers on it. Migrate or remove any guests from a node before joining it.
# On pve1 only: create the cluster
# --link0 specifies the Corosync network IP on this node
pvecm create prod-cluster --link0 10.0.200.11
# Verify cluster created successfully
pvecm status
# On pve2: join the cluster
# IP is pve1's Corosync IP (link0 address), not the management IP
# --link0 is this node's own Corosync IP
pvecm add 10.0.200.11 --link0 10.0.200.12
# You'll be prompted for pve1's root password
# On pve3: join the cluster
pvecm add 10.0.200.11 --link0 10.0.200.13
# Verify all nodes are present and quorate
pvecm status
pvecm nodes
Corosync Redundant Links
Corosync supports two redundant network links (link0 and link1) for cluster communication. When both links are configured, Corosync uses knet transport which handles failover between links automatically. Configure both links during cluster creation, not after. Adding a second link to an existing cluster requires editing corosync.conf on a live cluster, which is possible but carries risk.
# Create cluster with two Corosync network links
pvecm create prod-cluster --link0 10.0.200.11 --link1 10.0.201.11
# Join with two links
pvecm add 10.0.200.11 --link0 10.0.200.12 --link1 10.0.201.12
# Verify both links are active
pvecm status | grep -A 20 "Membership"
5. Storage Architecture Decision Guide
| Storage Type | Best For | Requirements | Key Limitations |
|---|---|---|---|
| Local ZFS | OS boot disks, fast local scratch, single node VM storage that doesn't need live migration | Direct attached SSDs, no hardware RAID. 1 GB RAM per TB of ZFS storage. | Not shared: VMs on local ZFS can't live migrate. Offline migration only. |
| Ceph RBD | Shared VM disks with live migration, hyperconverged HA storage | 3+ nodes, dedicated OSDs, 10 GbE Ceph network, 8 GB RAM per OSD | Minimum 3 nodes for replication. Performance degrades during node failure and rebalancing. |
| NFS | Shared storage from existing NAS, ISO storage, template storage | NFS server on dedicated storage, 1+ GbE network | Single point of failure unless NFS server is itself HA. Latency sensitive for database workloads. |
| iSCSI | Connecting to existing SAN, shared storage without Ceph | iSCSI initiator (open-iscsi), dedicated storage network recommended | Requires separate SAN management. LVM clustering required for shared disk access. |
| ZFS over iSCSI | Sharing ZFS volumes from a dedicated storage node | Storage node with ZFS pool, target portal configured | Additional complexity. The storage node becomes a single point of failure unless replicated. |
For a three node cluster where you want live migration and HA, Ceph is the right choice. It's the only storage type in the table that's truly shared, distributed, and survives a node failure without manual intervention. NFS and iSCSI work but introduce a storage server as a dependency that requires its own HA strategy.
6. Ceph Setup End to End
Ceph must be installed on every node that will participate in the cluster. On a three node cluster, you'll run a Ceph Monitor (MON), Ceph Manager (MGR), and Ceph OSD daemons on each node. One CPU core per Ceph service is the minimum per the official documentation. Plan for at least 8 dedicated CPU cores per node if you're running a MON, a MGR, and 6 OSDs.
The 3-Node Ceph Replication Reality
With a three node cluster and default size=3, min_size=2 pool settings, one complete node failure leaves Ceph degraded but still serving I/O. VMs continue running. However, Ceph can't fully recover to a healthy state until the failed node is restored, because there's no fourth node to receive the missing replica. This is the honest trade off of three node Ceph: it handles failure gracefully but doesn't self-heal. For environments where self-healing redundancy is required, five nodes is the recommended production minimum.
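Once the pool exists (it's created later in this section), you can confirm these replication settings directly and watch the degraded-but-serving state during an outage:
# Confirm replication settings on the pool
ceph osd pool get vm-pool size
ceph osd pool get vm-pool min_size
# During a node failure, expect HEALTH_WARN with undersized/degraded PGs while I/O continues
ceph -s
ceph health detail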
# Run on each node
# Choose the Ceph version matching the current Proxmox release
# PVE 8.x ships with Ceph Quincy (17) or Reef (18) depending on minor version
# Check the Proxmox release notes for the correct version for your PVE build
# Install Ceph packages (run on each node)
pveceph install --repository no-subscription
# Initialize Ceph on the first node (pve1)
# --network sets the Ceph public network
# --cluster-network sets the OSD replication network (optional but strongly recommended)
pveceph init \
    --network 10.0.110.0/24 \
    --cluster-network 10.0.120.0/24
# Create a Monitor on each node (run this on pve1, pve2, and pve3 in turn;
# pveceph mon create always acts on the local node)
pveceph mon create
# Create a Manager on each node (again, run locally on each node)
pveceph mgr create
# Verify MONs and MGRs are running
ceph -s
Creating OSDs
# List available disks - OSD candidates must not be in use or partitioned
lsblk -o NAME,SIZE,MODEL,TYPE
# If a disk was previously used for ZFS or another OSD, wipe it first
# This destroys all data on the disk - confirm the disk name before running
wipefs -a /dev/sdb
sgdisk -Z /dev/sdb
# Create an OSD on each dedicated disk
# Bluestore is the default (and correct) OSD type since Ceph Luminous
pveceph osd create /dev/sdb
pveceph osd create /dev/sdc
pveceph osd create /dev/sdd
# Optionally specify a separate NVMe for the Bluestore DB/WAL
# This significantly improves performance on spinning disk OSDs
# WAL lives with DB by default when only -db_dev is specified
# Do NOT point -db_dev and -wal_dev at the same device - it causes an error
pveceph osd create /dev/sdb -db_dev /dev/nvme0n1
# After creating OSDs on all nodes, verify cluster health
ceph -s
ceph osd tree
Creating a Ceph Pool and Adding Storage to Proxmox
# Create a pool for VM disks
# size=3: three replicas (one per node)
# min_size=2: minimum replicas required to serve I/O
# pg_autoscale_mode=on: let Ceph manage placement group count automatically
# add_storages=1: automatically adds the pool to the PVE storage config
pveceph pool create vm-pool \
    --size 3 \
    --min_size 2 \
    --pg_autoscale_mode on \
    --add_storages 1
# Verify pool created and storage is available in PVE
ceph osd pool ls
pvesm status
7. VM and LXC Container Creation with Templates and Cloud-Init
Creating a VM Template with Cloud-Init
Building VMs from a cloud-init template is faster and more consistent than installing from ISO every time. The workflow is: download a cloud image, import it as a VM, attach a cloud-init drive, convert to a template, then clone from the template whenever you need a new VM.
# Download a cloud image (Debian 12 Bookworm generic cloud image)
wget https://cloud.debian.org/images/cloud/bookworm/latest/debian-12-genericcloud-amd64.qcow2
# Create a VM shell with ID 9000 (use a high ID for templates to avoid conflicts)
qm create 9000 \
    --name "debian-12-template" \
    --memory 2048 \
    --cores 2 \
    --net0 virtio,bridge=vmbr0 \
    --serial0 socket \
    --vga serial0 \
    --ostype l26
# Import the cloud image as the VM's primary disk into your Ceph pool
qm importdisk 9000 debian-12-genericcloud-amd64.qcow2 vm-pool
# Attach the imported disk as scsi0 (VirtIO SCSI is recommended)
qm set 9000 --scsihw virtio-scsi-pci --scsi0 vm-pool:vm-9000-disk-0
# Add a cloud-init drive
qm set 9000 --ide2 vm-pool:cloudinit
# Set boot order to the primary disk
qm set 9000 --boot c --bootdisk scsi0
# Configure cloud-init defaults (these apply to all clones)
qm set 9000 \
    --ciuser admin \
    --sshkeys ~/.ssh/id_rsa.pub \
    --ipconfig0 ip=dhcp
# Convert to template (this is irreversible)
qm template 9000
# Clone the template (full clone stores independent copy in Ceph pool)
qm clone 9000 101 \
    --name "web-server-01" \
    --full \
    --storage vm-pool
# Customize this instance's cloud-init settings
qm set 101 \
    --ipconfig0 ip=10.0.100.101/24,gw=10.0.100.1 \
    --nameserver 10.0.100.1 \
    --searchdomain yourdomain.local \
    --memory 4096 \
    --cores 4
# Resize the disk if needed
qm resize 101 scsi0 +20G
# Start the VM
qm start 101
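To see exactly what cloud-init will feed the clone on first boot, you can dump the generated configuration before starting it:
# Show the generated cloud-init user and network data for VM 101
qm cloudinit dump 101 user
qm cloudinit dump 101 network
# Review the cloud-init related settings on the VM
qm config 101 | grep -E 'ciuser|ipconfig|sshkeys|nameserver'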
LXC Containers
LXC containers in Proxmox share the host kernel. They start in seconds rather than minutes, use far less RAM than a full VM, and are the right choice for services that don't need kernel level isolation: web servers, DNS, monitoring agents, databases that run on Linux. The trade off is that LXC containers run the host's kernel version and can't run a different kernel or kernel modules not loaded on the host.
# Download an LXC template from Proxmox's template repository
# List available templates
pveam available | grep debian
# Download the template to local storage
pveam download local debian-12-standard_12.7-1_amd64.tar.zst
# Create a container
# --unprivileged 1 is strongly recommended for security
# --password requires a value: set a real root password here
pct create 200 local:vztmpl/debian-12-standard_12.7-1_amd64.tar.zst \
    --hostname dns-server \
    --storage vm-pool \
    --rootfs vm-pool:8 \
    --memory 512 \
    --swap 512 \
    --cores 1 \
    --net0 name=eth0,bridge=vmbr0,ip=10.0.100.200/24,gw=10.0.100.1 \
    --nameserver 10.0.100.1 \
    --password 'change-this-root-password' \
    --unprivileged 1
# Start the container
pct start 200
# Access container console
pct console 200
8. High Availability Manager Configuration
Proxmox HA requires at least three nodes with quorum. With two nodes, HA can't safely determine which node failed versus which node lost network connectivity, so it refuses to act. Three nodes provide the majority vote needed to make that determination safely.
HA in Proxmox works through two mechanisms: the HA Manager daemon (pve-ha-lrm on each node, pve-ha-crm cluster wide) and fencing. Fencing is how HA ensures a potentially failed node actually stops running VMs before restarting them elsewhere. Without reliable fencing, you risk two instances of the same VM running simultaneously on different nodes with the same IP and disk, which corrupts data. Don't enable HA without confirming your fencing mechanism works.
Fencing Options
- IPMI/iDRAC/iLO watchdog: The correct production approach. Configure the hardware management interface on each node. If a node loses quorum and doesn't fence itself within the watchdog timeout, the surviving nodes trigger a remote power off via IPMI. Reliable, hardware enforced.
- Self-fencing via hardware watchdog: Proxmox's HA manager activates a hardware watchdog on each node. If the HA daemon stops feeding the watchdog (because the node lost quorum), the watchdog triggers a hardware reset. Works without IPMI but requires a hardware watchdog device, which most server class hardware provides. A quick check of the watchdog plumbing is sketched after this list.
- No fencing (not recommended): Proxmox will warn you and limit HA functionality. In a three node cluster where one node fails uncleanly, HA won't restart VMs without confirmed fencing. The safest behavior, but VMs stay down until you manually confirm the failed node is off.
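A minimal check that the fencing plumbing is in place, assuming the hardware watchdog route (module names and defaults vary by hardware):
# The HA stack feeds the watchdog through watchdog-mux
systemctl status watchdog-mux
# Which watchdog module is configured (softdog is the fallback default)
cat /etc/default/pve-ha-manager    # e.g. WATCHDOG_MODULE=ipmi_watchdog
# Confirm the module is loaded and a watchdog device exists
lsmod | grep -E 'ipmi_watchdog|iTCO_wdt|softdog'
ls -l /dev/watchdog*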
# Create an HA group defining which nodes can run HA resources
# Nodes are listed with optional priority (higher = preferred)
ha-manager groupadd production \
    --nodes "pve1:3,pve2:2,pve3:1" \
    --restricted 0 \
    --nofailback 0
# Add a VM to HA management
ha-manager add vm:101 \
    --group production \
    --state started \
    --max_restart 3 \
    --max_relocate 3
# Add an LXC container to HA management
ha-manager add ct:200 \
    --group production \
    --state started
# Check HA status
ha-manager status
ha-manager status --verbose
Note that newer Proxmox releases (PVE 9 and later) introduce HA rules as the successor to HA groups. The node-affinity rule equivalent to the group above is:
ha-manager rules add node-affinity prod-rule --resources vm:101,ct:200 --nodes "pve1:3,pve2:2,pve3:1"
The groupadd approach remains functional for backward compatibility but won't receive new features.
9. Proxmox Backup Server Integration
Proxmox Backup Server (PBS) is a separate product, also free and open source, designed specifically for backing up Proxmox VMs, containers, and host configurations. It performs incremental backups (only data changed since the last backup is transferred), supports client-side encryption, and stores everything in a deduplicating chunk store that dramatically reduces storage consumption compared to repeated full VM backups. It integrates natively with Proxmox VE: adding PBS as a storage target is the only configuration required.
PBS can run on a dedicated physical server, a VM on a separate Proxmox node, or a VM on the same cluster you're backing up (not ideal but workable for smaller environments). Don't run PBS as a VM on the same Ceph pool you're backing up: a Ceph failure takes down both the VMs and the backups simultaneously.
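The storage definition below assumes a datastore named vm-backups and a dedicated backup user already exist on the PBS side. A sketch of that preparation, run on the PBS host (the path, user name, and role here are example choices):
# Create the datastore (PBS initializes the chunk store under the given path)
proxmox-backup-manager datastore create vm-backups /mnt/datastore/vm-backups
# Create a dedicated user for PVE to authenticate with
proxmox-backup-manager user create backup@pbs --password 'your-pbs-password'
# Grant that user backup rights on this datastore only
proxmox-backup-manager acl update /datastore/vm-backups DatastoreBackup --auth-id backup@pbs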
Adding PBS to Proxmox VE
# On the PBS server: get the PBS server fingerprint for authentication
proxmox-backup-manager cert info | grep Fingerprint
# In PVE (via CLI or web UI):
# Add PBS as a storage target
pvesm add pbs pbs-store \
    --server pbs.yourdomain.local \
    --datastore vm-backups \
    --username backup@pbs \
    --password "your-pbs-password" \
    --fingerprint "AB:CD:...:EF"   # paste fingerprint from PBS cert info
# Verify the storage is accessible
pvesm status --storage pbs-store
Creating Backup Jobs
# Create a backup job for all VMs in the cluster
# Schedule: daily at 02:00, keep 7 daily, 4 weekly, 3 monthly backups
pvesh create /cluster/backup \
--id daily-backup \
--storage pbs-store \
--schedule "02:00" \
--all 1 \
--mode snapshot \
--compress zstd \
--notes-template "{{guestname}} on {{node}}" \
--prune-backups "keep-daily=7,keep-weekly=4,keep-monthly=3"
# List configured backup jobs
pvesh get /cluster/backup
# Run a backup job immediately (useful for testing)
vzdump --storage pbs-store --all 1 --mode snapshot --compress zstd
PBS Garbage Collection and Verification
PBS uses a chunk based deduplication store. Deleting a backup doesn't free space immediately because chunks may be shared across multiple backups. Run garbage collection on the PBS datastore periodically to reclaim space from deleted backups. By default, PBS schedules GC automatically, but in environments with aggressive retention policies you may want to run it more frequently. PBS also supports scheduled verification jobs that read back every chunk of every backup and confirm its integrity. Run verification weekly at minimum. A backup that can't be verified is a backup you can't restore from.
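Both operations can also be triggered manually from the PBS shell, which is useful after deleting a large batch of old backups or before relying on a restore (datastore name taken from the earlier example):
# Reclaim space from deleted backups - chunks are only freed by garbage collection
proxmox-backup-manager garbage-collection start vm-backups
proxmox-backup-manager garbage-collection status vm-backups
# Verify every backup in the datastore (reads back and checks all chunks)
proxmox-backup-manager verify vm-backups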
Key Takeaways
- Never put ZFS or Ceph behind a hardware RAID controller. Both require raw disk access. Flash the controller to HBA (IT) mode or use a pass through HBA. This is documented explicitly in both the Proxmox and Ceph official docs.
- Configure the no-subscription or enterprise repository immediately after install. The default installer points at the enterprise repo and generates apt errors on every update without a subscription key. Pick one and don't mix them.
- Set hostnames and static IPs before forming the cluster, and populate /etc/hosts on every node with all cluster members. Corosync uses hostname resolution, and a missing or incorrect entry breaks cluster communication silently.
- Corosync requires under 5ms network latency between all nodes. Give it a dedicated physical NIC on an isolated network. Never put Corosync traffic on a network that carries large data transfers, even on a separate VLAN.
- Three node Ceph with size=3, min_size=2 survives a single node failure but can't self-heal until the failed node is restored, because there's no fourth node to receive the missing replica. For true self-healing redundancy, five nodes is the production minimum.
- One OSD per physical disk. Ceph manages disk level redundancy itself. Multiple OSDs per disk share the disk's throughput and failure domain, which defeats the purpose. Recommended: 8 GB RAM per OSD, one CPU core per OSD daemon.
- Don't enable HA without a working fencing mechanism. Without fencing, Proxmox won't restart VMs after an unclean node failure because it can't confirm the failed node has stopped running them. IPMI fencing is the production standard.
- Run PBS on a dedicated server or a VM on a separate physical host from the cluster you're backing up. A PBS instance running on the same Ceph pool it's backing up provides no protection against storage failures.