Veeam Replication in v13: Snapshot-Based VM Replication and Failover
What this covers and where it fits in the RPO story
If you have read the CDP article in this series, you know that CDP is designed for workloads where you cannot afford to lose more than seconds or minutes of data. Snapshot-based replication is the tier below that. It is designed for workloads where an RPO measured in hours is acceptable and where you want the ability to fail over to a running VM quickly rather than waiting through a restore from backup.
Veeam recommends snapshot-based replication for VMs where an RPO of hours is the target. If your recovery point requirement is seconds or minutes, CDP is the right tool. If restoring from backup meets your needs and a longer recovery time (RTO) is acceptable, a backup job and a restore process may be all you need. Replication sits in the middle: fast failover, hourly or sub-hourly RPO, without the infrastructure overhead and cost of CDP.
Replication is hypervisor-specific. You replicate VMware VMs to VMware targets, and Hyper-V VMs to Hyper-V targets. Cross-hypervisor replication is not supported with this job type.
How snapshot-based replication works
When a replication job runs for the first time, Veeam creates a full copy of the source VM on the target host. Unlike a backup, the replica is stored in its native format: the virtual disks on the target datastore are full-size VMDK or VHDX files that could be powered on right now. Subsequent job runs are incremental. Veeam uses Changed Block Tracking (CBT) on VMware, or Resilient Change Tracking (RCT) on Hyper-V hosts running Windows Server 2016 or later, to identify only the blocks that changed since the last run. It then writes only those changes to the replica and records each sync as a snapshot on the replica VM.
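The incremental step can be sketched as a toy model. It assumes the change tracker (CBT on VMware) hands back a set of changed block indices. `Disk`, `full_copy`, and `incremental_sync` are hypothetical names, not Veeam or VMware APIs. The model also leaves out the snapshot that Veeam uses to record each sync on the replica.

```python
from dataclasses import dataclass


@dataclass
class Disk:
    """Toy virtual disk: a list of fixed-size blocks (hypothetical model)."""
    blocks: list[bytes]


def full_copy(source: Disk) -> Disk:
    """First run: copy every block so the replica is a complete, native-format disk."""
    return Disk(blocks=list(source.blocks))


def incremental_sync(source: Disk, replica: Disk, changed: set[int]) -> int:
    """Later runs: copy only the blocks the change tracker reported as changed.

    `changed` stands in for what CBT would report; returns blocks transferred.
    """
    for index in sorted(changed):
        replica.blocks[index] = source.blocks[index]
    return len(changed)
```

After one block changes on the source, a sync moves one block instead of the whole disk. That is why incremental runs are typically much smaller than the initial copy.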
Each incremental sync creates a restore point on the replica. The restore point is a VMware or Hyper-V snapshot. You can configure Veeam to keep between 1 and 28 restore points. When the number of restore points exceeds the retention setting, the oldest is merged into the base disk. The replica therefore always holds its most recent synchronized state, plus a short history of earlier states you can roll back to if ransomware or corruption hits the source.
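Retention behaves like a bounded queue. Each sync appends a restore point, and once the count passes the limit, the oldest point is folded into the base disk. The sketch below is a hypothetical model that uses the 1 to 28 range quoted above. `ReplicaChain` and its methods are illustrative names, not Veeam's implementation.

```python
from collections import deque


class ReplicaChain:
    """Hypothetical model of replica restore-point retention."""

    MAX_POINTS = 28  # upper bound quoted for this setting

    def __init__(self, points_to_keep: int) -> None:
        if not 1 <= points_to_keep <= self.MAX_POINTS:
            raise ValueError("restore points to keep must be between 1 and 28")
        self.points_to_keep = points_to_keep
        self.points: deque[str] = deque()
        self.merged_into_base: str | None = None  # last state folded into the base disk

    def sync(self, state: str) -> None:
        """Record a new restore point; merge the oldest ones past the retention limit."""
        self.points.append(state)
        while len(self.points) > self.points_to_keep:
            self.merged_into_base = self.points.popleft()

    def restore_points(self) -> list[str]:
        """Points you could still fail over or roll back to, oldest first."""
        return list(self.points)
```

With `points_to_keep=3`, five syncs leave the three newest points selectable. The second-oldest state has been merged into the base disk and can no longer be chosen.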
Although replica disk data lives on the target datastore, Veeam needs a backup repository to store replica metadata files. This metadata is processed by the source proxy, so the metadata repository should be closer to the source site than the target. Do not use the target site's backup repository for metadata in a two-site DR setup.
Prerequisites and infrastructure requirements
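You need source and target hosts registered in VBR, backup proxies, a backup repository for metadata, and network connectivity between the proxies and both hosts. For on-site replication, one proxy can act as both source and target proxy. For off-site replication, deploy a source proxy in the production site and a target proxy in the DR site. For VMware environments, the source and target hosts must be added to VBR either through vCenter or as standalone ESXi hosts.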
On the target side, make sure the target host has enough datastore capacity to hold the replica VMs at full size. Replicas are not compressed or deduplicated the way backup files are. A 200 GB VM requires 200 GB of datastore space on the target, plus space for any restore point snapshots you configure to keep.
| Requirement | VMware | Hyper-V |
|---|---|---|
| Source host | vCenter or standalone ESXi | Standalone host or cluster |
| Target host | Same platform, same or newer hardware version | Same platform, compatible OS version |
| Restore point storage | Target datastore | Target host path |
| Metadata repository | VBR backup repo, near source | VBR backup repo, near source |
| Unsupported disk types | RDM in physical mode | Shared VHDX, VHDS |
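To turn the capacity rule above into a number, the sketch below adds full-size replica disks to an estimate of snapshot growth. The snapshot term is an assumption. It treats each retained restore point as holding about one interval's worth of changed data per VM, which is a planning heuristic rather than how VMware or Hyper-V actually size deltas. `estimate_target_gb` is a hypothetical helper.

```python
def estimate_target_gb(vms: list[tuple[float, float]], points_to_keep: int) -> float:
    """Rough target datastore requirement in GB.

    vms: one (provisioned_gb, changed_gb_per_interval) pair per replicated VM.
    Assumption: each retained restore point holds ~one interval of changes.
    """
    full_disks = sum(size for size, _ in vms)  # replicas are not compressed or deduplicated
    snapshot_growth = sum(change for _, change in vms) * points_to_keep
    return full_disks + snapshot_growth
```

Take a single 200 GB VM that changes about 4 GB per interval, kept at 7 restore points. The estimate is 228 GB.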
Creating the replication job
Launch the New Replication Job wizard
From the Home view, right-click Jobs, select Replication, then Virtual machine, and choose your platform (VMware vSphere or Microsoft Hyper-V). Give the job a clear name that identifies both the source scope and the target site. Something like Prod-Site-A to DR-Site-B - Tier2 VMs is easier to manage at 2 AM than Replication Job 1.
Add virtual machines
Click Add and select the VMs you want this job to replicate. You can add individual VMs, folders, clusters, or datastores. Adding at the container level picks up new VMs automatically. Keep in mind that every VM you add here requires full datastore space on the target, so scope these jobs deliberately rather than dumping an entire host into a single replication job.
Configure the destination
For VMware: select the target host or cluster, resource pool, VM folder, and target datastore. For Hyper-V: select the target host or cluster and specify the path where replica VHDX files should be stored. Set the replica name suffix in the Job Settings step. The default is _replica appended to the VM name. Do not remove this suffix. Without it, replica VMs on the target look identical to production VMs, which creates operational confusion.
Set restore point retention
In the Job Settings step, set Restore points to keep. For most DR replication jobs, 3 to 7 restore points is the right range. More restore points give you a longer window to roll back if ransomware or corruption is detected, but they consume proportionally more snapshot space on the target datastore. Seven restore points at a one-hour interval give you roughly a six- to seven-hour rollback window. The oldest point is six intervals older than the newest, and the newest can be up to an hour old.
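The rollback-window arithmetic generalizes to any retention and interval. The helper below, `rollback_window` (a hypothetical name), returns the youngest and oldest possible age of the oldest retained restore point. It ignores job run time.

```python
from datetime import timedelta


def rollback_window(points_to_keep: int, interval: timedelta) -> tuple[timedelta, timedelta]:
    """Return the (min, max) age of the oldest retained restore point.

    The oldest point is (points_to_keep - 1) intervals older than the newest,
    and the newest point is between zero and one interval old.
    """
    if points_to_keep < 1:
        raise ValueError("need at least one restore point")
    span = interval * (points_to_keep - 1)
    return span, span + interval
```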
Configure data transfer
On the Data Transfer step, choose the source and target backup proxies (or keep automatic selection). Leave Direct mode selected unless you have WAN accelerators deployed. Without WAN accelerators, replication traffic is still compressed by default, which helps on a WAN link. Bandwidth throttling is not set on this step; it comes from the global network traffic rules described in the WAN section below.
Set the schedule
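Set the replication interval based on your RPO target. A one-hour interval does not quite guarantee a one-hour RPO. Each restore point captures the source when a run starts, so if the source fails just before a run completes, you can lose up to the interval plus that run's duration. Keep run times well below the interval. Shorter intervals reduce RPO but increase I/O and network load. Most environments settle on 15-minute to 4-hour intervals depending on the workload tier and available bandwidth.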
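The worst-case data loss for a given schedule can be computed directly. `worst_case_data_loss` is a hypothetical helper. It assumes each restore point captures the source at the moment a run starts and that runs do not overlap.

```python
from datetime import timedelta


def worst_case_data_loss(interval: timedelta, job_duration: timedelta) -> timedelta:
    """Approximate upper bound on data lost if the source fails mid-run.

    The newest usable restore point reflects the source at the start of the last
    completed run. A failure just before the next run completes therefore loses
    one full interval plus that run's duration.
    """
    if job_duration >= interval:
        raise ValueError("job duration should stay below the replication interval")
    return interval + job_duration
```

A one-hour interval with 20-minute runs gives a worst case of 80 minutes, not 60.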
Optimizing replication over a WAN link
Replication over a WAN requires some planning. The first full replication cycle transfers the entire VM. A 200 GB VM is 1.6 terabits of data, which takes about four and a half hours on a 100 Mbps link even at full line rate. Protocol overhead and throttling push that higher, while compression brings it back down. Schedule the initial seed run during a maintenance window or off-hours. Subsequent incremental runs transfer only changed blocks. These are typically a small fraction of the total VM size unless you are replicating a high-write database VM.
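The seed-time estimate is simple unit conversion. The sketch assumes decimal units (1 GB = 10^9 bytes, 1 Mbps = 10^6 bits per second). It models overhead, throttling, or compression as a single `efficiency` factor, which is an assumption rather than a measured value. `transfer_hours` is a hypothetical helper.

```python
def transfer_hours(size_gb: float, link_mbps: float, efficiency: float = 1.0) -> float:
    """Hours to move size_gb over a link_mbps link.

    efficiency < 1.0 models overhead or a throttled share of the link;
    efficiency > 1.0 models data reduction from compression (assumed).
    """
    if link_mbps <= 0 or efficiency <= 0:
        raise ValueError("link speed and efficiency must be positive")
    bits = size_gb * 8 * 1000**3
    seconds = bits / (link_mbps * 1_000_000 * efficiency)
    return seconds / 3600
```

`transfer_hours(200, 100)` returns about 4.44. If only half the link is usable (`efficiency=0.5`), it is about 8.89.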
Configure network throttling rules in VBR to prevent replication jobs from consuming the full WAN link during business hours. From the main menu, open Network Traffic Rules and create a rule that limits bandwidth between the source and target IP ranges during peak hours. Rules match traffic by IP range rather than by job. The same rule therefore also throttles any backup or backup copy traffic crossing that link, so size the limit with all of that traffic in mind.
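A time-windowed traffic rule boils down to a cap that applies only inside the peak window. The `ThrottleRule` class below is a hypothetical model for reasoning about a schedule. It is not Veeam's configuration format or API, and it assumes a weekday business-hours window.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass(frozen=True)
class ThrottleRule:
    """Hypothetical model of a time-windowed bandwidth limit (not a Veeam API)."""
    limit_mbps: float
    start: time
    end: time
    weekdays_only: bool = True

    def cap_at(self, moment: datetime, link_mbps: float) -> float:
        """Bandwidth available to replication traffic at a given moment."""
        in_days = moment.weekday() < 5 or not self.weekdays_only  # Mon=0 .. Fri=4
        in_hours = self.start <= moment.time() < self.end
        if in_days and in_hours:
            return min(self.limit_mbps, link_mbps)
        return link_mbps
```

A 20 Mbps cap from 08:00 to 18:00 on weekdays leaves the full link available at night and on weekends. That is when the initial seed and the larger syncs should run.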
If you have two physical sites connected by a constrained WAN link and you are replicating more than a few VMs, WAN accelerators can significantly reduce traffic. You deploy one on each side of the link. The target WAN accelerator keeps a global cache of data blocks, and the source WAN accelerator keeps digests of the data it processes, so blocks that already exist in the target cache are not sent again. The trade-off is dedicated disk space on both accelerators: for digests on the source side and for the global cache on the target side.
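The effect of the global cache can be modeled as a set-membership check on block digests. This is a simplified sketch. SHA-256 stands in for whatever digest scheme Veeam actually uses, and `blocks_to_send` is a hypothetical function.

```python
import hashlib


def blocks_to_send(blocks: list[bytes], target_cache: set[str]) -> list[bytes]:
    """Toy model of WAN-accelerator dedup: send only blocks the target has not seen.

    target_cache holds digests of blocks already cached at the target site and
    is updated in place as new blocks are transferred.
    """
    to_send = []
    for block in blocks:
        digest = hashlib.sha256(block).hexdigest()
        if digest not in target_cache:
            to_send.append(block)
            target_cache.add(digest)  # the target caches the block after this transfer
    return to_send
```

Repeated blocks cross the link only once, whether they repeat within one transfer or across later ones. That is where the savings come from, and why the cache needs its own disk space.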
Failing over to a replica
When you need to use the replica, go to the Home view in VBR, expand the Replicas node, find the VM, right-click it, and select Failover Now. You can choose to fail over to the latest restore point or roll back to a specific point in time from the available restore points on the replica. Veeam powers on the replica VM on the target host and the VM takes over the role of the source.
While the VM runs on the replica, every new write lands on the replica, so it steadily diverges from the original source VM. Nothing syncs back to the source while failover is active. The accumulated changes stay on the replica until you fail back or make the failover permanent.
If the failover is going to be long-lived, you can perform a Permanent Failover, which commits the current replica state as the new production VM and removes the replication relationship. Use this when the original site is unrecoverable and you are designating the DR site as the new primary.
Failing back to the original site
When the source site is repaired and ready to accept workloads again, right-click the replica in the Replicas view and select Failback to production. Veeam syncs the changes that accumulated during failover back to the source VM. It then shuts down the replica and runs a final sync before handing production back to the source VM. You can fail back to the original VM, or to a new VM on the source site if the original needs to be replaced.
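Failback is not final until you commit it. Verify that the production VM is healthy, then select Commit failback for the replica (or Undo failback if something went wrong). After the commit, the replica returns to its normal standby state and the replication job resumes incremental syncs on its schedule.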
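The failover and failback steps form a small state machine. The sketch below is a simplified, hypothetical model. The state and action names are illustrative, not Veeam identifiers, and it covers only the operations described in this article plus the Commit failback step.

```python
from enum import Enum


class ReplicaState(Enum):
    READY = "ready"          # standby; the replication job syncs on schedule
    FAILOVER = "failover"    # replica powered on and acting as production
    FAILBACK = "failback"    # production handed back to the source, not yet committed
    PERMANENT = "permanent"  # replica promoted; replication relationship removed


# Simplified transition table (hypothetical action names).
TRANSITIONS = {
    (ReplicaState.READY, "failover_now"): ReplicaState.FAILOVER,
    (ReplicaState.FAILOVER, "permanent_failover"): ReplicaState.PERMANENT,
    (ReplicaState.FAILOVER, "failback_to_production"): ReplicaState.FAILBACK,
    (ReplicaState.FAILBACK, "commit_failback"): ReplicaState.READY,
}


def apply(state: ReplicaState, action: str) -> ReplicaState:
    """Return the next state, or raise if the action is not valid from this state."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"{action!r} is not valid from {state.value!r}") from None
```

The model makes the ordering explicit: you cannot commit a failback that never started, and a permanent failover is a dead end for the replication relationship.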
Replication from backup
An often-overlooked option is replication from backup. Instead of snapshotting the live source VM to feed the replication process, Veeam reads from an existing backup file in a repository. This is useful for remote office VMs where you want to minimize production impact and the WAN link is constrained. The source VM is snapshotted only by its backup job; the replication job reads from the backup file instead of touching the live VM again.
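The trade-off is that your replica can never be newer than your latest backup. With daily backups, the replica will at times be more than 24 hours behind the source: the backup interval plus the gap between the backup and the next replication run. If replication happens to run just before the nightly backup, that gap approaches two full days, so schedule replication to run after the backup job. For non-critical VMs where roughly a day of RPO is acceptable, this approach gives you a standby replica without adding snapshot load to the source VMs.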
To set this up, click Source on the Virtual Machines step of the replication job wizard, select From backup files, and choose the backup repositories that hold the source VM backups.
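How far a replica built from backups lags depends on how the two schedules line up. The sketch below uses a hypothetical helper, `replica_lag`. It assumes each replication run copies the newest backup restore point that existed when the run started, and it treats every job as finishing instantly.

```python
from datetime import datetime, timedelta


def replica_lag(now: datetime, backups: list[datetime], replication_runs: list[datetime]) -> timedelta:
    """How far behind the source the replica is at `now`.

    Each replication run copies the newest backup restore point available when the
    run started. Raises ValueError if no run or no earlier backup exists yet.
    """
    last_run = max(r for r in replication_runs if r <= now)
    newest_backup = max(b for b in backups if b <= last_run)
    return now - newest_backup
```

Consider nightly backups at 22:00 with replication at 21:00. Just before the next replication run, the replica is almost 47 hours behind. Moving replication to 23:00, an hour after the backup, caps the lag at about 25 hours.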
At this point you should have:
- Replication job running on a scheduled interval, feeding up-to-date replicas to the target site
- Restore point retention sized for your rollback window requirements
- WAN throttling configured to protect business-hours bandwidth
- Failover and failback procedures understood and ready to execute
Snapshot-based replication is the practical middle ground for most environments. It is not as aggressive as CDP on RPO, but it does not require the I/O filter infrastructure that CDP needs and it works across both VMware and Hyper-V. If you have Tier 1 workloads on CDP and Tier 2 workloads on replication, you have a tiered DR approach that matches protection cost to business criticality, which is exactly what you should be building toward.