Veeam v13 Sizing and Architecture Patterns

Veeam v13 Series | Component: Full Infrastructure | Audience: Infrastructure Architects, MSP Engineers, Capacity Planners

1. The Sizing Variables

Every Veeam sizing exercise starts with four numbers. Get these wrong and everything downstream is wrong.

Source data size. The total volume of data to be protected. Use actual used space, not provisioned disk capacity. Measure it.

Daily change rate. The percentage of source data that changes between backup runs. 5% is a safe default for general workloads. Database servers can hit 10-20%. File servers with mostly static data may be 1-2%. If you do not know your change rate, run a test backup job for a week and measure the incremental sizes.

Backup window. The time available for the backup job to complete. An 8-hour overnight window is common. Some environments run 24/7 with no defined window, which means you size for continuous operation and accept that backups run alongside production load.

Retention period. How many days, weeks, or months of restore points you need to keep. 14 days is common for operational recovery. 30-90 days for compliance. GFS for long-term (monthly, quarterly, yearly).
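As a minimal sketch, the four inputs can be captured in a small Python helper. The class name and fields here are illustrative assumptions, not a Veeam API:

```python
from dataclasses import dataclass

@dataclass
class SizingInputs:
    """The four numbers every sizing exercise starts with (hypothetical helper)."""
    source_tb: float      # actual used space, not provisioned capacity
    change_rate: float    # daily change as a fraction, e.g. 0.05 for 5%
    window_hours: float   # backup window length
    retention_days: int   # simple retention; GFS handled separately

    def daily_incremental_tb(self) -> float:
        # Data volume a daily incremental backup must move.
        return self.source_tb * self.change_rate

env = SizingInputs(source_tb=200, change_rate=0.05, window_hours=8, retention_days=14)
print(f"Daily incremental: {env.daily_incremental_tb():.0f} TB")
```

These same four numbers feed the proxy, network, and capacity formulas in the sections that follow.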

2. Proxy Sizing

Proxies do the heavy lifting. They read data from the source (hypervisor, storage, agent) and send it to the repository. Proxy sizing is about CPU cores and throughput per task.

The Formula

Required proxy cores (full) = (Source Data MB / Backup Window Sec) / Throughput per Task MBps

Required proxy cores (incremental) = ((Source Data MB * Change Rate) / Backup Window Sec) / Throughput per Task MBps

Throughput per task varies by transport mode. Use these baseline estimates from the Veeam best practice guide:

| Transport Mode | Throughput per Task (MBps) | Notes |
| --- | --- | --- |
| Virtual Appliance (Hot-Add) | 150-180 | VM proxy on same host/cluster. No network transfer. |
| Direct SAN (FC/iSCSI) | 150-200 | Physical proxy with direct storage connectivity. |
| NBD over 10 GbE | 80-100 | Network-based. Scales with NIC count and ESXi host load. |
| Object Storage (Direct) | 100-150 | Gateway server throughput. Highly dependent on S3 endpoint performance. |
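A hedged sketch of the proxy task formula, combined with the hot-add baseline above. The function name and sample numbers are illustrative assumptions:

```python
import math

def proxy_tasks_needed(data_mb: float, window_sec: float, mbps_per_task: float) -> int:
    """Concurrent proxy tasks required to move data_mb within the window."""
    return math.ceil((data_mb / window_sec) / mbps_per_task)

window = 8 * 3600   # 8-hour backup window, in seconds
hot_add = 150       # conservative hot-add throughput, MBps

full = proxy_tasks_needed(200 * 1024 * 1024, window, hot_add)  # 200 TB full
incr = proxy_tasks_needed(10 * 1024 * 1024, window, hot_add)   # 10 TB incremental
print(full, incr)  # 49 tasks for the full, 3 for the daily incremental
```

The gap between the full and incremental results is the reason section 11 advises sizing for the incremental window rather than the full.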

Rules of Thumb

1 CPU core per concurrent proxy task. With modern CPUs (2020+), you can push to 2 tasks per core. Start at 1:1 and increase if you observe idle CPU during backup windows.

2 GB RAM per proxy task slot. A proxy with 8 cores and 8 task slots needs 16 GB RAM minimum.

Deploy at least 2 proxies per site. A single proxy is a single point of failure. If the proxy goes down, backups stop. VBR automatically fails over to another proxy if one is unavailable.

Virtual proxies are fine for hot-add mode. Physical proxies are better for direct SAN and large-scale NBD because they avoid contending with production VMs for network and storage I/O.

3. Repository Sizing

Repository sizing has two dimensions: compute (CPU and RAM for data mover operations) and storage (capacity for backup data).

Compute

The repository CPU handles decompression, data writes, and synthetic processing. The best practice guide recommends a 3:1 ratio against proxy task count. If your proxies can run 12 concurrent tasks targeting a single repository, that repository needs at least 4 CPU cores dedicated to backup operations.

RAM: 4 GB per repository core. For ReFS, add 0.5 GB per TB of ReFS volume, capping this addition at 128 GB; total repository RAM does not need to scale beyond 256 GB.

Repository task slots: 1 task per core. Each concurrent backup stream targeting the repository consumes one task slot. If your proxy count exceeds your repository task capacity, jobs queue. Size the repository to handle the maximum concurrent streams your proxy farm can produce.
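The repository compute rules above can be sketched as a small calculator. This is an illustrative helper under the stated rules of thumb, not an official formula:

```python
import math

def repo_compute(proxy_tasks: int, refs_volume_tb: float = 0.0) -> tuple[int, int]:
    """Repository cores and RAM (GB) for a given concurrent proxy task count."""
    cores = math.ceil(proxy_tasks / 3)          # 3:1 proxy-task-to-repo-core ratio
    ram_gb = cores * 4                          # 4 GB RAM per repository core
    ram_gb += math.ceil(refs_volume_tb * 0.5)   # ReFS metadata: 0.5 GB per TB
    return cores, ram_gb

print(repo_compute(12))                      # 12 concurrent tasks -> 4 cores, 16 GB
print(repo_compute(12, refs_volume_tb=100))  # same cores, extra RAM for ReFS
```

Because repository task slots are 1 per core, the `cores` result is also the concurrent stream count the repository can absorb without queuing.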

Physical vs Virtual

Physical repositories are recommended for production. A virtual repository competes with production VMs for storage and network I/O on the hypervisor. If the hypervisor is the thing you are protecting, hosting the backup target on the same hypervisor means losing the hypervisor loses both production and backups. Virtual hardened repositories are explicitly not recommended for production by the best practice guide.

4. Backup Server Sizing

| Environment Size | CPU | RAM | Notes |
| --- | --- | --- | --- |
| Under 50 VMs | 4 cores | 16 GB | Can co-locate proxy and repository roles on the same server |
| 50-500 VMs | 8 cores | 32 GB | Dedicated VBR server. Separate proxy and repository servers. |
| 500+ VMs | 16+ cores | 64+ GB | Dedicated VBR. Multiple proxies and repositories. PostgreSQL may benefit from SSD. |

The VBR server itself does not move data. It orchestrates jobs, manages the configuration database (PostgreSQL), and coordinates component communication. CPU and RAM requirements scale with the number of concurrent jobs, managed servers, and the size of the configuration database.

Do not exceed 100 concurrent jobs. VBR can handle more, but the sweet spot for database load, load balancing, and overall processing is 80-100 concurrent jobs. If you need more, consider multiple VBR instances managed through VSPC.

5. Network Bandwidth Math

Required bandwidth (Mbps) = (Data to transfer in MB * 8) / Backup Window in Seconds

Example: 10 TB of incremental data (5% of 200 TB) in an 8-hour window:
(10 * 1024 * 1024 * 8) / (8 * 3600) = 83,886,080 / 28,800 = 2,913 Mbps (~3 Gbps)

Apply a 2:1 compression ratio to estimate actual network transfer (VBR compresses data in transit). So the 3 Gbps requirement drops to roughly 1.5 Gbps of actual wire utilization. A dedicated 10 GbE backup VLAN handles this comfortably. A 1 GbE link does not.
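The bandwidth math, including the 2:1 in-transit compression estimate, as a sketch (function name is an illustrative assumption):

```python
def wire_mbps(transfer_mb: float, window_sec: float, compression: float = 2.0) -> float:
    """Estimated on-the-wire bandwidth in Mbps after in-transit compression."""
    raw_mbps = (transfer_mb * 8) / window_sec   # logical bandwidth requirement
    return raw_mbps / compression               # actual wire utilization estimate

# 10 TB incremental in an 8-hour window, with and without compression
print(wire_mbps(10 * 1024 * 1024, 8 * 3600, compression=1.0))  # ~2,913 Mbps logical
print(wire_mbps(10 * 1024 * 1024, 8 * 3600))                   # ~1,456 Mbps on the wire
```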

For WAN-based backup copy jobs or Cloud Connect replication, bandwidth is usually the bottleneck. Size WAN links based on the daily change data volume, not the total source data. Apply compression and consider WAN accelerators if the link is under 100 Mbps.

6. Storage Capacity Estimation

Required capacity = Source Data * (1 + (Change Rate * Retention Days)) * (1 / Compression Ratio)

Example: 100 TB source, 5% daily change, 14-day retention, 2:1 compression:
100 * (1 + (0.05 * 14)) * (1 / 2) = 100 * 1.7 * 0.5 = 85 TB

This is a rough estimate. Actual storage consumption depends on workload type, deduplication effectiveness, synthetic full frequency, and whether you use per-VM backup chains or per-job chains. Always add 20% headroom above the calculated number. Filling a repository above 80% causes performance degradation and can trigger capacity tier offload pressure.

For GFS retention, each weekly, monthly, quarterly, and yearly full adds approximately one full backup worth of storage at the compressed size. A 100 TB environment with 2:1 compression keeping monthly GFS fulls for 12 months adds roughly 600 TB to your capacity requirement.
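Combining the capacity formula, GFS fulls, and the 20% headroom rule into one rough estimator. As the text notes, this is an approximation, and the function name and defaults are illustrative:

```python
def capacity_tb(source_tb: float, change_rate: float, retention_days: int,
                compression: float = 2.0, gfs_fulls: int = 0,
                headroom: float = 0.2) -> float:
    """Rough repository capacity estimate in TB."""
    base = source_tb * (1 + change_rate * retention_days) / compression
    gfs = gfs_fulls * source_tb / compression   # each GFS full ~ one compressed full
    return (base + gfs) * (1 + headroom)

print(capacity_tb(100, 0.05, 14, headroom=0.0))  # 85 TB, matching the worked example
print(capacity_tb(100, 0.05, 14, gfs_fulls=12))  # 12 monthly GFS fulls plus headroom
```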

7. Reference Architecture: SMB (Under 50 VMs)

| Component | Specification |
| --- | --- |
| VBR Server + Proxy + Repository | Single physical server. 8 cores, 32 GB RAM, 1 GbE + 10 GbE NIC. |
| Storage | Direct-attached RAID array or NAS. XFS with reflink for immutability. |
| Proxy task slots | 4-6 concurrent tasks |
| Repository | Single hardened repository extent. 15-30 TB usable depending on source data. |
| Network | Single backup VLAN. 1 GbE minimum, 10 GbE recommended. |
| Offsite copy | Cloud repository (S3-compatible) or second site via backup copy job. |

At this scale, co-locating all roles on a single server is acceptable. The tradeoff is a single point of failure. Mitigate with immutable local storage and an offsite copy. The Veeam Software Appliance (VSA) is a good fit here as an all-in-one deployment.

8. Reference Architecture: Mid-Market (50-500 VMs)

| Component | Specification |
| --- | --- |
| VBR Server | Dedicated. 8 cores, 32 GB RAM. VSA or Windows. |
| Proxies | 2-4 virtual proxies (hot-add). 4-8 cores each, 16 GB RAM each. Distribute across hosts. |
| Primary Repository | Physical server. 8-16 cores, 64 GB RAM. XFS hardened. DAS or SAN-attached. 50-200 TB usable. |
| SOBR | Performance tier: primary repository. Capacity tier: S3-compatible object storage for offload. |
| Mount Server | Co-located with VBR or separate for instant recovery performance. |
| Network | Dedicated 10 GbE backup VLAN. Separate management and production VLANs. |
| Veeam ONE | Dedicated or co-located with VBR. Monitoring and reporting. |

This is the most common production pattern. Dedicated VBR server. Multiple proxies for parallel processing and failover. Physical repository with immutability. SOBR with capacity tier offload for long-term retention. 10 GbE backup network. SureBackup running weekly against critical workloads.

9. Reference Architecture: Enterprise/MSP (500+ VMs)

| Component | Specification |
| --- | --- |
| VBR Servers | 2+ VBR instances (one per site or per customer cluster). HA pair with VSA for primary. |
| Proxies | 6-20+ proxies per site. Mix of virtual (hot-add) and physical (direct SAN). Anti-affinity rules to avoid proxy co-location on the same host. |
| Repositories | Multiple physical repositories per site. SOBR with multiple performance extents. Data Locality placement policy. |
| Object Storage | On-prem S3 (Object First Ootbi, Cloudian, Scality) for capacity tier. Cloud S3 for archive tier. |
| Gateway Servers | Dedicated gateway servers for S3 operations. Size as repository servers (3:1 ratio against proxy tasks). |
| Network | Dedicated 10/25 GbE backup fabric. VLAN separation per tenant for MSP. QoS for WAN copy jobs. |
| Management | VSPC for multi-VBR management. Veeam ONE for monitoring. Enterprise Manager for self-service restore. |
| Validation | Dedicated SureBackup Virtual Lab infrastructure. Recovery Orchestrator for DR drills. |

At this scale, architecture decisions compound. Every 10% improvement in proxy efficiency saves hours across the backup window. Every SOBR extent that is undersized becomes a bottleneck. Every missing gateway server slows capacity tier offload. Run a Proof of Concept with real workloads before committing to the production design.

10. SOBR Layout Patterns

Data Locality (Recommended Default)

Keeps all files in a backup chain (full + incrementals) on the same extent. Restores are fast because all data is in one place. The failure domain is a single extent. If an extent goes offline, only the chains stored on that extent are affected. Use this for the vast majority of deployments.

Performance Policy

Distributes data across extents for parallel write performance. Restores may need to read from multiple extents, which adds latency. Data mover communication between extents happens during restores, so inter-extent network bandwidth matters. Use this only when write throughput is the dominant concern and you have high-bandwidth, low-latency connectivity between extents.

Capacity Tier Offload

Configure the capacity tier to offload backup data after a defined number of days on the performance tier. The operational window is the period between the last restore point landing on the performance tier and the offload trigger. Keep at least 7 days on the performance tier for operational restore speed. Offload older data to object storage for cost optimization.

Limit each repository volume to 500 TB. This is not a technical hard limit but a best practice to keep failure domains manageable and operations like SOBR evacuation (for migrations or maintenance) within reasonable time frames. A 500 TB extent at 1 GB/s read throughput takes approximately 6 days to evacuate.
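The evacuation arithmetic behind the 500 TB guidance, as a quick sanity check. Decimal TB (10^12 bytes) is assumed here, and the function name is illustrative:

```python
def evacuation_days(extent_tb: float, read_gb_per_sec: float) -> float:
    """Days to read an extent end-to-end at a sustained throughput."""
    seconds = (extent_tb * 1000) / read_gb_per_sec   # TB -> GB, then GB / (GB/s)
    return seconds / 86400

print(f"{evacuation_days(500, 1.0):.1f} days")  # ~5.8 days for 500 TB at 1 GB/s
```

Sustained read throughput during an evacuation is usually lower than the array's peak, so real migrations tend toward the longer end of this estimate.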

11. Common Sizing Mistakes

Sizing proxies for full backups only. Your first full takes longer than incrementals. But incrementals run every day. Size proxy count for the incremental window. Oversize slightly for the occasional full.

Using a single large proxy instead of multiple smaller ones. One 32-core proxy is a single point of failure. Four 8-core proxies give you the same task capacity with failover and the ability to distribute across hosts and datastores.

Ignoring repository task slots. If your proxies can run 20 concurrent tasks but the repository only has 8 task slots, 12 tasks queue. The proxy farm is idle while the repository is saturated. Match proxy task capacity to repository task capacity.

Running virtual hardened repositories in production. A VM-based hardened repository adds the hypervisor to the attack surface. An attacker who compromises the hypervisor can access the VM's virtual disks regardless of Linux immutability. Physical servers with internal disks or DAS are the correct deployment for hardened repositories.

No headroom on storage. Filling a repository above 80% causes performance degradation. Budget 20% headroom. Budget for growth. Storage that is "right-sized" on day one is undersized in 6 months.

Assuming 1 GbE is sufficient. For anything beyond 10-20 VMs with moderate data volumes, 1 GbE backup networks are a bottleneck. 10 GbE is the minimum recommendation for mid-market and above. 25 GbE is increasingly common at enterprise scale.

Skipping the Proof of Concept. Sizing formulas give you a starting point. Real-world performance depends on storage firmware, RAID configuration, network congestion, ESXi host load, and a dozen other factors that formulas cannot capture. Run a PoC with representative workloads before committing to hardware purchases.

Key Takeaways

  • Four input variables drive everything: source data size, daily change rate, backup window, and retention period. Measure them before sizing anything.
  • Proxy sizing: 1 core per concurrent task (2:1 on modern CPUs), 2 GB RAM per task, minimum 2 proxies per site for failover.
  • Repository sizing: 3:1 ratio against proxy tasks for CPU, 4 GB RAM per core, physical servers recommended for production.
  • Backup server: 80-100 concurrent jobs max per VBR instance. Use VSPC for multi-instance management at scale.
  • Storage capacity formula: Source * (1 + (ChangeRate * RetentionDays)) * (1 / CompressionRatio). Add 20% headroom.
  • SOBR Data Locality is the correct default. Performance policy only when write throughput is the dominant concern.
  • Limit repository volumes to 500 TB to keep failure domains and maintenance operations manageable.
  • 10 GbE backup network is the minimum for mid-market. 1 GbE is only appropriate for very small environments.
  • Always run a Proof of Concept. Formulas are estimates. Real workloads behave differently.
