Migrating Massive VMs: Overcoming Cloud Ingestion Bottlenecks

veeam - cloud migration - aws - azure - oci - rto - ingestion

What this covers

The real friction in large VM migrations to AWS, Azure, and OCI is rarely the tool. It is the pipe. This article covers the physics of the bottleneck, how each cloud provider's ingestion architecture creates its own specific constraints, the Veeam-centric workflows for each, and the architectural patterns that keep RTOs defensible when the network is the limiting factor.

🧮 The math nobody runs until it is too late

Before any tooling conversation, run the transfer math. This is where most migrations go sideways. The team focuses on which cloud, which landing zone, which migration service, and completely forgets to ask: how long does it actually take to move this data over the wire we have?

| Data volume | Link | Approximate transfer time |
| --- | --- | --- |
| 10TB | 1Gbps internet (at ~60% usable throughput) | ~37 hours |
| 10TB | 10Gbps Direct Connect / ExpressRoute / FastConnect | ~2.5 hours |
| 100TB | 10Gbps dedicated circuit | ~22 hours |
| 100TB | 100Mbps internet | ~100+ days |
| 1PB | 500Mbps dedicated connection | ~8 months |

Those numbers are from AWS and Oracle's own documentation. The 100-day figure for 100TB over 100Mbps is not a theoretical edge case. It is a real scenario that gets hit by organizations on commodity internet connections who assumed the migration would "just happen" over a few weekends. The 8-month figure for 1PB over 500Mbps comes directly from AWS Snowball documentation explaining why physical seeding exists at all.

The first thing you do before any cloud migration project involving serious data volume is calculate your migration window against your available bandwidth. If that math does not work, no tool, no proxy optimization, and no clever scheduling fixes it. You need more pipe or a different seeding strategy before the project starts.
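The bandwidth math is simple enough to script before any planning meeting. A minimal sketch of the calculation behind the table above (the 60% efficiency default is the same assumption the table uses):

```python
def transfer_time_hours(data_tb: float, link_gbps: float, efficiency: float = 0.6) -> float:
    """Estimate wall-clock time for a bulk transfer.

    data_tb    -- dataset size in terabytes (decimal: 1 TB = 10**12 bytes)
    link_gbps  -- nominal link speed in gigabits per second
    efficiency -- fraction of nominal bandwidth actually usable
                  (protocol overhead, link sharing, TCP behavior)
    """
    bits = data_tb * 1e12 * 8                      # total bits to move
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600

# 10TB over 1Gbps at ~60% usable throughput -> roughly 37 hours
print(f"{transfer_time_hours(10, 1):.0f} h")

# 100TB over 100Mbps -> well over 100 days at 60% efficiency
print(f"{transfer_time_hours(100, 0.1) / 24:.0f} days")
```

Run this against your own data volume and circuit before committing a migration window; if the output exceeds the window, the conversation needs to be about connectivity or seeding, not tooling.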

⚠️ Where the actual bottleneck lives

Cloud migration bottlenecks tend to show up in one of four places, and diagnosing the wrong one wastes significant time. Veeam's own bottleneck detection framework identifies source, proxy, network, and target as the four candidates. For cloud migrations, network and target are almost always the primary constraints. But the failure modes within each are different.

The network bottleneck is the one people think about: not enough bandwidth. But there is a second, less obvious network constraint: latency-induced throughput collapse. TCP window sizes and protocol overhead mean that high-latency links cannot saturate even moderate bandwidth. A 10Gbps Direct Connect with 20ms RTT delivers less usable throughput for sequential large transfers than the bandwidth number suggests. Protocol-level optimizations matter here, specifically enabling jumbo frames on Direct Connect and ExpressRoute connections, which reduces packet overhead and improves throughput for large sequential writes.
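The latency effect can be quantified with the bandwidth-delay product: a single TCP stream can never exceed its window size divided by the round-trip time, no matter how fast the circuit is. A simplified sketch (window sizes are illustrative):

```python
def tcp_throughput_ceiling_gbps(window_bytes: int, rtt_ms: float) -> float:
    """Maximum throughput of one TCP stream: window / RTT."""
    return (window_bytes * 8) / (rtt_ms / 1000) / 1e9

# A 256KB window over a 20ms-RTT dedicated circuit: the stream tops
# out near 0.1Gbps, nowhere near a 10Gbps port.
ceiling = tcp_throughput_ceiling_gbps(256 * 1024, 20.0)
print(f"{ceiling:.3f} Gbps per stream")

# Saturating 10Gbps at 20ms RTT needs roughly a 24MB window
# (or many parallel streams, which is how transfer tools cope).
needed_window_mb = 10e9 / 8 * 0.020 / (1024 * 1024)
print(f"{needed_window_mb:.1f} MB window to fill the pipe")
```

This is why multi-stream transfer engines outperform single-stream copies on high-latency dedicated circuits even when bandwidth is plentiful.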

The target bottleneck on cloud migrations is usually the cloud provider's ingestion pipeline, not your network. AWS EC2 import, Azure Migrate's replication appliance, and OCI's import service all have processing constraints that cap how fast they can consume incoming data, regardless of how fast you can push it. You can saturate your own upload pipe and still be queued on the provider side. Understanding the per-provider ingestion architecture changes how you design the migration.

🔧 How Veeam's cloud migration path works

Veeam's primary migration mechanism to public cloud is the Direct Restore path: take an existing on-premises VM backup and restore it directly to EC2, Azure VM, or (for OCI, using agent-based workflows) to OCI compute. The backup is the migration artifact. You are not running a separate migration tool alongside your backup stack. This matters operationally: your migration run is also your last backup. If the migration fails mid-transfer, you still have the backup. The restored VM in the cloud is your production system from the moment you fail over DNS.

For AWS and Azure, Veeam deploys a cloud-side proxy appliance as part of the restore process. This proxy appliance runs inside your target cloud account and handles the local disk-side operations: receiving the backup stream, converting it to the native cloud disk format (EBS for AWS, managed disks for Azure), and attaching it to the new instance. The proxy appliance is the key to understanding why the Veeam cloud restore path outperforms raw VM import workflows. The conversion and disk attachment happen in-cloud, not on the wire, and the backup data travels in Veeam's compressed, deduplicated format until it lands at the proxy.

For OCI, there is no native Veeam OCI plugin in the current release. The path for OCI migrations uses Veeam Agents in managed mode, with OCI Object Storage now supporting the S3-compatible API that Veeam's object storage repositories target. Oracle Cloud VMware Solution (OCVS) gets full Veeam support because it is a VMware SDDC inside OCI: VBR can reach it natively and the standard vSphere backup and restore paths apply.

☁️ AWS: Direct Connect, DataSync, and the seeding problem

🔶 Amazon Web Services

AWS provides two primary dedicated connectivity options for migration workloads. Direct Connect is the private, dedicated physical connection from your data center into the AWS network. Hosted connections start at 50Mbps and scale to 10Gbps. Dedicated connections are available at 1Gbps, 10Gbps, and 100Gbps. Inbound data transfer over Direct Connect is free. The port-hour charge is what you pay for. For a migration project, this means the economics strongly favor getting Direct Connect provisioned before you start moving data: you pay for the port regardless of whether you are using it, and you pay nothing extra for the data volume you push through it.

The key operational point on Direct Connect for migrations: plan at least 90 days of lead time for new circuit provisioning through a Direct Connect partner. If your migration starts in 90 days, the circuit ordering needs to start today. Organizations regularly discover this constraint after the migration timeline is already committed.

The Veeam EC2 restore path

Veeam's Restore to Amazon EC2 workflow deploys a proxy appliance inside your AWS account. The appliance handles the local operations: receiving the backup data stream from your on-premises VBR, writing it to a temporary EBS volume, and attaching that volume to the new EC2 instance. When the proxy appliance is used, the backup data does not stage through S3. Without the proxy appliance, Veeam writes backup data to a temporary S3 bucket in RAW format and then imports from S3 to EBS via AWS VM Import/Export. The proxy path is faster and cleaner for large disks: S3 staging adds a second write cycle and the import queue can be a constraint.

The proxy appliance subnet has specific requirements. It must have automatic public IP assignment and a route table with a default route to an internet gateway. If your backup data is stored on an on-premises object storage repository, the proxy appliance also needs connectivity back to that repository, which typically means VPN or Direct Connect. Size the proxy appliance with at least 1GB of RAM per VM disk being migrated concurrently.
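The RAM rule above (at least 1GB per concurrently migrated disk) turns into a one-line planning check. A hypothetical sketch; the OS headroom figure is an assumption, not a Veeam-documented number:

```python
def min_proxy_ram_gb(concurrent_disks: int, base_os_gb: int = 2) -> int:
    """Floor for proxy appliance RAM: 1GB per concurrently migrated
    disk, plus assumed headroom for the OS and data mover."""
    return concurrent_disks + base_os_gb

# A VM with 8 disks migrated concurrently needs at least ~10GB RAM,
# which already rules out the smallest instance sizes for the proxy.
print(min_proxy_ram_gb(8))
```

The same arithmetic applies to the Azure restore proxy discussed later: size for the worst-case concurrent disk count, not the average.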

When the pipe is not enough: AWS DataSync and offline seeding

AWS DataSync is the online transfer service for large dataset migration. A single DataSync agent can saturate a 10Gbps network link. DataSync handles encryption in transit, integrity checking, and automated scheduling. For migration workloads where the data lives on NFS or SMB shares (not VM disk images), DataSync is the right path. For VM disk image migration, Veeam's Direct Restore path is more appropriate because DataSync is file-oriented and does not handle the format conversion required to land a VM into EC2.

For very large migrations where even Direct Connect cannot deliver the data within the project window, AWS offered Snowball Edge for offline physical seeding. Important current status: as of November 2025, AWS Snowball Edge is no longer available to new customers. Existing customers can continue using it. New customers should use DataSync for online transfers or AWS Data Transfer Terminal for secure physical transfers. This is a meaningful change for migration planning: if you were counting on Snowball Edge as part of your seeding strategy and are not already a Snowball customer, that path is closed. DataSync with Direct Connect is the current recommended large-scale migration path for new AWS customers.

🔵 Azure: ExpressRoute, the process server, and scale-out

🔷 Microsoft Azure

Azure's dedicated connectivity is ExpressRoute. Circuit speeds range from 50Mbps up to 100Gbps depending on the provider and peering location. Like Direct Connect, data inbound to Azure over ExpressRoute does not carry data transfer charges. The gateway SKU matters for throughput: the ErGw1AZ, ErGw2AZ, and ErGw3AZ SKUs support 1Gbps, 2Gbps, and 10Gbps respectively. The older Standard and HighPerformance SKUs have lower throughput ceilings (UltraPerformance matches ErGw3AZ at 10Gbps but without zone redundancy), and Microsoft is pushing customers to migrate to the AZ-enabled SKUs. If you are running a legacy gateway SKU and hitting throughput walls during migration, the gateway itself may be the constraint, not the circuit.

Azure Migrate and the process server bottleneck

Azure Migrate's agent-based migration path uses a replication appliance that combines a configuration server and a process server. The process server compresses and encrypts the replication stream before pushing it to Azure. A single process server has real throughput limits. Microsoft's documentation is clear: if you are replicating and migrating hundreds of servers, a single process server will not handle the traffic. For large-scale migrations involving many VMs or very large individual VMs, you need to scale out the process server tier by deploying additional process servers and distributing VMs across them.

Each source VM's Mobility service communicates with the replication appliance over TCP 443 for management and sends replication data to the appliance on TCP 9443. That 9443 port is inbound to the appliance. In high-volume migrations with many concurrent replications, the appliance's network interface and local storage become the bottleneck before the WAN link does. The fix is scale-out: more process servers, not a fatter ExpressRoute circuit.
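Scale-out planning reduces to dividing aggregate daily replication churn by a per-appliance throughput cap. A sketch; the 2TB/day per-server cap is an illustrative assumption, so check Microsoft's current sizing tables for the appliance spec you actually deploy:

```python
import math

def process_servers_needed(daily_churn_tb: float, per_server_cap_tb: float = 2.0) -> int:
    """Number of process servers for a given daily data churn.

    per_server_cap_tb is an ASSUMED per-appliance daily ceiling;
    Microsoft publishes sizing guidance tied to CPU/RAM/disk specs.
    """
    return math.ceil(daily_churn_tb / per_server_cap_tb)

# 300 VMs averaging 25GB/day of churn each = 7.5TB/day -> 4 servers
print(process_servers_needed(300 * 0.025))
```

The point of the exercise is the shape of the answer: churn scales with VM count, so the process server tier must scale with it, independently of circuit bandwidth.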

The Veeam Azure restore path

Veeam's Restore to Azure works through Azure restore proxy appliances. These are Windows-based VMs inside your Azure tenant that receive the backup stream and transfer disk data to Azure Blob storage. The proxy appliance should be in the same Azure region as the restore target. Deploying one that is geographically close but in the wrong region is a common mistake that adds latency to every disk write during the restore. The proxy halves restore time in practice: Veeam's own documented test showed a restoration that took 80 minutes without the proxy completed in 37 minutes with it, at double the throughput.

For large VMs with multiple disks, the proxy's memory allocation becomes important. The appliance needs enough RAM to handle all disks being migrated concurrently. If the proxy is undersized, the restore will fail when it runs out of memory during disk staging. Right-size the proxy before starting a large migration, not after the first failure.

🔴 OCI: FastConnect, DRG routing, and the OCVS path

► Oracle Cloud Infrastructure

OCI's dedicated connectivity service is FastConnect. Port speeds are available at 1Gbps, 10Gbps, 20Gbps, and 50Gbps depending on region and partner. FastConnect is available through over 100 global partner providers. Unlike AWS Direct Connect, which bills outbound data transfer per GB, FastConnect carries no per-GB transfer fees in either direction: Oracle charges only for port hours. For large migrations this makes FastConnect economically clean: your migration cost is the port-hour charge plus whatever your connectivity partner charges, with no variable per-GB fee regardless of volume.

The DRG (Dynamic Routing Gateway) is OCI's central routing hub for connecting FastConnect circuits to VCNs. All traffic from your on-premises network traverses the DRG before reaching resources in your VCN. A DRG supports up to 300 attachments, so multi-VCN migrations are well within its architectural capacity. Oracle recommends using redundant FastConnect circuits for production workloads, with Site-to-Site VPN as backup. For migration workloads specifically, the VPN-only path is viable for smaller datasets but will be the bottleneck for large migrations: IPSec VPN over internet provides best-effort throughput, while FastConnect provides deterministic, consistent performance.

OCVS: the fastest Veeam path to OCI

If your target is Oracle Cloud VMware Solution, the migration story is straightforward. OCVS is a fully supported VMware SDDC running on OCI bare metal. Veeam sees it as a standard vSphere environment. VMware HCX and vMotion are the recommended migration tools for moving workloads into OCVS. For backup and recovery of those workloads once they land in OCVS, standard VBR applies with all features intact. This is the cleanest migration path for organizations that want to lift VMware workloads to OCI without re-platforming them.

For non-OCVS OCI workloads, Veeam uses Agents in managed mode. OCI Object Storage supporting the S3-compatible API means it can function as a Veeam object storage repository target. Backups written there from on-premises can be used as the source for in-cloud restores. Oracle's Zero Downtime Migration (ZDM) service handles database migrations specifically and is the recommended path for Oracle Database workloads regardless of how the application tier migrates.

🌱 The seeding strategy: doing the initial transfer right

For migrations above a few terabytes, the standard migration pattern is two-phase: seed the initial bulk copy, then switch to delta replication for the final cutover. Getting the seeding phase right is what keeps the cutover window short.

The seeding phase is the initial full copy of the VM disk to the cloud. Everything that happens after that is incremental. If the seed is clean and complete, the delta replication that runs up to cutover is small: only changed blocks since the last sync. If the seed is slow, interrupted, or partial, the delta grows and the cutover window stretches. A 10TB VM that changes at 5% per day generates 500GB of delta per day. If your seed took two weeks because of bandwidth constraints, you have potentially 7TB of delta to replay before you can cut over. This is the math that creates extended migration windows for large workloads.
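The delta arithmetic above is worth scripting when sizing the cutover window. A minimal sketch using the same figures:

```python
def cutover_delta_tb(vm_size_tb: float, daily_change_rate: float, seed_days: float) -> float:
    """Delta accumulated while the seed copy runs.

    Assumes changed blocks are tracked from the start of the seed
    and replayed once after it completes (no intermediate syncs).
    """
    return vm_size_tb * daily_change_rate * seed_days

# 10TB VM, 5% daily change, 14-day seed -> 7TB of delta to replay
print(cutover_delta_tb(10, 0.05, 14))
```

Feed the delta figure back into the transfer-time math from earlier: if replaying the delta itself takes days, you need either a faster seed or intermediate incremental syncs during the seeding window.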

The correct approach for large seedings is to isolate the seed job from production backup windows. Give it dedicated bandwidth and run it during a sustained window where you can monitor throughput consistently. For Veeam specifically, the initial EC2 restore or Azure restore job is effectively your seed. It runs once, gets the VM to the cloud, and from there incremental backup copy jobs keep the cloud copy in sync until cutover. The cloud-side copy is your DR target during the transition period and your production system after cutover.

Backup copy job as pre-staging

A useful pattern for very large VMs: use a Veeam Backup Copy Job to an object storage repository (S3, Azure Blob, or OCI Object Storage) as the pre-staging step before the final restore. This separates the bulk data movement from the actual restore conversion. The backup copy runs continuously in the background, keeping the cloud-side copy of the backup up to date. When you are ready to cut over, the restore job runs from data that is already in the cloud, not from data that needs to traverse the WAN in real time. The cutover transfer is only the delta since the last backup copy sync, which can be minutes of change rather than days.

📶 Cloud gateway and proxy optimization

The cloud-side proxy or gateway appliance is the single most impactful optimization point in a Veeam cloud migration. It is also the most commonly undersized component. Several tuning decisions directly affect throughput.

For AWS, the proxy appliance should be an instance type with EBS-optimized I/O and sufficient network bandwidth for the expected ingest rate. General-purpose or compute-optimized types from m5.xlarge upward provide enough CPU for decompression; note that smaller sizes advertise burst ("up to") network bandwidth rather than a sustained baseline, so verify the sustained figure against your Direct Connect circuit speed. For Azure, the proxy appliance should be in the same region as the restore target. Use a VM size with premium SSD support for the proxy's temporary disk: standard HDD-backed proxy appliances create a local storage bottleneck during disk staging.

Jumbo frames deserve attention on both Direct Connect and ExpressRoute. Both services support jumbo frames (9001 bytes MTU for Direct Connect, 9000 bytes for ExpressRoute). Enabling jumbo frames reduces the TCP/IP overhead per byte transferred and meaningfully improves throughput on large sequential transfers. This requires configuration on both your on-premises routing equipment and the virtual network gateway in the cloud. It is not enabled by default on most configurations and is consistently overlooked.
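The per-packet gain from jumbo frames can be estimated from header overhead alone. A simplified sketch that counts only IPv4 and TCP headers (40 bytes combined, ignoring Ethernet framing and TCP options):

```python
def payload_efficiency(mtu: int, header_bytes: int = 40) -> float:
    """Fraction of each packet carrying application data, counting
    only IPv4 + TCP headers (a deliberate simplification)."""
    return (mtu - header_bytes) / mtu

std = payload_efficiency(1500)    # standard Ethernet MTU
jumbo = payload_efficiency(9000)  # jumbo frame MTU
print(f"standard: {std:.1%}, jumbo: {jumbo:.1%}")
# The headline efficiency gain is small, but a jumbo frame carries
# ~6x the payload per packet, so the host processes far fewer
# packets per gigabyte, which is where sustained sequential
# transfers actually benefit.
```

This is also why the improvement shows up most clearly on large sequential restore streams rather than small random I/O.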

Veeam's WAN acceleration is applicable in scenarios where you have a WAN accelerator deployed on the cloud side. For direct internet migrations or VPN-based migrations where bandwidth is the hard constraint, WAN acceleration reduces the effective data volume by deduplicating the stream at the source before it traverses the link. For migrations where the cloud proxy handles the final data, WAN acceleration between VBR and the proxy is less impactful because the Veeam backup format is already compressed and deduplicated. The bigger win is right-sizing the proxy and getting off public internet onto a dedicated circuit.

⏱️ Keeping RTO defensible during cutover

The cutover moment is where migrations either go smoothly or produce a war room. Three things determine whether your RTO holds up: how current is the cloud-side data at the time of cutover, how fast can the restored VM reach a functional state, and what is your rollback plan if the cloud VM does not come up cleanly.

Data currency is controlled by how recently your last backup copy sync ran. If you are using the backup copy pre-staging pattern described above, the delta at cutover can be very small. Schedule one final incremental sync to the cloud immediately before starting the cutover procedure. This minimizes the amount of new data to process during the final restore.

Veeam's Instant Recovery to cloud allows you to start the cloud VM directly from backup storage before the restore is complete. For AWS and Azure this means the VM is running while the remaining disk data is being hydrated in the background. RTOs measured in minutes rather than hours are achievable for workloads where you need the service available while the full restore finishes. For production cutover, verify the VM is healthy and the full restore is complete before decommissioning the on-premises instance.

Rollback planning is the piece teams skip because they are optimistic about the migration succeeding. The on-premises VM should remain intact and powered off until the cloud instance is verified stable under production load. Do not decommission or deallocate the source VM at the moment of cutover. Keep it for at least 48-72 hours post-cutover. For very large VMs where storage costs are a concern, snapshot the source VM rather than leaving it running. The point is to preserve the rollback option until you are confident the cloud instance is solid.

📋 Provider comparison: migration connectivity at a glance

| Attribute | AWS | Azure | OCI |
| --- | --- | --- | --- |
| Dedicated connectivity service | Direct Connect | ExpressRoute | FastConnect |
| Max dedicated bandwidth | 100Gbps | 100Gbps (ExpressRoute Direct) | 50Gbps (partner, varies by region) |
| Inbound data transfer cost | Free over Direct Connect | Free over ExpressRoute | Free over FastConnect |
| Circuit provisioning lead time | ~90 days for new circuits | Provider-dependent | Provider-dependent |
| Veeam cloud proxy support | Yes: EC2 proxy appliance | Yes: Azure restore proxy | Via Agents / OCVS |
| Native Veeam plugin | Yes | Yes | No native plugin |
| VMware workload path | Direct Restore to EC2 | Direct Restore to Azure / AVS | OCVS with HCX/vMotion |
| Physical offline seeding | Existing customers only (Snowball Edge closed to new customers, Nov 2025) | Azure Data Box | No native equivalent |
| Jumbo frame support | 9001 bytes MTU | 9000 bytes MTU | 9000 bytes MTU |
The core pattern that works

Start with bandwidth math. Get dedicated connectivity provisioned before the migration timeline commits. Use Veeam's cloud proxy appliances, right-sized for your VM count and disk volume. Pre-stage backup data to cloud object storage via backup copy jobs before the cutover window. Enable jumbo frames on your dedicated circuit. Keep the source VM intact until the cloud instance is proven stable under load. The ingestion pipe is always the constraint: everything else is optimization within the constraint.
