Veeam v13: Ransomware Recovery, From Detection to Production
Veeam v13 Series | Component: VBR v13 | Audience: Enterprise Architects, Security and Compliance Teams, Hands-on Sysadmins
The malware detection article in this series covers how Veeam finds the threat and how you investigate it. This article picks up where that one ends. You've got a verified clean restore point. You know what was hit. The infection vector is closed, or at least contained. Now you have to actually get your workloads back to production in the right order, using the right method, without bringing anything back that shouldn't come back.
That's harder than it sounds. The restore method you pick determines your RTO. The order you recover in determines whether anything comes back at all. Secure Restore scans the data during the restore process, but there are things it can't scan and you need to know what they are. And when it's all done, the documentation you produce is the difference between having evidence for an audit and having a story.
This article covers all of it end to end.
1. Before You Restore Anything
Two things have to be true before you touch the restore wizard. The infection vector is closed or isolated. And you know which restore point you're recovering from.
Restoring before the vector is closed is how clean data gets re-encrypted as soon as it lands. It's one of the most common ways that ransomware recoveries fail. It doesn't matter how clean your restore point is if the path that got the attacker in is still open. Endpoint detection, firewall rules, credential rotation, Active Directory cleanup: whatever applies to your incident happens in parallel with your Veeam investigation, not after.
On the restore point: the malware detection article covers how to find your last clean point using Get-VBRObjectRestorePoint filtered by malware status. If you haven't done that investigation yet, do it before proceeding. The restore point you choose determines everything downstream. Guessing is not an acceptable approach at this stage.
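As a rough illustration of that selection step, the sketch below picks the newest restore point that is marked clean and was created before the earliest known compromise time. It runs against stand-in objects, not a backup server. The function name, the `CompromisedAfter` parameter, and the `'Clean'` status label are assumptions made for this example; confirm the actual status values your build returns before relying on the filter.

```powershell
# Hypothetical helper: newest restore point that is marked clean AND predates
# the earliest known compromise. The 'Clean' label is an assumed status value.
function Select-LastCleanRestorePoint {
    param(
        [Parameter(Mandatory)][object[]]$RestorePoint,
        [Parameter(Mandatory)][datetime]$CompromisedAfter,
        [string]$CleanStatus = 'Clean'
    )
    $RestorePoint |
        Where-Object {
            "$($_.MalwareStatus)" -eq $CleanStatus -and
            $_.CreationTime -lt $CompromisedAfter
        } |
        Sort-Object CreationTime -Descending |
        Select-Object -First 1
}

# Stand-in objects so the sketch runs without a backup server.
$points = @(
    [PSCustomObject]@{ CreationTime = [datetime]'2026-03-09T22:00:00'; MalwareStatus = 'Clean' }
    [PSCustomObject]@{ CreationTime = [datetime]'2026-03-10T22:00:00'; MalwareStatus = 'Clean' }
    [PSCustomObject]@{ CreationTime = [datetime]'2026-03-11T22:00:00'; MalwareStatus = 'Suspicious' }
    [PSCustomObject]@{ CreationTime = [datetime]'2026-03-12T22:00:00'; MalwareStatus = 'Clean' }
)

$chosen = Select-LastCleanRestorePoint -RestorePoint $points `
    -CompromisedAfter ([datetime]'2026-03-11T08:00:00')
$chosen.CreationTime
```

Note that the point from 12 March is rejected even though it is marked clean. Anything created after the compromise time is treated as untrusted, whatever the scan says.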
2. Choosing the Right Restore Method
Veeam gives you several ways to bring a VM back. They're not interchangeable. The right choice depends on how fast you need the VM available, whether you can absorb degraded performance during recovery, and whether you need to run scripts against the VM data before it reaches production.
| Method | How It Works | Best For in Ransomware Recovery | Trade-off |
|---|---|---|---|
| Instant Recovery | VM runs directly from the backup file on a vPower NFS datastore. No data movement upfront. Fast online time. | Getting services back quickly while full restore runs in background. Tier 1 workloads with tight RTO. | Degraded I/O performance while running from backup. Not a permanent state. |
| Full VM Restore | Entire VM data written to production storage before powering on. VM runs from production storage from the start. | When you can afford the restore time and want clean production performance from first boot. | Longer time to online. Better permanent state than Instant Recovery. |
| Staged Restore | VM booted in isolated virtual lab, script runs inside it, then recovered to production. Requires a preconfigured virtual lab. | When you need to execute a cleanup or sanitization script (AD cleanup, credential rotation, sensitive data removal) before the VM reaches production. | Additional complexity. Virtual lab required. Longer total process. |
For most ransomware recoveries, the sequence is Instant Recovery to an isolated network to validate, followed by one of two paths. If you need the VM available immediately, use Quick Migration, which moves the VM to production storage while it keeps running. If you have time, run a Full VM Restore. Staged Restore makes sense when you need to run remediation scripts before production, like scrubbing an AD environment of compromised accounts before reconnecting it.
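The decision in the table can be written down as a small rule, which is handy when you're triaging dozens of VMs. This is only a sketch of the reasoning above. The function name and both time inputs are hypothetical, and the estimate of full restore time has to come from your own testing.

```powershell
# Hypothetical triage helper that mirrors the table above.
# Inputs are your own estimates; nothing here talks to a backup server.
function Get-RestoreMethod {
    param(
        [Parameter(Mandatory)][int]$RtoMinutes,
        [Parameter(Mandatory)][int]$EstimatedFullRestoreMinutes,
        [switch]$NeedsCleanupScript
    )
    # A cleanup script before production means Staged Restore, whatever the RTO.
    if ($NeedsCleanupScript) { return 'Staged Restore' }

    # A full restore that cannot finish inside the RTO means running from
    # backup first and migrating afterwards.
    if ($EstimatedFullRestoreMinutes -gt $RtoMinutes) {
        return 'Instant Recovery, then Quick Migration'
    }
    return 'Full VM Restore'
}

Get-RestoreMethod -RtoMinutes 30  -EstimatedFullRestoreMinutes 240
Get-RestoreMethod -RtoMinutes 480 -EstimatedFullRestoreMinutes 240
Get-RestoreMethod -RtoMinutes 480 -EstimatedFullRestoreMinutes 240 -NeedsCleanupScript
```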
3. Secure Restore: What It Does and What It Misses
Secure Restore is a scan that runs during the restore process itself. You enable it in the restore wizard at the Secure Restore step. Veeam mounts the machine disks on the mount server and scans them using Veeam Threat Hunter (signature based), your own antivirus software configured via AntivirusInfos.xml, or YARA rules before completing the restore. If something is found, the restore either aborts or continues with restrictions depending on what you configured.
It supports Windows and Linux machines. Disks or volumes that can't be mounted to the mount server are skipped from the scan and restored without AV scanning. Storage Spaces disks and ReFS volumes on a mount server OS that doesn't support ReFS are examples of what gets skipped. Know what's in your environment before you rely on Secure Restore as a blanket guarantee.
How to Enable It
In the restore wizard, at the Secure Restore step, enable "Scan restore points with the existing antivirus software" and configure what happens if malware is found. You have two options: abort the restore entirely, or continue the restore with the machine's network adapters disabled so it comes up isolated and the detection is logged. In a ransomware recovery scenario, abort is the right default. Continuing with a known infected restore point to production defeats the purpose.
The antivirus scan requires AV software installed and running on the mount server. Veeam reads scan settings from AntivirusInfos.xml, which lives at %ProgramFiles%\Common Files\Veeam\Backup and Replication\Mount Service on Windows, or /var/lib/veeam/mount/AntivirusInfos.xml on Linux mount servers. The file ships with predefined settings for common AV solutions. If your AV isn't in the default list, you can add it to the XML.
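Before you depend on the AV scan, it's worth confirming that the file is actually present and parses on the mount server. The sketch below does only that. It makes no assumption about the file's schema beyond it being XML, and the function name and output property names are made up for this example.

```powershell
# Hypothetical pre-flight check: does the AV configuration file exist and parse?
# It reports the root element and number of child elements without assuming
# what those elements are called.
function Test-AntivirusConfig {
    param([Parameter(Mandatory)][string]$Path)

    $result = [PSCustomObject]@{
        Path = $Path; Exists = $false; WellFormed = $false
        RootElement = $null; EntryCount = 0; Error = $null
    }
    if (-not (Test-Path -LiteralPath $Path -PathType Leaf)) { return $result }
    $result.Exists = $true

    try {
        $doc = New-Object System.Xml.XmlDocument
        $doc.Load((Resolve-Path -LiteralPath $Path).ProviderPath)
        $result.WellFormed  = $true
        $result.RootElement = $doc.DocumentElement.Name
        $result.EntryCount  = @($doc.DocumentElement.ChildNodes |
            Where-Object { $_.NodeType -eq 'Element' }).Count
    }
    catch {
        $result.Error = $_.Exception.Message
    }
    return $result
}

# On a Windows mount server, check the default location named above.
if ($env:ProgramFiles) {
    $folder = Join-Path $env:ProgramFiles 'Common Files\Veeam\Backup and Replication\Mount Service'
    Test-AntivirusConfig -Path (Join-Path $folder 'AntivirusInfos.xml')
}
```

A file that exists but fails to parse is the case to watch for after a hand edit, because a malformed entry is easy to miss until a restore is already running.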
What Secure Restore Doesn't Catch
Secure Restore scans files on the mounted disks using your AV engine and YARA rules. What it can't catch:
- Malware that your AV signatures don't know about yet. If the variant that hit you is newer than your last signature update, the scan passes and the threat is still there. Keep signatures current on your mount server, and run Veeam Threat Hunter alongside your AV scan, since it uses its own maintained signature database independently of your AV.
- Malware that lives only in memory or registry at runtime, not on disk. Secure Restore scans the filesystem. Memory resident threats don't show up in a disk scan.
- Data that's already been encrypted by the time the restore point was taken. If you chose a restore point from within the infection window, Secure Restore scanning it doesn't un-encrypt the files. It just scans for the malware that encrypted them. The files are still encrypted.
- Volumes that can't be mounted to the mount server, as covered above.
4. The Isolated Network Restore Pattern
Don't restore ransomware impacted VMs directly to production. Restore to an isolated network first, validate everything is working and clean, then cut over. This is the pattern that catches the cases where Secure Restore passed but the restored VM still isn't behaving correctly.
Setting Up the Isolated Target
You need an isolated portgroup in vSphere with no routing to production and no internet access. A SureBackup virtual lab relies on the same kind of isolated network, so if SureBackup is configured, that infrastructure already exists. If it isn't, create a new distributed or standard portgroup with no uplinks and no IP routes out of the segment.
DNS matters here. If your restored VMs try to reach production domain controllers and can't, they'll behave unpredictably. Two options: deploy a lightweight DNS server in the isolated segment, or adjust the VMs' DNS settings via guest OS customization during the restore.
Instant Recovery to Isolated Network
- Home view, right click the backup job with the VM you're recovering, select Instant Recovery, then To VMware vSphere.
- Select the restore point you've already identified and verified as clean.
- On the Destination step, choose the target host and datastore. Put this on non-production storage.
- On the Network step, disconnect the VM from its production portgroup and connect it to your isolated portgroup instead. This is the critical step. Verify it before proceeding.
- Enable Secure Restore on the Secure Restore step. Select Veeam Threat Hunter and your AV engine. Set behavior to abort if malware is found.
- Complete the wizard. The VM boots in the isolated segment and is available within minutes.
Now validate before you touch anything else. Can the OS boot? Do services start? Does the application respond on its configured port? Run your verification checklist against every VM before you declare it ready for production.
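One way to keep that checklist honest is to run it as code, so every check leaves a record of what was tested, when, and by whom. The sketch below is a minimal harness under that assumption. `Invoke-ValidationChecklist` and its parameters are hypothetical names, and the two demo checks are placeholders for real probes such as a port test or a service query run from a machine inside the isolated segment.

```powershell
# Hypothetical checklist runner: each check is a name plus a scriptblock that
# returns $true or $false. A check that throws is recorded as failed.
function Invoke-ValidationChecklist {
    param(
        [Parameter(Mandatory)][string]$VMName,
        [Parameter(Mandatory)][System.Collections.IDictionary]$Checks,
        [string]$Operator = [Environment]::UserName
    )
    foreach ($name in $Checks.Keys) {
        $passed = $false
        $detail = ''
        try   { $passed = [bool](& $Checks[$name]) }
        catch { $detail = $_.Exception.Message }

        [PSCustomObject]@{
            VMName    = $VMName
            Check     = $name
            Passed    = $passed
            Detail    = $detail
            CheckedBy = $Operator
            CheckedAt = Get-Date
        }
    }
}

# Placeholder checks. In practice these would be real probes, for example
# { (Test-NetConnection -ComputerName 'sql01' -Port 1433).TcpTestSucceeded }
$results = Invoke-ValidationChecklist -VMName 'sql01' -Checks ([ordered]@{
    'OS responds'         = { $true }
    'SQL service running' = { throw 'service not running' }
})
$results | Format-Table VMName, Check, Passed, Detail
```

Piping `$results` to `Export-Csv` gives you the per-VM validation record that the documentation section asks for.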
Staged Restore: When You Need to Run Cleanup Scripts First
Staged Restore lets you run an executable script for VMs before recovering them to the production environment. It's part of the full VM restore operation. You configure it in the Full VM Restore wizard, select Staged Restore mode, and point it at a script on the backup server. Veeam boots the VM in your virtual lab, runs the script inside the VM, then continues the restore to production if the script exits with code 0.
Common use cases in ransomware recovery:
- Removing compromised service accounts or local admin accounts that the attacker created before the restore point was taken
- Rotating passwords or resetting credentials that may have been captured before the incident
- Removing persistence mechanisms like scheduled tasks, startup entries, or registry run keys that were planted during the dwell period
- Scrubbing sensitive data from a VM before it goes to a less trusted network segment
The script runs with the credentials you provide in the staged restore configuration. All VMs in a single staged restore session must run the same OS type (all Windows or all Linux), and the script must reside in a local folder on the backup server.
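To make the exit code contract concrete, here is a minimal sketch of a cleanup script for a Windows member server. It disables any enabled local account that isn't on an allow-list and is meant to exit 0 only if every step succeeds. The allow-list contents and function names are placeholders. `Get-LocalUser` and `Disable-LocalUser` come from the Windows LocalAccounts module, so this won't help on a domain controller or a Linux VM. The entry point is left commented out so the block can be read and tested without touching any accounts.

```powershell
# Placeholder allow-list: build the real one from your own account records.
$AllowedAccounts = @('Administrator', 'Guest', 'DefaultAccount',
                     'WDAGUtilityAccount', 'svc-backup')

# Pure comparison, kept separate so it can be tested without changing anything.
function Get-UnexpectedAccount {
    param([string[]]$Present, [string[]]$Allowed)
    $Present | Where-Object { $Allowed -notcontains $_ }
}

# Disables (does not delete) every enabled local account not on the allow-list.
# Any failure is a terminating error so the caller can exit non-zero.
function Disable-UnexpectedLocalAccount {
    param([string[]]$Allowed)
    $present = @(Get-LocalUser | Where-Object { $_.Enabled } | ForEach-Object { $_.Name })
    foreach ($name in @(Get-UnexpectedAccount -Present $present -Allowed $Allowed)) {
        Disable-LocalUser -Name $name -ErrorAction Stop
        Write-Host "Disabled unexpected local account: $name"
    }
}

# Entry point when saved as the staged restore script (uncomment to use):
# try   { Disable-UnexpectedLocalAccount -Allowed $AllowedAccounts; exit 0 }
# catch { Write-Host "Cleanup failed: $($_.Exception.Message)"; exit 1 }

# Dry run of the comparison only:
Get-UnexpectedAccount -Present @('Administrator', 'svc-backup', 'helpdesk2') -Allowed $AllowedAccounts
```

Disabling is used here in place of deleting so the account stays available as evidence. Whether that's the right call depends on your incident response process.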
5. Recovery Order: Tier by Tier
Nothing in a ransomware recovery comes back in isolation. Your domain controllers have to be healthy before anything else authenticates. Your DNS has to be working before your applications can find each other. Your core databases have to be online before your application tier makes any sense. Getting the order wrong means spending time bringing things back that immediately fail because their dependencies aren't there yet.
This is the order that works. It maps directly to the workload priority matrix from the DR runbook article.
| Wave | Workloads | Validate Before Proceeding |
|---|---|---|
| Wave 1 | Domain controllers, DNS servers, core network infrastructure | AD replication healthy. DNS resolving internal and external. Authentication working from a test client. |
| Wave 2 | Core databases (SQL, Oracle), primary file servers, certificate services | Database services started and accepting connections. Application logs clean. No replication errors. |
| Wave 3 | Application tier VMs that depend on Wave 2 | Application up and user accessible. Core business functions working end to end. |
| Wave 4 | Secondary systems, reporting, monitoring | Functional. Not blocking Wave 3. |
| Wave 5 | Dev, test, non-production | Last. These wait until production is stable. |
Don't skip the validation gates. It's tempting to declare Wave 1 done and immediately start Wave 2 to compress the timeline. That's how you discover three hours later that your DCs are partially replicated and your Wave 2 VMs are throwing authentication errors that look like a second infection but are actually a replication lag you didn't wait to resolve.
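A gate is easier to respect when it's a function that says no. The sketch below takes a list of check records for a wave and decides whether the next wave may start. The function name and record shape are assumptions for this example. The `Passed` values would come from whatever you actually run against Wave 1, such as a replication summary on the DCs or a DNS lookup from a test client.

```powershell
# Hypothetical wave gate: every recorded check must pass, and a wave with no
# recorded checks does not pass by default.
function Test-WaveGate {
    param(
        [Parameter(Mandatory)][int]$Wave,
        [Parameter(Mandatory)][AllowEmptyCollection()][object[]]$CheckResult
    )
    $failed = @($CheckResult | Where-Object { -not $_.Passed })
    [PSCustomObject]@{
        Wave         = $Wave
        Total        = $CheckResult.Count
        Failed       = $failed.Count
        MayProceed   = ($CheckResult.Count -gt 0 -and $failed.Count -eq 0)
        FailedChecks = @($failed | ForEach-Object { "$($_.VMName): $($_.Check)" })
    }
}

# Stand-in results for Wave 1. dc02 has not finished replicating.
$wave1 = @(
    [PSCustomObject]@{ VMName = 'dc01';  Check = 'AD replication healthy'; Passed = $true  }
    [PSCustomObject]@{ VMName = 'dc02';  Check = 'AD replication healthy'; Passed = $false }
    [PSCustomObject]@{ VMName = 'dns01'; Check = 'Internal DNS resolves';  Passed = $true  }
)
$gate = Test-WaveGate -Wave 1 -CheckResult $wave1
$gate | Format-List
```

Treating "no checks recorded" as a failure is a deliberate choice in this sketch. Under time pressure, the wave nobody validated is the one most likely to be waved through.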
6. Cutting Over to Production
You've been running on the isolated network. Validation passed. Now you need to move from isolated to production. How you do this depends on which restore method you used.
If You Used Instant Recovery: Quick Migration
With Instant Recovery, VMs run from compressed and deduplicated backup files on a vPower NFS datastore. Performance is degraded compared to running from production storage. Quick Migration moves the VM data from the backup datastore to production storage while the VM stays running, then swaps the network connection to production at the right moment.
- In the VBR console, Home view, select Instant Recovery Sessions. Right click the session for the VM you want to migrate and select Quick Migration.
- Select the production datastore as the migration target.
- On the network step, remap the VM from the isolated portgroup to its production portgroup. This is the production cutover moment. Make sure you've done your validation before reaching this step.
- Quick Migration runs a two phase process: it recovers the VM from backup to production storage, then moves all changes accumulated while running from backup and consolidates them. The VM stays running throughout.
- After migration completes, the VM is running from production storage on the production network. The Instant Recovery session ends automatically.
If You Used Full Restore to Isolated Network
The VM is already on production storage, just connected to the wrong network. The cutover is a single network change in vSphere: update the VM's network adapter to the production portgroup. Update DNS if necessary. Notify dependent systems and stakeholders that the VM is back on production.
Communicate Before You Cut Over
Stakeholders who have been waiting for systems to come back will be watching. People who aren't expecting a cutover will get surprised by it. Send a brief message 5 minutes before you flip the network, and another as soon as you confirm the VM is on production. It's not a lot of effort, and it prevents a flood of "is it done?" messages during the most sensitive moment of the recovery.
7. PowerShell: Automating Secure Restore and Instant Recovery
When you're recovering multiple VMs under pressure, doing this through the wizard for each one doesn't scale. These scripts handle the core recovery operations.
Instant Recovery with Secure Restore to Isolated Network
param(
    [Parameter(Mandatory)][string]$VMName,
    [Parameter(Mandatory)][string]$RestorePointDate,
    [Parameter(Mandatory)][string]$TargetHost,
    [Parameter(Mandatory)][string]$TargetDatastore,
    [Parameter(Mandatory)][string]$IsolatedPortgroup,
    [string]$VBRServer = "vbr-server.domain.local"
)
Connect-VBRServer -Server $VBRServer
# Get the backup containing this VM
$backup = Get-VBRBackup | Where-Object {
    (Get-VBRBackupObject -Backup $_).Name -contains $VMName
} | Select-Object -First 1
if ($null -eq $backup) {
    Write-Error "No backup found for VM '$VMName'"
    Disconnect-VBRServer
    exit 1
}
# Get the specific restore point (the newest one created on the requested date)
$targetDate = [DateTime]::Parse($RestorePointDate)
$restorePoint = Get-VBRObjectRestorePoint -Name $VMName -Backup $backup |
    Where-Object { $_.CreationTime.Date -eq $targetDate.Date } |
    Sort-Object CreationTime -Descending |
    Select-Object -First 1
if ($null -eq $restorePoint) {
    Write-Error "No restore point found for $VMName on $RestorePointDate"
    Disconnect-VBRServer
    exit 1
}
Write-Host "Restore point: $($restorePoint.CreationTime) | Status: $($restorePoint.MalwareStatus)"
# Get target infrastructure objects.
# Look the host up by name instead of taking the first server in the list, which may
# be the backup server itself. -Server takes the host as registered in VBR and
# -Datastore takes a datastore found on that host. Verify both lookups against the
# cmdlet reference for your module version.
$esxHost = Get-VBRServer -Name $TargetHost
$datastore = if ($esxHost) {
    Find-VBRViDatastore -Server $esxHost -Name $TargetDatastore | Select-Object -First 1
}
if ($null -eq $esxHost -or $null -eq $datastore) {
    Write-Error "Could not find target host or datastore. Check the names and try again."
    Disconnect-VBRServer
    exit 1
}
Write-Host "Starting Instant Recovery for $VMName with network adapters disconnected..."
Write-Host "  Target host: $TargetHost"
Write-Host "  Target datastore: $TargetDatastore"
Write-Host "  Network: NICs disconnected at power on; attach to $IsolatedPortgroup afterwards"
Write-Host "  Secure Restore: AV scan enabled, abort on detection"
# Start-VBRInstantRecovery with Secure Restore params
# -EnableAntivirusScan requires AV software on the mount server
# -VirusDetectionAction: AbortRecovery (stop restore) or DisableNetwork (continue but isolate)
# -NicsEnabled $false keeps the VM off its original production portgroup at first boot.
# This script does not remap the network itself.
$session = Start-VBRInstantRecovery `
    -RestorePoint $restorePoint `
    -Server $esxHost `
    -Datastore $datastore `
    -NicsEnabled $false `
    -EnableAntivirusScan `
    -VirusDetectionAction AbortRecovery
Write-Host "Instant Recovery session started: $($session.Id)"
Write-Host ""
Write-Host "NEXT STEPS:"
Write-Host "  1. In vSphere, attach the VM network adapter to $IsolatedPortgroup and connect it"
Write-Host "  2. Validate OS boot, services, and application health"
Write-Host "  3. When validation passes, run Quick Migration to production storage"
Write-Host "  4. Remap network to production portgroup during Quick Migration"
Disconnect-VBRServer
Full VM Restore with Secure Restore
param(
    [Parameter(Mandatory)][string]$VMName,
    [Parameter(Mandatory)][string]$RestorePointDate,
    [Parameter(Mandatory)][string]$TargetHost,
    [Parameter(Mandatory)][string]$TargetDatastore,
    [string]$VBRServer = "vbr-server.domain.local"
)
Connect-VBRServer -Server $VBRServer
$backup = Get-VBRBackup | Where-Object {
    (Get-VBRBackupObject -Backup $_).Name -contains $VMName
} | Select-Object -First 1
if ($null -eq $backup) {
    Write-Error "No backup found for '$VMName'"
    Disconnect-VBRServer
    exit 1
}
$targetDate = [DateTime]::Parse($RestorePointDate)
$restorePoint = Get-VBRObjectRestorePoint -Name $VMName -Backup $backup |
    Where-Object { $_.CreationTime.Date -eq $targetDate.Date } |
    Sort-Object CreationTime -Descending |
    Select-Object -First 1
if ($null -eq $restorePoint) {
    Write-Error "No restore point found for $VMName on $RestorePointDate"
    Disconnect-VBRServer
    exit 1
}
# This script identifies the restore point and confirms the targets exist.
# It does not start the restore itself.
Write-Host "Preparing Full VM Restore for $VMName"
Write-Host "  Restore point: $($restorePoint.CreationTime)"
Write-Host "  Malware status: $($restorePoint.MalwareStatus)"
Write-Host "  Secure Restore plan: Threat Hunter + AV, abort on detection"
# Confirm the target host and datastore exist before anyone starts a restore.
# Verify these lookups against the cmdlet reference for your module version.
$esxHost = Get-VBRServer -Name $TargetHost
$datastore = if ($esxHost) {
    Find-VBRViDatastore -Server $esxHost -Name $TargetDatastore | Select-Object -First 1
}
if ($null -eq $esxHost -or $null -eq $datastore) {
    Write-Error "Could not find target host or datastore. Check the names and try again."
    Disconnect-VBRServer
    exit 1
}
# Full VM restore with Secure Restore is configured in the VBR console wizard
# (Home > Restore > Virtual Machines > Full VM Restore > Secure Restore step).
# The PowerShell path used in this article is the same Start-VBRInstantRecovery
# approach with -EnableAntivirusScan, followed by Quick Migration to production
# storage once the VM is validated.
#
# For a scripted full restore without Instant Recovery as an intermediate step,
# this article points to the VBR REST API (port 9419), which exposes full restore
# operations with secure restore options. Check the Start-VBRRestoreVM reference
# for your build as well, in case it exposes Secure Restore parameters directly.
Write-Host "For full VM restore with Secure Restore via PowerShell:"
Write-Host "  1. Use Start-VBRInstantRecovery with -EnableAntivirusScan (see previous script)"
Write-Host "  2. Validate the VM in the isolated environment"
Write-Host "  3. Use Start-VBRQuickMigration to migrate to production storage"
Write-Host ""
Write-Host "Restore point identified: $($restorePoint.CreationTime)"
Write-Host "Status: $($restorePoint.MalwareStatus)"
Write-Host "Proceed with Instant Recovery script using this restore point."
Disconnect-VBRServer
Batch Recovery: Multiple VMs in Priority Order
# Define your recovery waves. Each wave waits for manual confirmation before starting the next.
# RestorePointDate should be the date of the last verified clean restore point.
$recoveryWaves = @(
    @{
        Wave = 1
        Name = "Infrastructure"
        VMs = @("dc01", "dc02", "dns01")
        RestorePointDate = "2026-03-10"
    },
    @{
        Wave = 2
        Name = "Core Databases and File Services"
        VMs = @("sql01", "sql02", "fileserver01")
        RestorePointDate = "2026-03-10"
    },
    @{
        Wave = 3
        Name = "Application Tier"
        VMs = @("appserver01", "appserver02", "webserver01")
        RestorePointDate = "2026-03-10"
    }
)
$VBRServer = "vbr-server.domain.local"
$TargetHost = "esxi-recovery.domain.local"
$TargetDatastore = "Datastore-Recovery"
$IsolatedPortgroup = "vlan999-isolated"
$ReportFolder = "C:\IncidentReports"
Connect-VBRServer -Server $VBRServer
# Resolve the targets once, by name, and stop if either is missing.
# Verify these lookups against the cmdlet reference for your module version.
$esxHost = Get-VBRServer -Name $TargetHost
$datastore = if ($esxHost) {
    Find-VBRViDatastore -Server $esxHost -Name $TargetDatastore | Select-Object -First 1
}
if ($null -eq $esxHost -or $null -eq $datastore) {
    Write-Error "Could not find target host or datastore. Check the names and try again."
    Disconnect-VBRServer
    exit 1
}
# Make sure the report folder exists before the first export
if (-not (Test-Path $ReportFolder)) {
    New-Item -ItemType Directory -Path $ReportFolder | Out-Null
}
foreach ($wave in $recoveryWaves) {
    Write-Host ""
    Write-Host "====================================="
    Write-Host "WAVE $($wave.Wave): $($wave.Name)"
    Write-Host "VMs: $($wave.VMs -join ', ')"
    Write-Host "====================================="
    $confirm = Read-Host "Press ENTER to start Wave $($wave.Wave), or type SKIP to skip this wave"
    if ($confirm -eq "SKIP") {
        Write-Host "Skipping Wave $($wave.Wave)"
        continue
    }
    $targetDate = [DateTime]::Parse($wave.RestorePointDate)
    $waveSessions = @()
    foreach ($vmName in $wave.VMs) {
        $backup = Get-VBRBackup | Where-Object {
            (Get-VBRBackupObject -Backup $_).Name -contains $vmName
        } | Select-Object -First 1
        if ($null -eq $backup) {
            Write-Host "  WARNING: No backup found for $vmName -- skipping"
            continue
        }
        $rp = Get-VBRObjectRestorePoint -Name $vmName -Backup $backup |
            Where-Object { $_.CreationTime.Date -eq $targetDate.Date } |
            Sort-Object CreationTime -Descending | Select-Object -First 1
        if ($null -eq $rp) {
            Write-Host "  WARNING: No restore point found for $vmName on $($wave.RestorePointDate) -- skipping"
            continue
        }
        Write-Host "  Starting Instant Recovery: $vmName ($($rp.CreationTime))"
        # NICs stay disconnected so the VM never boots on its production portgroup.
        # AbortRecovery stops the restore if the scan finds malware.
        $session = Start-VBRInstantRecovery `
            -RestorePoint $rp `
            -Server $esxHost `
            -Datastore $datastore `
            -NicsEnabled $false `
            -EnableAntivirusScan `
            -VirusDetectionAction AbortRecovery
        $waveSessions += [PSCustomObject]@{
            VMName = $vmName
            SessionId = $session.Id
            StartTime = Get-Date
        }
        Write-Host "  Session: $($session.Id)"
    }
    Write-Host ""
    Write-Host "Wave $($wave.Wave) sessions started: $($waveSessions.Count) VMs"
    Write-Host "Attach each VM to $IsolatedPortgroup in vSphere, then verify all VMs in this wave before proceeding to Wave $($wave.Wave + 1)"
    Write-Host ""
    # Export this wave's session IDs for the incident record
    $waveSessions | Export-Csv `
        (Join-Path $ReportFolder "Wave$($wave.Wave)-Sessions-$(Get-Date -Format 'yyyyMMdd-HHmm').csv") `
        -NoTypeInformation
}
Disconnect-VBRServer
Write-Host "All waves initiated. Review session logs and validate each wave before production cutover."
8. Documenting the Recovery for Audit and Incident Record
What you do here determines whether you have evidence or a story. An auditor, a cyber insurer, or a customer asking for proof of what happened needs documents, not your recollection of events.
What to Capture During Recovery
- Every restore session ID, the VM it covers, the restore point date used, and the timestamp it started and completed. The batch recovery script above exports the session ID, VM name, and start time for each wave. Record the restore point date and completion time alongside them.
- Secure Restore results for every VM. The session statistics in VBR show whether the AV scan passed, what was scanned, and what was found. Export these from Home view, Last 24 Hours, filter to restore sessions.
- Every validation check you performed in the isolated environment before production cutover, with timestamps and who performed the check.
- The timestamp and method of each production cutover (Quick Migration, network change, etc.).
- Any VMs that failed Secure Restore and what you did about them.
Export Restore Session Results via PowerShell
param(
    [string]$VBRServer = "vbr-server.domain.local",
    [string]$OutputFolder = "C:\IncidentReports",
    [int]$HoursBack = 72
)
Connect-VBRServer -Server $VBRServer
$cutoff = (Get-Date).AddHours(-$HoursBack)
# Get all restore sessions in the incident window
$sessions = Get-VBRRestoreSession |
    Where-Object { $_.CreationTime -gt $cutoff } |
    Sort-Object CreationTime
# Wrap in @() so .Count is reliable when there are zero or one sessions
$report = @(foreach ($session in $sessions) {
    # Sessions still running have no meaningful end time, so leave duration empty
    $duration = if ($session.EndTime -gt $session.CreationTime) {
        [math]::Round(($session.EndTime - $session.CreationTime).TotalMinutes, 1)
    } else { $null }
    [PSCustomObject]@{
        SessionId = $session.Id
        JobType = $session.JobTypeString
        VMName = $session.Name
        StartTime = $session.CreationTime
        EndTime = $session.EndTime
        DurationMinutes = $duration
        Result = $session.Result
        State = $session.State
        Details = $session.Details
    }
})
$timestamp = Get-Date -Format "yyyyMMdd-HHmm"
$outputFile = Join-Path $OutputFolder "RestoreSessions-Incident-$timestamp.csv"
if (-not (Test-Path $OutputFolder)) {
    New-Item -ItemType Directory -Path $OutputFolder | Out-Null
}
$report | Export-Csv -Path $outputFile -NoTypeInformation
Write-Host "Exported $($report.Count) restore sessions to: $outputFile"
Write-Host ""
Write-Host "Summary:"
$report | Group-Object Result | ForEach-Object {
    Write-Host "  $($_.Name): $($_.Count)"
}
Disconnect-VBRServer
Lessons Learned: The Document That Actually Gets Used
One week after production is stable, hold a post-incident review and produce a lessons learned document. This doesn't have to be long. It needs to answer four questions:
- What was the initial access vector and how has it been closed?
- How long was the attacker in the environment before detection (dwell time)?
- What Veeam controls worked as expected, what controls were missing, and what needs to change?
- What changed in the DR runbook as a result of what you learned during the recovery?
The lessons learned document is also what feeds your immutability window decision the next time around. If the attacker dwelled for 18 days before triggering, a 7 day immutability window wouldn't have saved you. That number should now be 30 days minimum. You only know that because you documented it.
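The arithmetic behind that decision is simple enough to script into the review. The sketch below compares documented dwell time against the current immutability window and suggests a minimum. The function name and the margin are assumptions. The 12 day default was picked only so that 18 days of dwell reproduces the 30 day figure above, not because it's a recommended buffer.

```powershell
# Hypothetical sizing helper: is the window longer than the documented dwell
# time, and what is the smallest window that leaves a margin on top of it?
function Get-ImmutabilityGap {
    param(
        [Parameter(Mandatory)][int]$DwellDays,
        [Parameter(Mandatory)][int]$ImmutabilityDays,
        [int]$MarginDays = 12   # assumed buffer; set this from your own risk review
    )
    [PSCustomObject]@{
        DwellDays     = $DwellDays
        CurrentWindow = $ImmutabilityDays
        Covered       = ($ImmutabilityDays -gt $DwellDays)
        MinimumWindow = $DwellDays + $MarginDays
    }
}

# The scenario described above: 18 days of dwell against a 7 day window.
Get-ImmutabilityGap -DwellDays 18 -ImmutabilityDays 7
```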
Key Takeaways
- Don't restore before the infection vector is closed. It doesn't matter how clean the restore point is if the path in is still open.
- Restore to an isolated network first. Validate. Then cut over to production. The extra step costs 30 to 60 minutes. Restoring from the wrong point to production can cost the entire recovery timeline.
- Instant Recovery gets services up fast but runs from backup storage with degraded I/O. It's not a permanent state. Follow it with Quick Migration to production storage.
- Secure Restore scans during the restore process using Threat Hunter, your AV, and YARA. It skips volumes it can't mount (Storage Spaces, ReFS on incompatible mount servers). Run both Threat Hunter and your AV together, not just one.
- Staged Restore lets you inject scripts into the recovery process before a VM hits production. Use it when you need to remove compromised accounts, rotate credentials, or strip persistence mechanisms from the restore point data.
- Recovery order matters. Domain controllers and DNS come back first. Nothing else works correctly until they do. Validate each wave before starting the next one.
- Document every session ID, every Secure Restore result, every validation check, and every production cutover timestamp. That documentation is your incident record. An auditor needs documents, not your recollection.
- The lessons learned document updates your immutability window, your runbook, and your detection thresholds. If you don't write it, you don't get the improvement.