Break Glass #08: Repository Out of Space Mid-Job: Recovering and Getting Backups Running Again

Break Glass // Scenario 08
The repository ran out of space while a job was running. Backup jobs are failing. Some jobs got partway through a write before they hit the wall. The chain state on disk may not match what VBR recorded. You need to free space, repair the state, and get jobs running again without making the chain situation worse.
Break Glass VBR v13 Repository Space Recovery

Why This Happens

Repositories fill up for predictable reasons that compound quietly. Data growth from the protected environment is the slow burn. VMs get bigger, more VMs get added to jobs, change rates climb, and nobody adjusts retention or storage capacity to keep pace. The sharp trigger is usually a synthetic full transformation. Veeam needs temporary space to write the new full before it can delete the old one. Veeam best practices recommend keeping at least 1.25 times the size of a full backup free as headroom on top of your normal backup data. On a nearly full repository, there is no room for the transformation and the job fails partway through the write.
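The headroom rule is simple arithmetic, and it is worth checking before a synthetic full is due. Here is a minimal sketch, assuming you already know the approximate size of your largest full backup. The function names and the 2 TiB figure are hypothetical; only the 1.25 factor comes from the guidance above.

```python
import shutil

HEADROOM_FACTOR = 1.25  # factor from the guidance above; adjust to your own policy


def required_headroom_bytes(full_backup_bytes, factor=HEADROOM_FACTOR):
    """Free space to keep available for a synthetic full transformation."""
    return int(full_backup_bytes * factor)


def has_headroom(free_bytes, full_backup_bytes, factor=HEADROOM_FACTOR):
    """True when the free space covers the headroom target."""
    return free_bytes >= required_headroom_bytes(full_backup_bytes, factor)


if __name__ == "__main__":
    TIB = 1024 ** 4
    full_backup = 2 * TIB  # hypothetical size of the largest full backup
    # Replace "." with the repository path on the repository host.
    free = shutil.disk_usage(".").free
    needed = required_headroom_bytes(full_backup)
    print(f"free={free / TIB:.2f} TiB, needed={needed / TIB:.2f} TiB, "
          f"ok={has_headroom(free, full_backup)}")
```

Run it on the repository host itself so the number reflects what the OS sees, not what VBR has cached.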

Retention enforcement fires after the backup completes, not before. Veeam writes the new restore point first, then applies retention to delete the oldest ones. On a full repository the new write fails because there is nowhere to put it. Jobs start failing and the oldest restore points do not get cleaned up because that cleanup never gets a chance to run.
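The ordering is easier to see in a toy model. The sketch below is not Veeam's code. It is a simplified illustration, with hypothetical names, of "write first, then apply retention" and of why a full repository never reaches the cleanup step.

```python
class RepositoryFull(Exception):
    """Raised when the new restore point does not fit."""


def run_job(used, capacity, restore_points, new_point_size, retention):
    """Toy model of one job run.

    restore_points is a list of sizes, oldest first.
    Returns (used, restore_points) after the run.
    """
    # Step 1: the new restore point is written first.
    if used + new_point_size > capacity:
        # The job fails here, so the retention step below never runs.
        raise RepositoryFull("not enough space for the new restore point")
    points = restore_points + [new_point_size]
    used += new_point_size

    # Step 2: retention runs only after a successful write.
    while len(points) > retention:
        used -= points.pop(0)  # drop the oldest restore point
    return used, points


if __name__ == "__main__":
    # Room for the write: retention then removes the oldest point.
    print(run_job(60, 100, [20, 20, 20], 20, retention=3))
    # No room for the write: retention never gets a chance to free space.
    try:
        run_job(90, 100, [30, 30, 30], 20, retention=3)
    except RepositoryFull as exc:
        print("job failed:", exc)
```

In the second call the repository holds three points it would be allowed to trim to make room, but the run fails before it gets there. That is the whole problem in miniature.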

Someone deleting files directly from the repository filesystem outside of VBR creates a different kind of problem. The OS level free space returns but VBR still thinks those files exist. Jobs fail with "file does not exist" errors because VBR tries to reference a chain that is no longer complete on disk.

A scale-out backup repository (SOBR) has an additional behavior to know about. VBR uses an estimated free space calculation to prevent concurrent jobs from racing to fill an extent simultaneously. By default, VBR only refreshes that estimate when no tasks are assigned to an extent. In a busy SOBR there is almost always a task running, so the cached estimate drifts away from actual free space. Jobs then fail with "no scale-out repository extents have sufficient disk space" even though extents are not actually full. This is an estimation issue, not genuine out of space, and the fix is different.
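A toy model makes the drift concrete. This is an illustration only, not how VBR is implemented. It assumes each task reserves more estimated space up front than it ends up writing, which is an assumption made for the sketch. The class, the numbers, and the `force_update` flag are all hypothetical, with the flag standing in for "recalculate while tasks are active".

```python
class ExtentEstimate:
    """Toy model of a cached free-space estimate for one extent."""

    def __init__(self, actual_free, force_update=False):
        self.actual_free = actual_free
        self.estimated_free = actual_free
        self.active_tasks = 0
        self.force_update = force_update

    def refresh(self):
        # Default: the estimate is only refreshed when the extent is idle.
        if self.active_tasks == 0 or self.force_update:
            self.estimated_free = self.actual_free

    def can_accept(self, size):
        self.refresh()
        return size <= self.estimated_free

    def start_task(self, reserved):
        if not self.can_accept(reserved):
            raise RuntimeError("no extent has sufficient estimated space")
        self.active_tasks += 1
        self.estimated_free -= reserved

    def finish_task(self, written):
        self.active_tasks -= 1
        self.actual_free -= written
        self.refresh()


def simulate(force_update):
    """Overlapping tasks: the extent is never idle between them."""
    extent = ExtentEstimate(1000, force_update)
    extent.start_task(300)   # task A
    extent.start_task(300)   # task B starts before A finishes
    extent.finish_task(100)  # A wrote far less than it reserved
    extent.start_task(300)   # task C starts while B is still active
    extent.finish_task(100)  # B done, C still active
    accepts = extent.can_accept(300)
    return extent.estimated_free, extent.actual_free, accepts


if __name__ == "__main__":
    print("default:", simulate(force_update=False))
    print("forced: ", simulate(force_update=True))
```

With the default behavior the model ends at an estimate of 100 against 800 actually free, and the next task is refused. With the forced refresh the estimate tracks the real figure and the task is accepted.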

Triage

  1. Confirm actual free space at the OS level. RDP or SSH to the repository host and check the filesystem directly. On Linux:
    df -h /path/to/backup/directory
    On Windows, check drive Properties. Compare this to what VBR shows in Backup Infrastructure, Backup Repositories. A mismatch between OS reported space and VBR reported space points at an estimation issue.
  2. Disable all jobs targeting the full repository. Do not let another job attempt to write while you are working on the space issue. It will fail again and may break a chain that is currently still intact.
  3. Look for partially written files on the repository host: VBK or VIB files with very recent modification timestamps and unusually small file sizes compared to your typical incremental or full backup sizes. Note them. Do not delete them yet.
  4. Run a Rescan on the repository. Right click the repository in Backup Infrastructure and select Rescan. After the rescan, check Home, Backups for grayed out or unavailable restore points. These indicate VBR is aware that something is missing or incomplete.
  5. For a SOBR reporting "no scale-out repository extents have sufficient disk space" when OS level free space is available: this is the estimation issue. The fix is different from a genuine out of space situation. See the Gotchas section.
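Steps 1 and 3 can be scripted if you would rather not eyeball a large directory during an incident. The sketch below is read-only: it reports free space and lists candidates, and it deletes nothing. The 24 hour window and the "under 10 percent of the median size" rule are arbitrary assumptions, not Veeam thresholds, so tune them to your own backup sizes.

```python
import shutil
import statistics
import time
from pathlib import Path

BACKUP_SUFFIXES = (".vbk", ".vib")


def free_space_report(path):
    """OS-level view of the filesystem that holds the repository."""
    usage = shutil.disk_usage(path)
    return {
        "total_bytes": usage.total,
        "free_bytes": usage.free,
        "free_pct": round(100 * usage.free / usage.total, 1),
    }


def find_suspect_files(root, max_age_hours=24.0, size_ratio=0.1, now=None):
    """List recently modified backup files that are unusually small.

    A file is flagged when it was modified within max_age_hours and is
    smaller than size_ratio times the median size of files of its type.
    """
    now = time.time() if now is None else now
    files = [p for p in Path(root).rglob("*")
             if p.is_file() and p.suffix.lower() in BACKUP_SUFFIXES]
    suspects = []
    for suffix in BACKUP_SUFFIXES:
        group = [p for p in files if p.suffix.lower() == suffix]
        if len(group) < 2:
            continue  # nothing to compare against
        median_size = statistics.median(p.stat().st_size for p in group)
        for p in group:
            info = p.stat()
            recent = (now - info.st_mtime) <= max_age_hours * 3600
            if recent and info.st_size < size_ratio * median_size:
                suspects.append(p)
    return sorted(suspects)


if __name__ == "__main__":
    repo = "."  # replace with the repository path
    print(free_space_report(repo))
    for path in find_suspect_files(repo):
        print("possible partial write:", path)
```

Treat the output as a list of files to note, per step 3. A small recent VIB can also be a perfectly healthy incremental from a quiet day.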

The Recovery Path

  1. Free space through the VBR console. Go to Home, Backups, and right click the oldest backup chains. Select Delete from Disk. VBR removes the files from disk and updates its database in a single coordinated operation. This is the correct way to free space. Do not delete files directly from the filesystem.
  2. If files were already deleted directly from the filesystem outside of VBR: run a Rescan on the repository. After the rescan, unavailable restore points appear grayed out in VBR. Right click an unavailable restore point in the backup Properties and choose Forget to remove the VBR database record while leaving any remaining files untouched, or Remove to remove both the database record and any remaining physical files. Use Forget when you are not certain and want to preserve what is there. Use Remove when you are sure the chain is broken beyond recovery.
  3. Free enough space to comfortably accommodate the next job run. The general guidance from Veeam best practices is to maintain free space equal to at least 1.25 times the size of a full backup. Freeing just enough for one incremental is not enough. A synthetic full transformation will fail again if there is no headroom beyond the normal write.
  4. Run a Rescan on the repository after freeing space. This updates VBR's free space calculation and clears any stale estimates.
  5. Review the job retention settings before turning the jobs back on. Open each disabled job, check the Storage tab, and confirm the restore points value makes sense for the storage you have. If you have been running 30 restore points on a 2 TB repository and data growth hit the wall, reducing retention or adding storage capacity are both valid options. But pick one before bringing the jobs back online.
  6. For the SOBR estimation issue described in KB2282: add the following registry value on the VBR server and restart the Veeam Backup Service:
    Key: HKLM\SOFTWARE\Veeam\Veeam Backup and Replication\
    Value: SobrForceExtentSpaceUpdate
    Type: DWORD (32-bit)
    Data: 1
    The default behavior (Data: 0) only refreshes the cached free space when no tasks are assigned to an extent. Setting Data: 1 enables periodic recalculation of estimated free space while tasks are active, which is what fixes the drift. Per KB2282, this should only be enabled where the SOBR is configured to use Per-Machine Backup Files. Per-Machine is the SOBR default, but verify your SOBR is set that way before applying the key.
  7. Turn jobs back on one at a time. Start with the most critical and watch the first run complete successfully before enabling the next.
  8. Enable the backup file health check on any job that had an interrupted write. In job Properties, Storage, Advanced, on the Maintenance tab, in the Storage-level corruption guard section, check Perform backup files health check and click Configure to set the schedule. This verifies the chain you are now building from is intact before you rely on it for a restore.
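If you prefer to stage the registry change from step 6 as a file you can review before applying, a .reg file works. The sketch below only renders the file text and changes nothing on its own. The key, value name, type, and data are the ones given in step 6. The function name and output filename are hypothetical.

```python
KEY_PATH = r"HKEY_LOCAL_MACHINE\SOFTWARE\Veeam\Veeam Backup and Replication"
VALUE_NAME = "SobrForceExtentSpaceUpdate"


def render_reg_file(enabled=True):
    """Return .reg file text that sets the DWORD to 1 (or 0 to revert)."""
    data = 1 if enabled else 0
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{KEY_PATH}]",
        f'"{VALUE_NAME}"=dword:{data:08x}',
        "",
    ]
    return "\r\n".join(lines)


if __name__ == "__main__":
    # Hypothetical filename. newline="" keeps the CRLF endings as written.
    with open("sobr_force_update.reg", "w", encoding="utf-16", newline="") as handle:
        handle.write(render_reg_file(enabled=True))
    print(render_reg_file(enabled=True))
```

Review the generated file, import it on the VBR server, then restart the Veeam Backup Service as described in step 6. Calling `render_reg_file(enabled=False)` produces the file that puts the value back to 0.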

Gotchas

Never Delete Backup Files Directly from the Filesystem
Deleting VBK, VIB, or VBM files directly from the repository filesystem creates a split state where the files are gone from disk but VBR still tracks them in its database. Jobs then fail with "file does not exist" or "backup files are unavailable" because the chain record points to files that no longer exist. Always delete through the VBR console using Delete from Disk so the database stays in sync. If files were already deleted directly, run Rescan and then use Forget or Remove on the resulting unavailable restore points to clean up the database.
Veeam Writes First Then Applies Retention
Retention enforcement fires at the end of a successful job run, after the new backup file is written. On a full repository, the new write fails before retention gets to run. Retention does not proactively free space before attempting the write. It cannot. The fix is always to free space manually through the VBR console, then let the next job run succeed, at which point retention fires normally and reclaims the oldest restore points.
Immutable Repositories Block Emergency Deletion
On a Hardened Linux Repository with immutability enabled, you cannot delete backup files until their immutability window expires. The chattr +i flag blocks deletion at the OS level and VBR will not override it. This is the intended behavior of immutability, but it means a full immutable repository has very limited emergency space recovery options. Your choices are adding a new storage extent to the SOBR, waiting for the immutability window to expire on the oldest files, or restoring from your backup copy job while you sort out the storage. This constraint has to be designed for during initial sizing, not worked around during an incident.
SOBR Estimation Issue vs Genuine Out of Space
The SOBR error "no scale-out repository extents have sufficient disk space to store the backup file" looks like a genuine full storage condition but is sometimes a VBR estimation issue where the calculated free space per extent drifts from actual available space. Always confirm with an OS level check before treating this as real. If df or drive Properties shows plenty of free space but VBR disagrees, apply the SobrForceExtentSpaceUpdate registry value from KB2282. If both the OS and VBR agree that space is genuinely gone, free space through the VBR console.
Partial Synthetic Full. Let VBR Handle It
If a synthetic full transformation was interrupted partway through, the partially written VBK file still exists on disk. VBR tracks this as an incomplete restore point. On the next job run, VBR attempts to handle the incomplete transformation. Do not manually delete the partial VBK file. Let VBR manage it. If the job keeps failing with chain errors on subsequent runs, enable the Storage-level corruption guard and let the health check assess and repair the chain. If repair fails, run an active full backup to start a clean chain.

Prevention Checklist

  • Configure free space alerting in VBR's repository settings. In the repository properties, VBR has a built in option to alert when free space falls below a threshold. Set this and make sure the notifications are reaching someone. Alert at 20 percent remaining, not 5 percent.
  • Use Veeam ONE for capacity planning. The Capacity Planning for Backup Repositories report shows projected days remaining before each repository runs out of space, based on your current growth trend. This gives you time to act before jobs start failing.
  • Size repositories to hold your backup data plus 25 percent additional headroom for synthetic full transformations. A repository sized to hold exactly your current backup data has no room to breathe.
  • For SOBRs, add a capacity tier using object storage. Moving older restore points to object storage frees performance tier space for active chains without deleting data.
  • Never delete backup files directly from the repository filesystem. Put this in your team's runbook as a hard rule. The correct path is always through the VBR console.
  • Enable Storage-level corruption guard on all jobs. An out of space event that interrupts a write partway through a chain is exactly the situation this feature is designed to detect and recover from.
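The first two checklist items come down to two numbers: percent free right now, and days left at the current growth rate. Here is a minimal sketch of both, assuming simple linear growth between two usage samples. It is not how Veeam ONE computes its report, and every function name here is hypothetical.

```python
def free_percent(free_bytes, total_bytes):
    """Free space as a percentage of total capacity."""
    return 100.0 * free_bytes / total_bytes


def should_alert(free_bytes, total_bytes, threshold_pct=20.0):
    """True when free space has dropped below the alert threshold."""
    return free_percent(free_bytes, total_bytes) < threshold_pct


def average_daily_growth(samples):
    """Average growth per day from (day_number, used_bytes) samples.

    Uses only the first and last sample, which assumes linear growth.
    """
    (first_day, first_used), (last_day, last_used) = samples[0], samples[-1]
    return (last_used - first_used) / (last_day - first_day)


def days_until_full(free_bytes, daily_growth_bytes):
    """Projected days remaining, or None when usage is not growing."""
    if daily_growth_bytes <= 0:
        return None
    return free_bytes / daily_growth_bytes


if __name__ == "__main__":
    GIB = 1024 ** 3
    total = 2048 * GIB
    samples = [(0, 1500 * GIB), (30, 1740 * GIB)]  # hypothetical usage history
    free = total - samples[-1][1]
    growth = average_daily_growth(samples)
    print("alert:", should_alert(free, total))
    print("days until full:", days_until_full(free, growth))
```

A linear projection is crude, but it is enough to tell "months away" from "next week", which is the decision the checklist is asking you to make early.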
Break Glass Recap
  • Disable all jobs targeting the full repository before doing anything else
  • Confirm OS level free space first. SOBR estimation drift can look like genuine out of space
  • Free space through VBR console only. Never by deleting files directly from the filesystem
  • Files already deleted outside VBR: Rescan, then Forget (keeps files) or Remove (deletes files)
  • Veeam writes first then applies retention. You must free space manually before the next run succeeds
  • Immutable repo: cannot force delete during the immutability window. Add storage or wait
  • SOBR estimation drift: SobrForceExtentSpaceUpdate DWORD = 1, restart Veeam Backup Service. Per KB2282, only enable when SOBR uses Per-Machine Backup Files (the SOBR default)
  • Free 1.25 times full backup size worth of headroom, not just enough for one incremental
  • Partial VBK from interrupted synthetic full: let VBR handle it on next run, do not manually delete
  • Enable health check on jobs that had interrupted writes before relying on that chain for a restore
