Break Glass #10: Backup Chain Corruption: Recovering a Broken Forward Incremental Chain in Veeam v13
Why This Happens
A forward incremental chain is a sequence of files: one full backup (VBK) followed by one or more incrementals (VIB). Each incremental depends on the file before it. Break any link in that chain and every restore point from the break onward becomes inaccessible. The full and every incremental before the break may still be fine.
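The dependency rule can be sketched in a few lines of Python. This is a simplified model for illustration, not how VBR evaluates a chain internally; the `BackupFile` type, the `intact` flag, and the file names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BackupFile:
    name: str     # hypothetical file name, e.g. "mon.vbk"
    kind: str     # "VBK" (full) or "VIB" (incremental)
    intact: bool  # False if the file is missing, truncated, or corrupt


def restorable_points(chain):
    """Return names of restore points that can still be restored.

    A point is restorable only if it and every file before it in the
    chain, back to the full, are intact.
    """
    usable = []
    for f in chain:
        if not f.intact:
            break  # everything from the first broken link onward is lost
        usable.append(f.name)
    return usable


chain = [
    BackupFile("mon.vbk", "VBK", True),
    BackupFile("tue.vib", "VIB", True),
    BackupFile("wed.vib", "VIB", False),  # corrupt incremental
    BackupFile("thu.vib", "VIB", True),   # intact on disk, but depends on wed.vib
]
print(restorable_points(chain))  # ['mon.vbk', 'tue.vib']
```

Note that `thu.vib` is lost even though the file itself is undamaged, because it cannot be applied without `wed.vib`.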
Storage hardware failure partway through a write is the most direct cause. A disk fails, a RAID array degrades, or the repository host crashes while a VIB file is being written, and the resulting file is truncated or corrupt. With health check enabled, the next health check run finds it. Without health check enabled, the corruption sits silently until someone tries to restore from a point past the break.
Retention database issues are a subtler cause. During a retention enforcement pass, if the VBR server or database is interrupted partway through an operation, the database record for a chain can end up in an inconsistent state. The files on disk are fine but VBR thinks incrementals exist without a valid full at the start of the chain. Jobs fail with errors like "Full Storage Not Found" or complaints about corrupted storage metadata. These errors point to a database inconsistency, not storage corruption, and Veeam Support can often resolve the issue without data loss.
Manual file deletion is the third cause. Someone deletes a VBK or VIB file directly from the repository filesystem. VBR still tracks the chain in its database. Restore attempts fail because the file no longer exists. Jobs also fail because VBR finds the chain is incomplete before writing a new incremental.
Forever incremental chains amplify the blast radius. A chain with no periodic synthetic fulls that runs for months has a very long dependency chain. A single corrupted incremental six months in breaks every restore point after that date. Periodic synthetic fulls divide the chain into separate sections and limit the corruption blast radius to the section containing the corrupted file.
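To make the blast radius concrete, here is a small sketch that counts how many restore points one corrupt file takes out, with and without periodic fulls. It assumes a simplified model in which every full starts an independent section; the function name and the 28-day schedule are illustrative, not taken from Veeam.

```python
def lost_restore_points(kinds, corrupt_index):
    """Indices of restore points made unusable by one corrupt file.

    kinds: list of "VBK"/"VIB" in chronological order.
    The damage runs from the corrupt file up to, but not including,
    the next full backup, which starts an independent section.
    """
    lost = [corrupt_index]
    for i in range(corrupt_index + 1, len(kinds)):
        if kinds[i] == "VBK":
            break
        lost.append(i)
    return lost


# 28 daily restore points, one corrupt file on day 10 (index 9)
forever = ["VBK"] + ["VIB"] * 27          # one full, then only incrementals
weekly = (["VBK"] + ["VIB"] * 6) * 4      # a full every 7 days

print(len(lost_restore_points(forever, 9)))  # 19
print(len(lost_restore_points(weekly, 9)))   # 5
```

In this model the same corrupt file costs 19 restore points in the unbroken chain but only 5 when a weekly full bounds the section.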
Triage
1. Open the VBR console. Go to Home, Backups, Disk. Find the backup for the affected job. Right-click the backup and select Properties. Look at the restore point list. Unavailable or grayed-out restore points have missing or corrupt backing files. Note the date of the first unavailable restore point. Everything after that date in the same section of the chain is also broken.
2. Rescan the repository. Right-click the repository in Backup Infrastructure and select Rescan. After the rescan, refresh the backup properties view. Rescanning sometimes resolves apparent corruption that was a database sync issue rather than actual file damage.
3. Check whether the backup files physically exist on the repository host. SSH or RDP to the repo and navigate to the backup directory. Confirm the VBK and VIB files listed in VBR are actually present on disk. Missing files confirm physical deletion or storage failure. Files that are present on disk but shown as unavailable in VBR suggest a database issue.
4. Check the job session log for the error message type. "Backup files are unavailable" or "File does not exist" points to missing files. Errors about corrupted storage metadata or "Full Storage Not Found" point to a database-side inconsistency. These have different recovery approaches.
5. Determine what restore points you can still recover from. The VBK full and all VIB files before the corrupted one are typically still good. In the Properties window, attempt a test restore from the last available restore point before the break. If that succeeds, note the date. That is your current recovery window.
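Steps 3 and 4 can be partly scripted. The sketch below, meant to run on the repository host, compares a list of file names copied from the VBR Properties window against a directory listing and then maps the findings to a recovery path from this guide. The function names, file names, and path letters are illustrative assumptions; the check only tests for existence, so a truncated or corrupt file still counts as present.

```python
import tempfile
from pathlib import Path


def find_missing_files(repo_dir, expected_names):
    """Return expected backup file names that are not present in repo_dir."""
    present = {p.name for p in Path(repo_dir).iterdir() if p.is_file()}
    return sorted(name for name in expected_names if name not in present)


def suggest_path(missing, error_text):
    """Map triage findings to a recovery path in this guide (rough sketch)."""
    text = error_text.lower()
    if any(name.lower().endswith(".vbk") for name in missing):
        return "D"  # the full itself is gone
    if missing:
        return "A"  # incrementals missing from disk
    if "full storage not found" in text or "metadata" in text:
        return "B"  # files exist, database or metadata inconsistency
    return "unclear"  # keep investigating before acting


# Demo against a throwaway directory standing in for the repository folder
with tempfile.TemporaryDirectory() as repo:
    for name in ("job.vbk", "job-0001.vib"):
        Path(repo, name).touch()
    missing = find_missing_files(repo, ["job.vbk", "job-0001.vib", "job-0002.vib"])

print(missing)                                       # ['job-0002.vib']
print(suggest_path(missing, "File does not exist"))  # A
```

Treat the result as a prompt for which section to read next, not as a decision. A test restore from the last available point is still the real check.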
Recovery Path A: Files Missing from Disk
1. In VBR, open the backup Properties. Right-click the first unavailable restore point. You have two options:
   - Forget removes the record from the VBR database but leaves any remaining files on disk untouched. Choose this if you are unsure and want to investigate further.
   - Remove from disk removes the record and deletes the backing files from disk. Choose this only when you are certain the chain from that point forward is not recoverable.
2. When prompted, select "This and dependent backups" to remove the broken restore point and every restore point that depends on it. Selecting "All unavailable backups" removes every unavailable restore point across the entire backup, which may be more than you intend.
3. After the unavailable restore points are removed, run the backup job manually. VBR should pick up the chain from the last intact restore point and continue incrementally. Watch the session log to confirm it runs without errors.
4. If the last intact restore point is the VBK full itself, and the chain forward from there is broken, VBR will start a new incremental chain from the existing full. The full restore point remains valid and accessible.
5. If the VBK full itself is missing or corrupt, the entire chain is unrecoverable. Go to Recovery Path D.
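The difference between Forget and Remove from disk is easiest to see as two sets: what the VBR database tracks and what exists on disk. The following is a toy model of that difference, not Veeam code; the function and file names are hypothetical.

```python
def forget(db_records, disk_files, targets):
    """Model of Forget: drop database records, leave files on disk."""
    return db_records - targets, set(disk_files)


def remove_from_disk(db_records, disk_files, targets):
    """Model of Remove from disk: drop records and delete backing files."""
    return db_records - targets, disk_files - targets


db = {"mon.vbk", "tue.vib", "wed.vib", "thu.vib"}
disk = {"mon.vbk", "tue.vib", "thu.vib"}   # wed.vib was deleted by hand
broken = {"wed.vib", "thu.vib"}            # first unavailable point and its dependent

db_f, disk_f = forget(db, disk, broken)
db_r, disk_r = remove_from_disk(db, disk, broken)

print(sorted(db_f), sorted(disk_f))  # thu.vib is still on disk after Forget
print(sorted(db_r), sorted(disk_r))  # thu.vib is gone after Remove from disk
```

After Forget, `thu.vib` is no longer tracked but can still be examined or handed to support. After Remove from disk it cannot.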
Recovery Path B: Database Inconsistency (Files Exist, Metadata Corrupt)
1. Do not delete anything yet. Errors like "Full Storage Not Found" or complaints about corrupted storage metadata, where the files are physically present on disk, are usually a VBR database issue rather than storage corruption. Veeam Support can typically resolve this without data loss by correcting the database records. Open a support case and export logs per KB1832 (Help, Support Information in the VBR console, then walk through the Export Logs wizard).
2. While waiting for support, create a parallel backup job for the same VMs targeting a different repository. This ensures you have a fresh, clean chain building in parallel. Do not delete the original chain until support has assessed whether it is recoverable.
3. If you cannot wait for support and need the job running, rescan the repository, then use Forget on the unavailable restore points to clear the broken database records. VBR will resume the job from the last intact point. The files on disk remain untouched because Forget only removes database records. If support later determines the chain is recoverable, the files are still there.
Recovery Path C: Health Check Detected Corruption in a VIB File
Veeam's storage level corruption guard (health check) detected corrupted data during a scheduled health check pass. VBR completes the backup job with the Error status and starts a health check retry process to rebuild the chain.
1. Let the health check retry complete. The retry starts as a separate backup job session. Its behavior depends on where the corruption was found. If corrupted metadata was found in an incremental, VBR removes records of that incremental and every subsequent incremental from the configuration database, then transports new incremental data relative to the latest valid restore point and writes a new incremental file. If corrupted data blocks were found in a full or incremental file, VBR marks the affected restore point and subsequent points as corrupted and transports data blocks from the source datastore during the retry. Either way, the next time you look at the chain, it should be healthy from the retry point forward.
2. Note: on Hardened Linux Repositories, health check detection works but automatic repair does not. The official Veeam documentation states: "Linux immutable repositories do not support repair. If the health check detects corrupted data, Veeam Backup and Replication marks the restore point as corrupted in the configuration database and finishes the health check session." You must run an active full backup to start a new chain. If you do not, every subsequent incremental will complete with Error status.
3. After the retry completes, check the session log and confirm the job session shows Success or Warning with no further corruption messages. Verify that restores from the repaired chain work correctly before trusting it for production recovery.
4. Investigate the root cause of the corruption. A single corrupted VIB is a warning sign. If storage-level issues caused it, expect more corruption. Check the repository host's storage health: SMART data for drives, RAID controller event log, and filesystem integrity.
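The branching in steps 1 and 2 can be summarized as a small decision function. It only restates the behavior described above in code form; the corruption labels, the `repair_supported` flag, and the function name are made up for this sketch and do not correspond to any Veeam API.

```python
def retry_behavior(corruption, repair_supported=True):
    """Summarize what happens after a health check finds corruption.

    corruption: "metadata_in_incremental" or "data_blocks" (hypothetical labels).
    repair_supported: False for Linux immutable (hardened) repositories.
    """
    if corruption not in ("metadata_in_incremental", "data_blocks"):
        raise ValueError(f"unknown corruption type: {corruption}")

    if not repair_supported:
        # No automatic repair: the operator has to start a new chain.
        return [
            "restore point marked as corrupted in the configuration database",
            "operator runs an active full to start a new chain",
        ]
    if corruption == "metadata_in_incremental":
        return [
            "records of that incremental and all later ones are removed",
            "new incremental written relative to the latest valid restore point",
        ]
    return [
        "affected and subsequent restore points marked as corrupted",
        "data blocks transported again from the source datastore",
    ]


for step in retry_behavior("data_blocks", repair_supported=False):
    print("-", step)
```

The useful takeaway is the first branch: on a repository without repair support, nothing fixes itself, so the active full is a manual action.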
Recovery Path D: Entire Chain Unrecoverable
The VBK is missing or corrupt, or the metadata is so broken that VBR cannot reconstruct a usable chain. All restore points from this chain are lost.
1. Check for alternate restore sources before accepting total loss: backup copy job target, SOBR capacity tier, replication target, tape archive.
2. Investigate the storage failure that caused the VBK corruption before the new chain runs. Starting a new chain on a storage system that is actively degrading will reproduce the corruption.
3. Remove the broken chain from VBR. In Home, Backups, Disk, right-click the broken backup and select Remove from disk. This clears the database records and removes the physical files. If the physical files are already gone, use Forget instead.
4. Enable the job if it is disabled, then run it. VBR will create a new active full on the next run, starting a fresh chain. The first run will take significantly longer than normal because it is a full backup, not an incremental.
Gotchas
Prevention Checklist
- Enable storage level corruption guard (backup file health check) on every job. Schedule it to run weekly. This is the difference between catching corruption during a health check and discovering it during a production restore.
- Configure periodic synthetic or active fulls on all jobs. Weekly is the standard. Forever incremental chains have unbounded blast radius when corruption hits.
- Run SureBackup jobs against your most critical VMs. SureBackup is the only way to verify that the application inside the backup is actually recoverable, not just that the backup files are intact.
- For immutable repositories, run a backup copy job to a separate target. When health check detects corruption on an immutable repo, you need an alternate recovery path that does not require the immutability window to expire.
- Keep the repository storage healthy. Monitor SMART data on drives, RAID controller event logs, and repository host system logs. Chain corruption is usually a symptom of a storage health problem that will repeat.
- Never delete backup files directly from the repository filesystem. Always use the VBR console.
- Open backup Properties in VBR to see which restore points are unavailable and where the break is
- Rescan the repository first. Some apparent corruption resolves as a sync issue
- Files missing: Forget (keep files) or Remove from disk (delete files) on unavailable restore points
- Choose "This and dependent backups" to remove only the broken section, or "All unavailable backups" to remove every unavailable restore point in the backup
- Files present but metadata corrupt: open a Veeam support case before deleting anything
- Health check detected corruption: let the health check retry complete. On Linux immutable repos, run an active full instead
- Entire chain gone: check backup copy and capacity tier before accepting total loss
- Forget only removes the database record. Files stay on disk
- Remove from disk deletes both database record and files. Irreversible
- Immutable repos: there is no automatic repair. The only options are an active full to start a new chain, or waiting for the immutability window to expire
- Forever incremental chains have unbounded blast radius. Use periodic synthetic fulls