Backup Verification and Recovery Testing

February 13, 2026 | DR Backup Reliability

Automated verification and quarterly drills.

A backup that hasn't been tested is not a backup; it's a hope. Organizations that don't regularly verify their backups discover corruption, missing data, or broken restore processes at the worst possible moment: during an actual disaster. This guide covers automated verification and quarterly recovery drills.

The Backup Verification Problem

  • Silent corruption — Backup files may be incomplete or corrupted without any indication
  • Schema drift — Backup format may not be compatible with current software versions
  • Missing data — Backup scope may not include all critical databases or file systems
  • Untested procedures — The team may not know how to restore, or procedures may be outdated

Automated Backup Verification

Database Backup Verification

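# Automated RDS snapshot verification (Lambda function)
import boto3

TEMP_INSTANCE = 'backup-verify-temp'

def verify_rds_backup(event, context):
    rds = boto3.client('rds')

    # Find the latest completed automated snapshot
    # (snapshots still being created have no SnapshotCreateTime yet)
    snapshots = rds.describe_db_snapshots(
        DBInstanceIdentifier='prod-db',
        SnapshotType='automated'
    )['DBSnapshots']
    available = [s for s in snapshots if s['Status'] == 'available']
    if not available:
        raise RuntimeError('No available automated snapshots for prod-db')

    latest = max(available, key=lambda s: s['SnapshotCreateTime'])

    # Restore the snapshot to a temporary instance
    rds.restore_db_instance_from_db_snapshot(
        DBInstanceIdentifier=TEMP_INSTANCE,
        DBSnapshotIdentifier=latest['DBSnapshotIdentifier'],
        DBInstanceClass='db.t3.medium'
    )

    try:
        # Wait for instance to be available. Restores can outlast the
        # 15-minute Lambda limit; for large databases, split the wait and
        # the verification into separate invocations.
        waiter = rds.get_waiter('db_instance_available')
        waiter.wait(DBInstanceIdentifier=TEMP_INSTANCE)

        # Run verification queries (verify_data_integrity is not shown here)
        verify_data_integrity(TEMP_INSTANCE)
    finally:
        # Clean up even when verification fails, so the instance is not left running
        rds.delete_db_instance(
            DBInstanceIdentifier=TEMP_INSTANCE,
            SkipFinalSnapshot=True
        )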
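
The function above calls verify_data_integrity, which is not shown. As a minimal sketch of what such a check might do, the example below counts rows in a few critical tables and compares them against minimum expectations. The names check_row_counts and MIN_ROWS are hypothetical, the thresholds are made up, and an in-memory SQLite database stands in for the restored instance so the example runs anywhere; a real check would connect to the temporary instance's endpoint instead.

```python
import sqlite3

# Hypothetical expectations: table name -> minimum rows a healthy restore should contain.
MIN_ROWS = {"customers": 2, "orders": 1}

def check_row_counts(conn, min_rows):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    cur = conn.cursor()
    for table, minimum in min_rows.items():
        try:
            # Table names come from our own config, never from user input.
            cur.execute(f"SELECT COUNT(*) FROM {table}")
        except Exception as exc:
            problems.append(f"{table}: query failed ({exc})")
            continue
        count = cur.fetchone()[0]
        if count < minimum:
            problems.append(f"{table}: {count} rows, expected at least {minimum}")
    return problems

# Stand-in for the restored instance.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO customers (name) VALUES (?)", [("a",), ("b",)])
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")

problems = check_row_counts(conn, MIN_ROWS)
print(problems)
```

Row counts are only a coarse signal. Checks on the newest timestamp in a table or on referential integrity would catch stale or partially restored data that a count alone misses.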

S3 Backup Integrity Check

# Verify S3 backup files
from datetime import datetime

import boto3

def verify_s3_backups():
    s3 = boto3.client('s3')

    # Check that backup files exist for today
    # (list_objects_v2 returns at most 1,000 keys per call; paginate if you have more)
    today = datetime.now().strftime('%Y/%m/%d')
    objects = s3.list_objects_v2(
        Bucket='backups',
        Prefix=f'database/{today}'
    )

    # alert() is assumed to be your own notification helper (not shown)
    if objects['KeyCount'] == 0:
        alert("No backup files found for today!")
        return False

    ok = True
    for obj in objects['Contents']:
        # Verify file sizes (catch truncated backups)
        if obj['Size'] < 1000:  # Suspiciously small
            alert(f"Backup file {obj['Key']} is only {obj['Size']} bytes")
            ok = False
            continue

        # Verify a checksum is recorded. S3 only returns checksum fields when
        # ChecksumMode is enabled, and boto3 exposes the value as 'ChecksumSHA256'.
        head = s3.head_object(
            Bucket='backups', Key=obj['Key'], ChecksumMode='ENABLED'
        )
        if 'ChecksumSHA256' not in head:
            alert(f"Missing checksum for {obj['Key']}")
            ok = False

    return ok
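
The checksum step above only confirms that a checksum is recorded on the object. A stronger check recomputes the digest from the backup's bytes and compares it with the stored value. The sketch below assumes the object was uploaded in a single part with a SHA-256 checksum, which S3 reports as a base64-encoded digest; the helper names sha256_b64 and checksum_matches are illustrative.

```python
import base64
import hashlib

def sha256_b64(data: bytes) -> str:
    """Base64-encoded SHA-256 digest, the encoding S3 uses for its SHA-256 checksum."""
    return base64.b64encode(hashlib.sha256(data).digest()).decode("ascii")

def checksum_matches(data: bytes, s3_checksum: str) -> bool:
    # Multipart uploads report a composite checksum with a "-<part count>" suffix,
    # which cannot be compared against a whole-file digest.
    if "-" in s3_checksum:
        raise ValueError("composite (multipart) checksum; compare per part instead")
    return sha256_b64(data) == s3_checksum

# Usage with local bytes standing in for a downloaded backup file.
payload = b"example backup contents"
stored = sha256_b64(payload)          # what S3 would have recorded at upload time
print(checksum_matches(payload, stored))
print(checksum_matches(payload[:-1], stored))  # a truncated copy fails the check
```

For large backups, hashing the file in chunks with hashlib's update() avoids loading the whole object into memory.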

Quarterly Recovery Drill

Schedule a full recovery drill every quarter:

Drill Plan Template

  1. Scope definition — Which systems are being tested?
  2. Recovery scenario — What failure are we simulating? (AZ failure, database corruption, ransomware)
  3. Success criteria — What must work for the drill to pass?
  4. Timeline — Expected recovery time for each step
  5. Participants — Who needs to be involved?

Drill Execution

Step | Activity                           | Expected Time | Actual Time
-----|------------------------------------|---------------|------------
1    | Identify latest backup             | 5 min         | ___
2    | Provision recovery infrastructure  | 15 min        | ___
3    | Restore database from backup       | 30 min        | ___
4    | Deploy application to recovery infra | 15 min      | ___
5    | Verify application functionality   | 20 min        | ___
6    | Verify data integrity              | 15 min        | ___
7    | DNS cutover (simulated)            | 5 min         | ___
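
The expected times in the drill table add up to 105 minutes, which is the figure to compare against your recovery time objective. A small sketch of how the table might be tracked in code follows; the DrillStep class and the 45-minute measurement are hypothetical, included only to show how an overrun would surface.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DrillStep:
    name: str
    expected_min: int
    actual_min: Optional[int] = None  # filled in during the drill

steps = [
    DrillStep("Identify latest backup", 5),
    DrillStep("Provision recovery infrastructure", 15),
    DrillStep("Restore database from backup", 30),
    DrillStep("Deploy application to recovery infra", 15),
    DrillStep("Verify application functionality", 20),
    DrillStep("Verify data integrity", 15),
    DrillStep("DNS cutover (simulated)", 5),
]

def expected_total(steps):
    """Total planned recovery time in minutes."""
    return sum(s.expected_min for s in steps)

def overruns(steps):
    """Steps that took longer than planned, with the overrun in minutes."""
    return [
        (s.name, s.actual_min - s.expected_min)
        for s in steps
        if s.actual_min is not None and s.actual_min > s.expected_min
    ]

steps[2].actual_min = 45  # hypothetical measurement from a drill
print(expected_total(steps))
print(overruns(steps))
```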

AWS Backup for Centralized Management

aws backup create-backup-plan --backup-plan '{
  "BackupPlanName": "ProductionBackup",
  "Rules": [{
    "RuleName": "DailyBackup",
    "ScheduleExpression": "cron(0 5 ? * * *)",
    "TargetBackupVaultName": "production-vault",
    "Lifecycle": {
      "MoveToColdStorageAfterDays": 30,
      "DeleteAfterDays": 365
    },
    "CopyActions": [{
      "DestinationBackupVaultArn": "arn:aws:backup:eu-west-1:xxx:backup-vault:dr-vault",
      "Lifecycle": {
        "DeleteAfterDays": 90
      }
    }]
  }]
}'
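
One constraint worth checking before submitting a plan like this: AWS Backup keeps recovery points in cold storage for at least 90 days, so DeleteAfterDays has to be at least 90 days greater than MoveToColdStorageAfterDays. The sketch below applies that rule to the plan above (365 is at least 30 + 90, so it passes). The function names are illustrative, and this covers only that one rule, not full plan validation.

```python
import json

# The plan from the CLI call above, with the account ID placeholder left as-is.
PLAN = json.loads("""
{
  "BackupPlanName": "ProductionBackup",
  "Rules": [{
    "RuleName": "DailyBackup",
    "ScheduleExpression": "cron(0 5 ? * * *)",
    "TargetBackupVaultName": "production-vault",
    "Lifecycle": {"MoveToColdStorageAfterDays": 30, "DeleteAfterDays": 365},
    "CopyActions": [{
      "DestinationBackupVaultArn": "arn:aws:backup:eu-west-1:xxx:backup-vault:dr-vault",
      "Lifecycle": {"DeleteAfterDays": 90}
    }]
  }]
}
""")

MIN_COLD_DAYS = 90  # minimum time a recovery point stays in cold storage

def lifecycle_problem(label, lifecycle):
    """Return a message if the lifecycle breaks the cold-storage rule, else None."""
    cold = lifecycle.get("MoveToColdStorageAfterDays")
    delete = lifecycle.get("DeleteAfterDays")
    if cold is not None and delete is not None and delete < cold + MIN_COLD_DAYS:
        return f"{label}: DeleteAfterDays {delete} is less than {cold} + {MIN_COLD_DAYS}"
    return None

def plan_problems(plan):
    """Check every rule lifecycle and every copy-action lifecycle in the plan."""
    problems = []
    for rule in plan["Rules"]:
        candidates = [(rule["RuleName"], rule.get("Lifecycle", {}))]
        for i, copy in enumerate(rule.get("CopyActions", [])):
            candidates.append((f"{rule['RuleName']} copy {i}", copy.get("Lifecycle", {})))
        for label, lifecycle in candidates:
            problem = lifecycle_problem(label, lifecycle)
            if problem:
                problems.append(problem)
    return problems

print(plan_problems(PLAN))
```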

Verification Metrics Dashboard

  • Backup completion rate — Target: 100%
  • Last successful verification — Should be within 7 days
  • Recovery Time (actual) — Track trend over drills
  • Recovery Point (actual) — Verify backup freshness
  • Drill results — Pass/fail history
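
Several of these metrics reduce to simple arithmetic on timestamps and counts. A minimal sketch, assuming you already record when backups ran and when verification last succeeded; the function names and sample values are hypothetical:

```python
from datetime import datetime, timedelta, timezone

def completion_rate(completed: int, scheduled: int) -> float:
    """Percentage of scheduled backups that completed (target: 100)."""
    return 100.0 * completed / scheduled if scheduled else 0.0

def verification_is_stale(last_verified, now, max_age=timedelta(days=7)):
    """True when the last successful verification is older than max_age."""
    return now - last_verified > max_age

def actual_recovery_point(last_backup, failure_time):
    """Data-loss window: time between the last good backup and the failure."""
    return failure_time - last_backup

# Sample values for illustration only.
now = datetime(2026, 2, 13, 12, 0, tzinfo=timezone.utc)
last_verified = datetime(2026, 2, 4, 12, 0, tzinfo=timezone.utc)
last_backup = datetime(2026, 2, 13, 5, 0, tzinfo=timezone.utc)

print(completion_rate(29, 30))
print(verification_is_stale(last_verified, now))
print(actual_recovery_point(last_backup, now))
```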

Common Drill Findings

  • Credentials expired — Backup service account passwords rotated but not updated
  • Missing database — New database added to production but not to backup policy
  • Slow restore — Large database restore takes 4 hours, not the expected 1 hour
  • Missing runbook steps — Team discovers gaps in recovery documentation
  • Version mismatch — Backup was taken on v14, but restore target is v15
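
Two of these findings, the missing database and the version mismatch, can be caught before a drill with simple comparisons. The sketch below assumes you can list production databases and the databases covered by the backup policy from your own inventory; the function names and sample data are hypothetical.

```python
def unprotected_databases(production, backed_up):
    """Databases that exist in production but are absent from the backup policy."""
    return sorted(set(production) - set(backed_up))

def major_version(version: str) -> int:
    """Leading component of a version string such as '14.9' or '15'."""
    return int(version.split(".")[0])

def version_mismatch(backup_version: str, target_version: str) -> bool:
    """True when the backup and the restore target differ in major version."""
    return major_version(backup_version) != major_version(target_version)

# Sample inventory for illustration only.
production = ["orders", "customers", "analytics"]
backed_up = ["orders", "customers"]

print(unprotected_databases(production, backed_up))
print(version_mismatch("14.9", "15.2"))
```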

Eazy SaaS Tip: We automate backup verification for every client using Lambda functions that restore and validate backups nightly. Combined with quarterly recovery drills, our clients have verified, tested backups at all times — not just backup files that they hope work.