Setting Up a Homelab for Disaster Recovery Testing

How I created a controlled environment to test backup and recovery solutions for enterprise systems.

Disaster recovery is one of those critical IT disciplines that you hope you never need to use. But when you do need it, you had better have tested it thoroughly. That's why I set up a dedicated homelab environment for disaster recovery testing.

Why a Dedicated DR Testing Environment?

In enterprise environments, disaster recovery plans are often developed but rarely tested completely. There are good reasons for this:

  • Testing can disrupt production systems
  • It's difficult to simulate true disaster scenarios
  • Full-scale tests require significant resources
  • There's always risk in executing recovery procedures

My goal was to create a controlled environment where I could test various disaster scenarios and recovery procedures without affecting production systems.

The Hardware Setup

For my homelab DR testing environment, I repurposed some older enterprise equipment:

  • 2x Dell PowerEdge R720 servers (32 GB RAM and 8 cores each)
  • 1x Dell PowerConnect 5548 switch
  • 1x Synology NAS for shared storage
  • Various network interfaces for creating isolated networks

This gave me enough computational power to simulate a small enterprise environment while staying within a reasonable power budget for a home setup.

Virtualization Platform

I chose VMware ESXi as my hypervisor for a few reasons:

  1. It's widely used in enterprise environments
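  2. The free version provides enough functionality for basic single-host testing
  3. It has excellent snapshot and cloning capabilities
  4. vSphere Replication and other DR tools are available once vCenter Server and a paid or evaluation license are in place

Setting up the hosts with shared storage on the Synology NAS allowed me to test vMotion and failover scenarios as well. Those tests go beyond the free hypervisor: vMotion needs vCenter Server and licensed (or evaluation-mode) hosts.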
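
Before any host-failure test, it is worth checking whether the surviving host can hold every VM on its own. Here is a minimal Python sketch of that arithmetic. Only the 32 GB per host comes from the hardware list above. The VM names, their memory sizes, and the 4 GB hypervisor reserve are assumptions for illustration, and the check ignores memory overcommit.

```python
# Rough failover capacity check: can one host run everything if the other fails?
HOST_RAM_GB = 32           # per R720, from the hardware list
HYPERVISOR_RESERVE_GB = 4  # assumed overhead; adjust to what you observe

# Hypothetical VM memory allocations in GB (placeholder names and sizes).
vms = {
    "web01": 2, "web02": 2,
    "app01": 4, "app02": 4,
    "sql01": 8, "mysql01": 4,
    "dc01": 2, "fs01": 2,
}


def fits_on_one_host(vm_ram_gb, host_ram_gb=HOST_RAM_GB,
                     reserve_gb=HYPERVISOR_RESERVE_GB):
    """Return (fits, headroom_gb) for running every VM on a single host."""
    usable = host_ram_gb - reserve_gb
    needed = sum(vm_ram_gb.values())
    return needed <= usable, usable - needed


fits, headroom = fits_on_one_host(vms)
print(f"All VMs fit on one host: {fits} (headroom {headroom} GB)")
```

A negative headroom means some VMs have to stay powered off during a single-host failover, which is useful to know before the test rather than during it.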

Simulated Infrastructure

Within this environment, I set up a typical three-tier application architecture, along with the supporting services it depends on:

  • Web servers (NGINX)
  • Application servers (mix of Windows and Linux)
  • Database servers (SQL Server, MySQL)
  • Domain controllers
  • File servers

I also created realistic network segmentation with VLANs to simulate a production environment more accurately.

Backup and Recovery Solutions Tested

With the infrastructure in place, I've been able to test several backup and recovery solutions:

  1. Veeam Backup & Replication: Excellent for VMware environments, with features like Instant VM Recovery
  2. Windows Server Backup: Basic but effective for Windows servers
  3. Rsync and Bash scripts: Simple but powerful tools for Linux systems
  4. Database-specific solutions: SQL Server Always On availability groups, MySQL replication
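
As an illustration of the rsync approach, here is a minimal Python sketch that builds the command line for a dated snapshot on the NAS. It is not the script used in this lab. The paths and the function name `build_rsync_command` are hypothetical. The rsync options `--archive` and `--link-dest` are standard: the latter hard-links unchanged files against a previous snapshot so each snapshot looks complete without storing every file again.

```python
import shlex
from datetime import datetime
from pathlib import PurePosixPath


def build_rsync_command(source, backup_root, previous=None, now=None):
    """Build an rsync command that writes a new dated snapshot directory.

    --archive preserves permissions, timestamps, and symlinks.
    --link-dest hard-links files that are unchanged since `previous`.
    """
    now = now or datetime.now()
    target = PurePosixPath(backup_root) / now.strftime("%Y-%m-%d_%H%M%S")
    cmd = ["rsync", "--archive"]
    if previous is not None:
        cmd.append(f"--link-dest={previous}")
    # A trailing slash on the source copies its contents, not the directory itself.
    cmd += [f"{str(source).rstrip('/')}/", str(target)]
    return cmd


cmd = build_rsync_command(
    "/srv/data",                # hypothetical source directory
    "/mnt/nas/backups/app01",   # hypothetical NAS mount point
    previous="/mnt/nas/backups/app01/2024-01-01_020000",
    now=datetime(2024, 1, 2, 2, 0, 0),
)
print(shlex.join(cmd))
```

Returning the command as a list keeps it safe to pass to `subprocess.run(cmd, check=True)` without shell quoting problems, and makes the command easy to log in the recovery documentation.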

Disaster Scenarios Tested

I've been methodically working through various disaster scenarios:

  1. Server hardware failure: Simulated by shutting down hosts or removing network connectivity
  2. Storage array failure: Tested by disconnecting shared storage
  3. Network outage: Created by reconfiguring switch ports
  4. Ransomware attack: Simulated by running scripts that modify files the way ransomware would
  5. Database corruption: Deliberately corrupting database files
  6. Complete site failure: Taking down all primary systems at once
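
For the ransomware and corruption scenarios, it helps to record a known-good baseline before the test so there is something objective to compare the restored data against. The following is a minimal sketch of that idea in Python, using SHA-256 checksums. The function names and the demo files are assumptions for illustration, and the demo only touches a throwaway temporary directory.

```python
import hashlib
import tempfile
from pathlib import Path


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def compare_manifests(baseline, current):
    """Report files that are missing, changed, or added relative to baseline."""
    shared = baseline.keys() & current.keys()
    return {
        "missing": sorted(baseline.keys() - current.keys()),
        "changed": sorted(p for p in shared if baseline[p] != current[p]),
        "added": sorted(current.keys() - baseline.keys()),
    }


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "report.txt").write_text("quarterly numbers")
    (root / "notes.txt").write_text("meeting notes")
    baseline = build_manifest(root)

    # Stand-in for the simulated incident: one file altered, one deleted,
    # and one unexpected file dropped into the directory.
    (root / "report.txt").write_text("scrambled")
    (root / "notes.txt").unlink()
    (root / "README_RESTORE.txt").write_text("placeholder")

    diff = compare_manifests(baseline, build_manifest(root))
    print(diff)
```

Running the same comparison after a restore should produce three empty lists. Anything else is a concrete finding for the test report. For large files, reading in chunks instead of `read_bytes()` would keep memory use down.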

Measuring Recovery Metrics

For each scenario, I've been measuring important metrics:

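  • Recovery time, compared against the Recovery Time Objective (RTO): How long it takes to restore service
  • Data loss, compared against the Recovery Point Objective (RPO): How much data is lost, measured as the time between the last usable backup and the failure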
  • Recovery Accuracy: Whether all systems and data are properly restored
  • Procedural Clarity: How well-documented and clear the recovery steps are

These metrics help identify where the recovery process needs improvement.
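
The two time-based metrics can be computed directly from three timestamps recorded during each test. The sketch below shows one way to do that in Python and compare the results against target values. The class and field names, the timestamps, and the 2-hour and 4-hour targets are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RecoveryTest:
    scenario: str
    last_good_backup: datetime  # newest restore point that was usable
    outage_start: datetime      # when the simulated disaster began
    service_restored: datetime  # when the service passed its checks again

    @property
    def recovery_time(self) -> timedelta:
        """Measured time to restore service, to compare against the RTO."""
        return self.service_restored - self.outage_start

    @property
    def data_loss_window(self) -> timedelta:
        """Measured window of lost data, to compare against the RPO."""
        return self.outage_start - self.last_good_backup


def evaluate(test: RecoveryTest, rto: timedelta, rpo: timedelta) -> dict:
    """Compare measured values against the stated objectives."""
    return {
        "scenario": test.scenario,
        "recovery_time": test.recovery_time,
        "data_loss_window": test.data_loss_window,
        "rto_met": test.recovery_time <= rto,
        "rpo_met": test.data_loss_window <= rpo,
    }


# Hypothetical test run: nightly backup at 02:00, failure at 09:30,
# service back at 11:15.
test = RecoveryTest(
    scenario="storage array failure",
    last_good_backup=datetime(2024, 3, 1, 2, 0),
    outage_start=datetime(2024, 3, 1, 9, 30),
    service_restored=datetime(2024, 3, 1, 11, 15),
)
result = evaluate(test, rto=timedelta(hours=2), rpo=timedelta(hours=4))
for key, value in result.items():
    print(f"{key}: {value}")
```

In this made-up run the recovery time of 1 hour 45 minutes meets the 2-hour target, but the 7.5-hour data loss window misses the 4-hour target, which would point at backup frequency rather than the restore procedure.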

Key Learnings So Far

After several months of testing, I've learned a few important lessons:

  1. Documentation is critical: Unclear procedures dramatically increase recovery time
  2. Dependencies matter: Understanding the order of restoration is vital
  3. Automation helps: Scripted recovery procedures reduce human error
  4. Regular testing is necessary: Recovery procedures that worked last quarter might fail today
  5. Practice builds confidence: The team gets better with each test
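
The point about dependencies lends itself to a small example. If each system's prerequisites are written down, the restoration order can be derived instead of remembered. Below is a minimal sketch using Python's standard `graphlib` module (Python 3.9+). The system names and the dependency map are assumptions loosely based on the infrastructure described earlier, not an actual runbook.

```python
from graphlib import TopologicalSorter

# Each key lists the systems that must be up before it can be restored.
# Hypothetical dependency map for illustration.
dependencies = {
    "domain-controller": set(),
    "mysql": set(),
    "file-server": {"domain-controller"},
    "sql-server": {"domain-controller"},
    "app-windows": {"domain-controller", "sql-server"},
    "app-linux": {"mysql"},
    "nginx": {"app-windows", "app-linux"},
}


def restore_waves(deps):
    """Group systems into waves; a wave can be restored in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = restore_waves(dependencies)
for number, wave in enumerate(waves, start=1):
    print(f"Wave {number}: {', '.join(wave)}")
```

Grouping into waves rather than a flat list shows which restores can run side by side, and the cycle check catches circular dependencies in the documentation before they surface during a recovery.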

Next Steps

I'm continuing to enhance my DR testing environment by:

  • Adding more realistic data loads
  • Creating automated testing scenarios
  • Implementing more complex application architectures
  • Testing cloud-based disaster recovery solutions
  • Developing better monitoring for the recovery process

Having a dedicated environment for disaster recovery testing has been invaluable for building confidence in our recovery procedures. When a real disaster strikes, we'll be ready.