Disaster Recovery - Redsquid.


THE PROJECT

Flight Line Support (FLS) has been growing for several years. Both the number of on-premises servers they run and the importance of these servers have increased greatly. They were concerned that if these servers went down the business would be greatly affected, and they wanted to put some Disaster Recovery in place.

They already had backups, but these did not provide agile and quick Disaster Recovery. Fortunately, we already knew the systems at FLS, as we already provided IT support to them. Disaster Recovery is something we had mentioned to them in the past, but their increased reliance on these servers moved it higher up their priority list.

The project was to provide offsite replication of all systems and quick recovery time in the event of site disaster or server failure.

What we started with

As with all projects, the first stage is to find out the customer's goal and then advise if this goal can be improved upon. FLS stated that they wanted the quickest possible recovery time should either a physical server fail or there be a complete site disaster such as fire or theft.

The next stage of such a project is to carry out due diligence on the data. We need to find out the following:

What do we need to back up?
How much total data is there?
How frequently should we back this up?
Where does it need backing up to?
RPO – What is the recovery point objective?
RTO – What is the recovery time objective?

The research showed the following:

What do we need to back up?
There were six virtual machines that needed backing up, spread across two physical servers.

How much total data is there?
3TB in total.

How frequently should we back this up?
We suggested hourly onsite backups 24 hours a day and nightly offsite backups of all data. FLS agreed.

Where does it need backing up to?
All data needs to be stored offsite in case of an onsite disaster such as fire or theft. This is our suggestion to all customers.

RPO – What is the recovery point objective?

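The recovery point objective states how much recent data can be lost, which the backup frequency agreed above limits to about an hour onsite and a day offsite. Alongside this we agreed how far back the backups go and how granular they are. FLS agreed that they would be able to go back as far as 2 months. For granularity, the following was agreed: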

Hourly backups are kept from the current day to 14 days
Daily backups are kept from 15 – 28 days
Weekly backups are kept from 29 – 56 days
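
The retention tiers above can be sketched in a few lines of Python. This is only an illustration of the agreed schedule, not how the backup software implements it: the function name `select_backups_to_keep` is hypothetical, and keeping the newest backup in each day or ISO week is an assumption, as the agreement does not say which one is retained.

```python
from datetime import datetime, timedelta

def select_backups_to_keep(timestamps, now):
    """Return the backup timestamps retained under the agreed tiers."""
    keep = set()
    seen_days = set()
    seen_weeks = set()
    # Walk newest-first so the most recent backup in each bucket is the one kept
    # (an assumption; the agreement does not specify which backup survives).
    for ts in sorted(timestamps, reverse=True):
        age_days = (now - ts).days
        if age_days < 0 or age_days > 56:
            continue  # future-dated, or older than the 2-month limit
        if age_days <= 14:
            keep.add(ts)  # hourly tier: keep every backup
        elif age_days <= 28:
            day = ts.date()  # daily tier: one backup per calendar day
            if day not in seen_days:
                seen_days.add(day)
                keep.add(ts)
        else:
            week = tuple(ts.isocalendar())[:2]  # weekly tier: one per ISO week
            if week not in seen_weeks:
                seen_weeks.add(week)
                keep.add(ts)
    return sorted(keep)
```

Anything the function does not return would be eligible for pruning, so a backup made 57 days ago drops out entirely.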

RTO – What is the recovery time objective?

The recovery time objective states how quickly the systems need to be up and running after they have gone down. The following was agreed:

Site failure – 24-hour RTO
Server failure – 4-hour RTO

Other Considerations

With such a backup system it is also important to consider internet speeds. Offsite replication of such a large amount of data is only possible with a good fibre internet connection. FLS have an FTTP (Fibre to the premises) connection with 80MBps down and 40MBps up. This will be more than adequate to replicate the amount and frequency of FLS’s data.
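
A rough way to sanity-check a connection for replication is to work out how long a given volume takes to upload. The sketch below is illustrative only. It assumes the link runs at its full quoted rate with no protocol overhead, it uses a hypothetical 8-hour overnight window, and because "40MBps" can be read as either megabits or megabytes per second it shows both readings.

```python
TB = 10**12  # decimal terabyte, in bytes

def transfer_hours(data_bytes, link_bits_per_second):
    """Hours to move data_bytes over a link running flat out (no overhead)."""
    return data_bytes * 8 / link_bits_per_second / 3600

def bytes_in_window(link_bits_per_second, window_hours):
    """Bytes that fit through the link in a replication window."""
    return link_bits_per_second / 8 * window_hours * 3600

full_data = 3 * TB
uplink_if_megabits = 40 * 10**6       # "40MBps" read as 40 Mbit/s
uplink_if_megabytes = 40 * 8 * 10**6  # "40MBps" read as 40 MByte/s

for label, bps in [("megabits", uplink_if_megabits),
                   ("megabytes", uplink_if_megabytes)]:
    hours = transfer_hours(full_data, bps)
    # The 8-hour window is an assumption, not a figure from the project.
    nightly_gb = bytes_in_window(bps, window_hours=8) / 10**9
    print(f"{label}: full 3TB upload ~{hours:.0f} h, "
          f"8 h window carries ~{nightly_gb:.0f} GB")
```

On either reading the nightly incremental changes fit comfortably, while a full 3TB upload takes somewhere between most of a day and about a week, which is why the initial copy is moved on physical media.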

Costs

Once all the due diligence is completed and the details have been agreed upon with the customer, it’s time to put a price together. The TOD Extra system is a fixed monthly price per server that includes everything. For the TOD Extra service, you get:

Hourly onsite backups 24 hours a day of the entire system.
Nightly offsite backups 7 days a week of the entire system.
Backups are monitored daily and test restores are done weekly by Triumph.
File-level restores upon request.
Access to our redundant physical server in the event of a disaster.
Cloud boot facility in the event of a disaster – We can boot your server/s in our Cloud and provide you remote access to these resources.
All hardware, software and support costs included.

FLS agreed the system was exactly what they were looking for and sign off was completed.

PROJECT IMPLEMENTATION

The implementation of this backup system involves the following:

Hardware – Installation of the NAS that will hold all the backups at FLS’s office.
Software – Installation of the backup software on all servers.
Configuration – Configuring the software to the required spec and setting up the offsite replication.
Seeding – The initial data needs to be replicated offsite using physical media.
Testing – Testing the backups and disaster recovery.

Hardware

The NAS box will be installed in a secure location, with a static IP, ready to receive the backup images. We also ensure that data on the NAS is password protected and encrypted. This protects the images from a locker virus should one make its way into the network.

Software

The backup agent will be loaded onto each server and configured to send backups to the NAS box hourly, 24 hours a day. The software uses VSS to capture the entire system, including files and software that are in use. The software will automatically notify us of failures, and the backups are also checked manually every day. After each installation, an evening reboot of the servers is scheduled.

We will also install our replication agent, which manages the replication of the backups to our Cloud network.

Seeding

Due to the size of the data (3TB), the initial full backups need to be copied to removable physical media and manually copied to our Cloud network. We manage this process for the customer and send an engineer to the site once the initial backups have been copied to the media. The engineer will bring the media back and copy it to our Cloud, at which point we can start the replication of the incremental backups.

Testing

Once the backups have been fully replicated we will run the following tests:

File restore – Check that we can restore an individual file from every server.

Server recovery – Check that we can recover every server and successfully boot it in our Cloud.

RTO – Check that we meet the customer’s recovery time objective.

All our testing came back perfectly. The backup system is now implemented and tested.
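
The RTO part of such a test boils down to comparing a timed recovery against the agreed targets. A minimal sketch follows, assuming the recovery duration is measured by hand during the test; the names `RTO_TARGETS`, `meets_rto` and `rto_report` are hypothetical, and the durations in the usage lines are made-up inputs rather than results from this project.

```python
from datetime import timedelta

# Targets as agreed in the RTO section above.
RTO_TARGETS = {
    "site_failure": timedelta(hours=24),
    "server_failure": timedelta(hours=4),
}

def meets_rto(scenario, measured_recovery):
    """True if the measured recovery time is within the agreed target."""
    return measured_recovery <= RTO_TARGETS[scenario]

def rto_report(measurements):
    """Map each tested scenario to a pass/fail flag."""
    return {name: meets_rto(name, took) for name, took in measurements.items()}

# Illustrative inputs only; these are not measurements from the project.
example = {
    "server_failure": timedelta(hours=3, minutes=30),
    "site_failure": timedelta(hours=26),
}
print(rto_report(example))
```

Keeping the targets in one place means the same check can be rerun after every weekly test restore.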

PROJECT SIGN OFF

The final stage of any project is to check that the customer is happy. An important aspect of a good IT company is knowing how much technical information the customer wants to know. Some like to know every detail, others just want to know “Is it finished?”. FLS sat somewhere in the middle, so we discussed the
