Making a Backup and Disaster Recovery system robust, secure and ready for new challenges.
Replibit is a backup & recovery system based on Linux, ZFS (backups), KVM (virtualization) and Python & Angular (backend and web interface). Replibit creates backups from Windows systems and stores multiple copies in both on- and off-site servers so they can be recovered at any time.
Replibit's architecture is quite clever and includes three main components: Agent, Appliance, and Vault. Agent is a small program that is installed on every Windows computer requiring backup. The program automatically generates incremental backup images from the Windows system using Shadow Copy. These backup files are then immediately transferred over the local network to Appliance, an Ubuntu server that allows system administrators to manage backups, mount them as devices, and even start a virtualized Windows system from any of the backups. As a last resort, backups also get stored on an off-site Ubuntu server called Vault, ensuring that all backup files are accessible even after a complete disaster at the customer's location.
What we've done
When we began this project, we first had to go back into the already existing code, understand what it was doing and document it correctly before moving on to developing new features and functionality for the system.
To accomplish this, we installed the system locally and monitored the browser's network activity to discover the backend endpoints.
On the Linux side, we had to find out which application was listening on port 80 and, from there, follow the Python code all the way down to the ZFS commands it called, to see which lines actually performed the file system operations on the server.
We were able to piece together a general picture of how the system was structured which allowed us to move forward with added functionality.
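To illustrate the kind of Python-to-ZFS call we were tracing, here is a minimal sketch of listing the snapshots of a dataset from Python. It is not Replibit's actual code: the function names and the dataset name `tank/backups` are assumptions made for the example. Only the standard `zfs list` command is used, with its `-H` (no header, tab-separated), `-p` (exact numbers), `-t snapshot`, `-o` and `-r` options.

```python
import subprocess


def snapshot_list_command(dataset):
    """Build the `zfs list` command line for all snapshots under a dataset."""
    return ["zfs", "list", "-H", "-p", "-t", "snapshot",
            "-o", "name,used", "-r", dataset]


def parse_snapshot_list(output):
    """Parse tab-separated `zfs list -H -p -o name,used` output."""
    snapshots = []
    for line in output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, used = line.split("\t")
        # Snapshot names have the form dataset@snapshot
        dataset, _, snapshot = name.partition("@")
        snapshots.append({
            "dataset": dataset,
            "snapshot": snapshot,
            "used_bytes": int(used),
        })
    return snapshots


def list_snapshots(dataset):
    """Run `zfs list` (requires ZFS on the host) and return parsed snapshots."""
    result = subprocess.run(snapshot_list_command(dataset),
                            capture_output=True, text=True, check=True)
    return parse_snapshot_list(result.stdout)
```

On a server with ZFS installed, `list_snapshots("tank/backups")` would return one dictionary per snapshot. Keeping the parsing separate from the subprocess call makes it possible to check the parser against captured output without touching a real pool.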
An independent security consultancy found a few security issues that needed to be fixed. Since Replibit software updates itself, introducing a bug in the update phase would leave thousands of servers unable to function and create a big problem for support and system administrators. We fixed the reported security issues and, in the process, made the update and installation process robust.
Modern Windows systems partition their disks with GPT instead of MBR, and MBR can address only 2TB of disk space. Replibit’s agent was coded in C++ and was able to handle backups of any size, but the virtualization technique used by Replibit was unprepared to boot disks larger than 2TB. When a backup file was mounted, the FUSE layer would execute code creating the MBR needed by the virtualized system in order to boot.
Since there was no code for GPT, booting these backups was impossible. This type of feature is extremely difficult to test, since it requires booting a Virtual Machine many times and reviewing byte-by-byte code with absolutely no documentation or help from any other party.
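The 2TB figure follows from the MBR partition table storing sector addresses as 32-bit values. The short sketch below works through that arithmetic, assuming the traditional 512-byte sector size. The function names are ours and purely illustrative; they are not part of Replibit.

```python
SECTOR_SIZE = 512   # bytes per sector (assumed; the traditional value)
MBR_LBA_BITS = 32   # MBR partition entries hold 32-bit sector addresses


def mbr_max_bytes(sector_size=SECTOR_SIZE):
    """Commonly cited maximum disk size addressable through an MBR."""
    return (2 ** MBR_LBA_BITS) * sector_size


def needs_gpt(disk_bytes, sector_size=SECTOR_SIZE):
    """True when a disk is too large to be described by an MBR alone."""
    return disk_bytes > mbr_max_bytes(sector_size)


limit = mbr_max_bytes()
print(limit)                    # 2199023255552 bytes
print(limit // 2 ** 40)         # 2 (TiB)
print(needs_gpt(3 * 10 ** 12))  # True: a 3 TB disk exceeds the limit
```

With 512-byte sectors the limit is 2^32 × 512 = 2^41 bytes, which is why a backup of a larger disk could not be booted from a generated MBR and needed GPT support instead.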
We also helped Replibit with the following:
- Continued development of a FUSE File System for BDR (Backup Disaster Recovery) that automatically transforms non-bootable device snapshots into exact drive images from the original machine (C++).
- Worked on the frontend and the backend of the application based on Flask, Twisted, Nginx and Python.
- Provided the highest level of support to customers assessing data loss and data recovery options.
- Reviewed the system's code and assessed the application's security to resolve any security issues.
- Reviewed ZFS issues, data loss, and data migration.
- Created a UEFI Agent Service to allow virtualization on new systems.
- Amazon Web Services
When Replibit reached out to us, they had no development process and the source code was not secure. There was no code repository, no way of tracking changes, and nobody to ask.
By the end of the project, Replibit had a code repository, correctly tagged with releases and a sturdy development process designed to match their needs regarding new features and bug fixing. Their update process had become robust and functional, and many critical features and fixes were in place, allowing them to be acquired by eFolder and continue on today as a trusted solution for system administrators.