End-to-end disaster recovery planning: From art to science

We present the design and implementation of ENDEAVOUR, a framework for integrated end-to-end disaster recovery (DR) planning. Existing research provides DR planning within a single layer of the IT stack (e.g., storage-controller-based replication). ENDEAVOUR, by contrast, can choose technologies, and compositions of technologies, across multiple layers such as virtual machines, databases, and storage controllers. ENDEAVOUR uses a canonical model of the replication technologies available at all layers, explores strategies to compose them, and applies a novel map-search-reduce heuristic to identify the best DR plans for a given set of administrator requirements. We present a detailed analysis of ENDEAVOUR, including an empirical characterization of various DR technologies and their composition, and an end-to-end case study.
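To make the planning pipeline described above more concrete, the following is a minimal Python sketch of one way a map-search-reduce planner over a canonical technology model could be organized. It is not ENDEAVOUR's implementation. The `Technology` and `Plan` types, the chosen attributes (RPO, RTO, cost), the "weakest layer" composition rule, the exhaustive search, and the Pareto-optimal (skyline) reduce step are all illustrative assumptions, as are the technology names and numbers in the catalog.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Technology:
    """Hypothetical canonical description of one replication technology."""
    name: str
    layer: str        # e.g. "vm", "database", "storage"
    rpo_min: float    # worst-case data-loss window, in minutes
    rto_min: float    # time to bring this layer back, in minutes
    cost: float       # arbitrary cost units


@dataclass(frozen=True)
class Plan:
    techs: tuple      # names of the composed technologies, one per layer
    rpo_min: float
    rto_min: float
    cost: float


def compose(techs):
    """Assumed rule: the end-to-end plan is only as good as its weakest layer."""
    return Plan(
        techs=tuple(t.name for t in techs),
        rpo_min=max(t.rpo_min for t in techs),
        rto_min=max(t.rto_min for t in techs),
        cost=sum(t.cost for t in techs),
    )


def dominates(a, b):
    """True if plan a is no worse than b on every metric and better on at least one."""
    ka = (a.rpo_min, a.rto_min, a.cost)
    kb = (b.rpo_min, b.rto_min, b.cost)
    return all(x <= y for x, y in zip(ka, kb)) and ka != kb


def plan_dr(catalog, max_rpo, max_rto):
    # Map: group candidate technologies by the layer they protect.
    by_layer = {}
    for t in catalog:
        by_layer.setdefault(t.layer, []).append(t)
    # Search: enumerate one technology per layer (brute force, for clarity only)
    # and keep the compositions that meet the administrator's requirements.
    feasible = [
        p for p in (compose(c) for c in product(*by_layer.values()))
        if p.rpo_min <= max_rpo and p.rto_min <= max_rto
    ]
    # Reduce: keep only non-dominated (skyline) plans, cheapest first.
    skyline = [p for p in feasible if not any(dominates(q, p) for q in feasible)]
    return sorted(skyline, key=lambda p: p.cost)


# Hypothetical catalog: names and numbers are illustrative, not measured values.
catalog = [
    Technology("vm-async-replication", "vm", 15, 30, 3),
    Technology("vm-snapshot", "vm", 60, 60, 1),
    Technology("db-log-shipping", "database", 5, 20, 4),
    Technology("db-backup", "database", 240, 120, 1),
    Technology("storage-sync-mirror", "storage", 0, 10, 8),
    Technology("storage-async-mirror", "storage", 10, 15, 5),
]

plans = plan_dr(catalog, max_rpo=60, max_rto=60)
for plan in plans:
    print(plan)
```

With these made-up inputs, two plans survive: a cheaper one that just meets the 60-minute targets, and a slightly costlier one with much tighter RPO and RTO. The administrator chooses between them based on that trade-off. Exhaustive enumeration grows as the product of the number of candidates per layer, which is why a practical planner would need a heuristic search rather than the brute-force loop used here.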