ENIGMA: Distributed Virtual Disks for Cloud Computing

We propose ENIGMA, a distributed infrastructure that provides Cloud Computing infrastructures with virtual disks. ENIGMA abstracts the storage resources of a set of physical nodes and exposes to Cloud Computing users, applications, and Virtual Machines a set of virtual block storage devices that can be used exactly like standard physical disks. ENIGMA is designed to provide large storage capacity, high availability, strong confidentiality, and data access performance comparable to that of traditional storage virtualization solutions. To achieve these design goals, ENIGMA exploits erasure-coding techniques: each sector of a virtual disk is encoded as a set of $n$ fragments that are stored independently on a set of physical storage nodes, and any $k$ of these fragments ($k \leq n$) are sufficient to reconstruct the sector. We present the ENIGMA architecture and show how the coding of virtual disk sectors ensures high availability despite failures of individual storage nodes, as well as confidentiality in the face of several types of attacks. We also briefly discuss performance results for ENIGMA.
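The $k$-of-$n$ property described above can be illustrated with a minimal sketch. It is not ENIGMA's actual code: it assumes a simple polynomial-interpolation (Reed-Solomon-style) code over the prime field GF(257), and the names `encode_sector` and `decode_sector` are hypothetical. Because this toy code is systematic (the first $k$ fragments carry the plain sector bytes), it demonstrates only recovery from missing fragments, not the confidentiality property.

```python
P = 257  # smallest prime above the byte range; chosen here only for simplicity


def _lagrange_eval(points, x):
    """Evaluate at x (mod P) the unique polynomial of degree < len(points)
    that passes through the given (xi, yi) points."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+)
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode_sector(sector, k, n):
    """Encode a sector (bytes, length divisible by k) into n fragments.
    Each fragment is a list of symbols in the range 0..256."""
    assert 1 <= k <= n < P and len(sector) % k == 0
    fragments = [[] for _ in range(n)]
    for off in range(0, len(sector), k):
        # The k data bytes are the polynomial's values at x = 1..k.
        points = [(x + 1, sector[off + x]) for x in range(k)]
        for i in range(n):
            fragments[i].append(_lagrange_eval(points, i + 1))
    return fragments


def decode_sector(available, k):
    """Rebuild the sector from any k fragments.
    `available` maps a 0-based fragment index to that fragment."""
    assert len(available) >= k
    chosen = list(available.items())[:k]
    out = bytearray()
    for s in range(len(chosen[0][1])):
        points = [(idx + 1, frag[s]) for idx, frag in chosen]
        # Re-evaluate the polynomial at x = 1..k to recover the data bytes.
        for x in range(1, k + 1):
            out.append(_lagrange_eval(points, x))
    return bytes(out)
```

For example, with `k = 3` and `n = 5`, calling `decode_sector({1: frags[1], 3: frags[3], 4: frags[4]}, 3)` on the output of `encode_sector(sector, 3, 5)` returns the original sector even though the nodes holding fragments 0 and 2 are unavailable.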
