Tornado Codes for MAID Archival Storage

This paper examines the application of Tornado codes, a class of low density parity check (LDPC) erasure codes, to archival storage systems based on massive arrays of idle disks (MAID). We present a log- structured extent-based archival file system based on Tornado Coded stripe storage. The file system is combined with a MAID simulator to emulate the behavior of a large-scale storage system with the goal of employing Tornado Codes to provide fault tolerance and performance in a power-constrained environment. The effect of power conservation constraints on system throughput is examined, and a policy of placing multiple data nodes on a single device is shown to increase read throughput at the cost of a measurable, but negligible, decrease in fault tolerance. Finally, a system prototype is implemented on a 100 TB Lustre storage cluster, providing GridFTP accessible storage with higher reliability and availability than the underlying storage architecture.

[1]  Daniel A. Spielman,et al.  Efficient erasure correcting codes , 2001, IEEE Trans. Inf. Theory.

[2]  Reagan Moore,et al.  A simple mass storage system for the SRB data grid , 2003, 20th IEEE/11th NASA Goddard Conference on Mass Storage Systems and Technologies, 2003. (MSST 2003). Proceedings..

[3]  James S. Plank,et al.  Small parity-check erasure codes - exploration and observations , 2005, 2005 International Conference on Dependable Systems and Networks (DSN'05).

[4]  Henry M. Tufo,et al.  Fault Tolerance of Tornado Codes for Archival Storage , 2006, 2006 15th IEEE International Conference on High Performance Distributed Computing.

[5]  John Kubiatowicz,et al.  Erasure Coding Vs. Replication: A Quantitative Comparison , 2002, IPTPS.

[6]  Michael Mitzenmacher,et al.  Accessing multiple mirror sites in parallel: using Tornado codes to speed up downloads , 1999, IEEE INFOCOM '99. Conference on Computer Communications. Proceedings. Eighteenth Annual Joint Conference of the IEEE Computer and Communications Societies. The Future is Now (Cat. No.99CH36320).

[7]  Andrew A. Chien,et al.  RobuSTore: Robust Performance for Distributed Storage Systems , 2006 .

[8]  Joseph JáJá,et al.  Mitigating risk of data loss in preservation environments , 2005, 22nd IEEE / 13th NASA Goddard Conference on Mass Storage Systems and Technologies (MSST'05).

[9]  Dirk Grunwald,et al.  Massive Arrays of Idle Disks For Storage Archives , 2002, ACM/IEEE SC 2002 Conference (SC'02).