Lone Star Stack: Architecture of a Disk-Based Archival System

The demand for large storage systems grows with the ever-increasing volume of data being created. As capacities grow and prices shrink, "write once, read sometimes" workloads become more common: new data is constantly added, rarely updated or deleted, and every stored byte might be read at any time. This pattern is common for digital archives and big data scenarios. We present the LoneStar Stack, a disk-based archival storage system building block that is optimized for high reliability and energy efficiency. It provides a POSIX file system interface, using flash-based storage for write offloading and metadata and the disk-based LoneStar RAID for user data. The RAID attempts to spin disks down as early and for as long as possible. A read accesses only a single disk, while a write requires 3 additional parity disks to be spun up. The cache aggregates new files, a semantic data placement engine decides how they are persisted to the RAID, and asynchronous data movers then persist the data. The system provides end-to-end data integrity, an elastic fault tolerance that can recover from at least all 3-disk failures, and multiple paths for data integrity checking and recovery. It can use 70% of the raw disk capacity and is optimized for fast reads with a minimum number of powered-on disk drives.
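The disk counts and the capacity figure quoted above can be related with a little arithmetic. The following is a minimal sketch, not the system's actual layout: it assumes a group of 7 data disks plus the 3 parity disks, a configuration chosen here only because it reproduces the stated 70% usable capacity. The `StripeLayout` class and its method names are hypothetical, and the real system may account for additional overheads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StripeLayout:
    """Hypothetical model of one disk group: data disks plus parity disks."""

    data_disks: int          # assumption: not given in the text
    parity_disks: int = 3    # the text states 3 parity disks

    @property
    def total_disks(self) -> int:
        return self.data_disks + self.parity_disks

    def usable_fraction(self) -> float:
        # Share of raw capacity left for user data once parity is subtracted.
        return self.data_disks / self.total_disks

    def disks_spun_up_for_read(self) -> int:
        # A read touches only the single disk holding the data.
        return 1

    def disks_spun_up_for_write(self) -> int:
        # A write touches the target data disk plus every parity disk.
        return 1 + self.parity_disks


# Assumed 7 + 3 group, picked so that the usable share matches the stated 70%.
layout = StripeLayout(data_disks=7)
print(f"usable capacity: {layout.usable_fraction():.0%}")
print(f"disks active on read: {layout.disks_spun_up_for_read()}")
print(f"disks active on write: {layout.disks_spun_up_for_write()}")
```

Under these assumptions, a read keeps 9 of the 10 disks idle, while a write needs 4 active disks, which is why aggregating writes in the cache before persisting them helps keep disks spun down.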
