ThriftStore: Finessing Reliability Trade-Offs in Replicated Storage Systems

This paper explores the feasibility of a storage architecture that offers the reliability and access performance characteristics of a high-end system, yet is cost-efficient. We propose ThriftStore, a storage architecture that integrates two types of components: volatile, aggregated storage and dedicated, yet low-bandwidth, durable storage. The durable storage forms a back end that enables the system to restore data the volatile nodes may lose, while the volatile nodes provide a high-throughput front end. Although integrating these components has the potential to offer a unique combination of high throughput and durability at low cost, a number of concerns need to be addressed to architect and correctly provision the system. To this end, we develop analytical and simulation-based tools to evaluate the impact of system characteristics (e.g., bandwidth limitations on the durable and the volatile nodes) and design choices (e.g., the replica placement scheme) on data availability and the associated system costs (e.g., maintenance traffic). Moreover, to demonstrate the high-throughput properties of the proposed architecture, we prototype a GridFTP server based on ThriftStore. Our evaluation demonstrates transfer throughput of up to 800 Mbps for the new GridFTP service.
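To make the availability trade-off concrete, the following is a minimal toy sketch and not the paper's analytical model or simulator. It assumes a single object with a fixed target number of volatile replicas, independent per-step replica loss, and a durable back end that restores one replica at a time, with a fixed repair delay standing in for its limited bandwidth. The function name and all parameters (simulate_unavailability, fail_prob, repair_steps) are hypothetical.

```python
import random


def simulate_unavailability(replicas, fail_prob, repair_steps, steps, seed=0):
    """Toy discrete-time model (illustrative assumption, not the paper's tool).

    Returns the fraction of time steps in which no volatile replica is live.
    The object is never lost in this model: the durable back end always
    holds a copy and re-creates volatile replicas one at a time.
    """
    rng = random.Random(seed)
    live = replicas          # volatile replicas currently available
    repair_countdown = 0     # steps left until the current repair completes
    unavailable = 0

    for _ in range(steps):
        # Each live volatile replica is lost independently with fail_prob.
        live -= sum(1 for _ in range(live) if rng.random() < fail_prob)

        # Low-bandwidth durable node: one repair in flight at a time,
        # each taking repair_steps time steps.
        if live < replicas:
            if repair_countdown == 0:
                repair_countdown = repair_steps
            repair_countdown -= 1
            if repair_countdown == 0:
                live += 1

        if live == 0:
            unavailable += 1

    return unavailable / steps


if __name__ == "__main__":
    for n in (1, 2, 4):
        u = simulate_unavailability(n, fail_prob=0.05, repair_steps=10, steps=100_000)
        print(f"replicas={n} unavailability={u:.4f}")
```

Under these assumptions, a larger repair_steps (a slower durable back end) or a smaller replica count would be expected to increase the unavailable fraction, which is the kind of provisioning question the abstract refers to.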
