CSTORE: A desktop-oriented distributed public cloud storage system

Abstract: Previous distributed file systems aim to store very large data sets. Their architectures are typically designed for large-scale data-intensive applications and cope poorly with a large population of everyday users who want to store their data on the Internet. In this paper, CSTORE is proposed to support mass data storage for a large number of users. User-independent metadata management ensures data security by assigning an independent namespace to every user. Operation logs are used to synchronize simultaneous sessions of the same user and to resolve conflicts. We also implement a block-level deduplication strategy, based on our three-level mapping hash method, to handle the large quantity of repeated data. Migration and rank extension on the hash rules are defined to achieve load balancing and capacity expansion. Performance measurements under a variety of workloads show that CSTORE offers better scalability and performance than other public cloud storage systems.
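
To make the idea of block-level deduplication with per-user namespaces concrete, the following is a minimal sketch and not CSTORE's actual design. The abstract does not detail the three-level mapping hash method, so the sketch substitutes a plain SHA-256 content hash as the block key. The class name `DedupStore`, its methods, and the tiny block size are all hypothetical and chosen only for illustration.

```python
import hashlib


class DedupStore:
    """Toy block store: shared block pool, one namespace per user (illustrative only)."""

    def __init__(self, block_size=4):
        # block_size is an assumption; a real system would use a far larger block.
        self.block_size = block_size
        self.blocks = {}      # content digest -> block bytes, shared by all users
        self.namespaces = {}  # user -> {path -> ordered list of block digests}

    def put(self, user, path, data):
        """Split data into fixed-size blocks and store each distinct block once."""
        digests = []
        for start in range(0, len(data), self.block_size):
            block = data[start:start + self.block_size]
            digest = hashlib.sha256(block).hexdigest()
            # A block whose digest is already known is not stored again.
            self.blocks.setdefault(digest, block)
            digests.append(digest)
        # File metadata is kept only in the owning user's namespace.
        self.namespaces.setdefault(user, {})[path] = digests

    def get(self, user, path):
        """Reassemble a file from the block digests recorded in the user's namespace."""
        return b"".join(self.blocks[d] for d in self.namespaces[user][path])


store = DedupStore(block_size=4)
store.put("alice", "a.txt", b"aaaabbbbaaaa")
store.put("bob", "b.txt", b"bbbbcccc")
```

In this sketch the two files contain five blocks in total, but only three distinct blocks are kept in the shared pool, while each user's file listing stays separate from the other's.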
