SEARS: Space efficient and reliable storage system in the cloud

Today's cloud storage services must offer storage reliability and fast data retrieval for large amounts of data without increasing storage cost. We present SEARS, a cloud-based storage system that integrates erasure coding and data deduplication to support efficient and reliable data storage with fast user response time. By properly associating data with storage server clusters, SEARS allows different configurations to be mixed flexibly, making it suitable for both real-time and archival applications. Our prototype implementation of SEARS on Amazon EC2 shows that it outperforms existing storage systems in storage efficiency and file retrieval time. For 3 MB files, SEARS delivers a retrieval time of 2.5 s, compared to 7 s with existing systems.
