Robot: An efficient model for big data storage systems based on erasure coding

With the explosive growth of data, the age of big data has arrived, and how to store huge amounts of data is of great importance to both industry and academia. This paper puts forward a solution based on coding technologies for big data systems that store large amounts of cold data. By studying existing coding technologies and big data systems, we show that it is possible not only to maintain a system's reliability but also to improve the security and utilization of its storage. Because coding technologies offer remarkable reliability and space savings, introducing a coding scheme into big data systems becomes a prerequisite. In the presented scheme, each storage node is divided into several virtual nodes to keep the load balanced. Setting up different virtual node storage groups for different codec servers ensures system availability. Decoding in parallel across nodes and data blocks reduces the system recovery time when data is corrupted. Additionally, letting different users set different coding parameters improves the robustness of big data storage systems. In the quantitative experiments, we configure various numbers of data blocks m and check (parity) blocks k to improve the utilization rate. The results show that parallel decoding can reach twice the speed of the earlier serial decoding. Under equal conditions, the encoding efficiency of ICRS coding is 34.2% higher than that of CRS coding and 56.5% higher than that of RS coding. The decoding rate of ICRS is on average 18.1% higher than that of CRS and 31.1% higher than that of RS.
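To make the role of the data block count m and the check block count k concrete, the sketch below computes the fraction of raw capacity that holds user data, m / (m + k), and compares it with plain replication. It is an illustration only: the function names and the sample (m, k) pairs are assumptions chosen for this example, not parameters or results from the paper, and the statement that k lost blocks can be tolerated holds for MDS codes such as Reed-Solomon.

```python
from fractions import Fraction


def storage_utilization(m: int, k: int) -> Fraction:
    """Fraction of raw capacity that stores user data when m data blocks
    are kept together with k check blocks (hypothetical helper)."""
    if m <= 0 or k < 0:
        raise ValueError("m must be positive and k must be non-negative")
    return Fraction(m, m + k)


def replication_utilization(copies: int) -> Fraction:
    """Utilization of plain replication that keeps `copies` full copies."""
    if copies <= 0:
        raise ValueError("copies must be positive")
    return Fraction(1, copies)


if __name__ == "__main__":
    # Illustrative (m, k) pairs; these are not the paper's settings.
    for m, k in [(4, 2), (6, 3), (10, 4)]:
        u = storage_utilization(m, k)
        # For an MDS code, any k of the m + k blocks may be lost.
        print(f"m={m}, k={k}: utilization {float(u):.1%}, "
              f"tolerates {k} lost blocks")

    r = replication_utilization(3)
    print(f"3-way replication: utilization {float(r):.1%}, "
          f"tolerates 2 lost copies")
```

Raising m relative to k raises utilization, while raising k raises the number of block losses that can be survived, which is why a per-user choice of (m, k) trades space against fault tolerance.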
