Study on Efficient and Adaptive Replication Management in the Hadoop Distributed File System

The number of applications based on Apache Hadoop is increasing dramatically due to the robust features of this framework. The Hadoop Distributed File System (HDFS) provides reliability and availability for computation by applying static replication by default. However, because of the characteristics of parallel operations at the application layer, the access rate differs completely from one data file to another in HDFS. Consequently, maintaining the same replication mechanism for every data file leads to detrimental effects on performance. After thoroughly considering the drawbacks of HDFS replication, this paper proposes an approach that dynamically replicates data files based on predictive analysis. With the help of probability theory, the utilization of each data file can be predicted and a corresponding replication strategy derived. Popular files are then replicated according to their own access potential, while an erasure code is applied to the low-potential files to maintain reliability. As a result, our approach improves availability while preserving reliability in comparison with the default mechanism. Furthermore, complexity reduction is applied to enhance the effectiveness of the prediction when dealing with big data.
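The core idea, predicting each file's access potential and adjusting its replication accordingly, can be illustrated with the standard Hadoop FileSystem API. The sketch below is a minimal illustration, not the paper's implementation: the predictor, the threshold, and the replication factors are placeholder assumptions, and since the client API only exposes per-file replication factors, the erasure-coding fallback for cold files is indicated in a comment rather than implemented.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/**
 * Minimal sketch of adaptive replication management in HDFS:
 * files predicted to be "hot" receive extra replicas, while "cold"
 * files are kept at minimal replication (where, in the paper, an
 * erasure code preserves reliability instead).
 */
public class AdaptiveReplicationSketch {

    // Hypothetical threshold separating hot from cold files.
    private static final double HOT_THRESHOLD = 0.7;

    // Stand-in for the paper's probabilistic prediction: an estimate
    // of the file's future access potential in [0, 1], which would be
    // derived from observed access patterns.
    static double predictAccessPotential(Path file) {
        return 0.9; // assumption: pretend this file is popular
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("/data/input.txt"); // hypothetical file

        double potential = predictAccessPotential(file);
        if (potential >= HOT_THRESHOLD) {
            // Popular file: raise the replication factor above the HDFS default of 3.
            fs.setReplication(file, (short) 5);
        } else {
            // Low-potential file: keep a single replica; the paper applies
            // an erasure code on top to maintain reliability at lower cost.
            fs.setReplication(file, (short) 1);
        }
        fs.close();
    }
}
```

In a real deployment the predicted potential would be computed from access logs rather than hard-coded, and the erasure-coded fallback would be enforced by a storage-layer policy instead of the client.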
