Improving MapReduce for Incremental Processing Using Map Data Storage

Abstract: In this paper, we propose methods for improving the performance of a MapReduce program used for incremental processing. Incremental processing is typically used where results are refreshed periodically to reflect small changes to the input dataset. Traditional systems reprocess the entire dataset even when only a small percentage of the data, or none of it, has changed. This is time-consuming and spends a large number of additional CPU clock cycles on data that has not changed. To reduce the delay of recomputing unchanged data, we introduce methods that selectively compute only the data that has been altered. The methods incorporate a Bloom filter, a space-efficient data structure that can check, with a certain probability, whether data has been modified. To reduce wasted CPU clock cycles, we propose a system in which execution using a Bloom filter improves performance by up to 17% compared with the existing system.
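The idea of using a Bloom filter to skip unchanged input can be illustrated with a minimal sketch. This is not the authors' implementation: the class `BloomFilter`, the function `select_changed`, the filter size, the number of hashes, and the use of whole records as keys are all assumptions made for illustration. The filter is filled with the records seen in the previous run; a record in the current run that the filter reports as absent is certainly new or modified, while a record reported as present is treated as unchanged (with a small false-positive probability).

```python
import hashlib


class BloomFilter:
    """Illustrative fixed-size Bloom filter (sizes are arbitrary assumptions)."""

    def __init__(self, num_bits=1 << 16, num_hashes=5):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive several bit positions from one SHA-256 digest (double hashing).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        # Force an odd stride so it is coprime with a power-of-two num_bits.
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def select_changed(previous_records, current_records):
    """Return current records that were definitely not in the previous run."""
    seen = BloomFilter()
    for record in previous_records:
        seen.add(record)
    # Only these records would need to be passed to the map phase again.
    return [record for record in current_records if record not in seen]


previous = ["a,1", "b,2", "c,3"]
current = ["a,1", "b,9", "c,3", "d,4"]
changed = select_changed(previous, current)
print(changed)
```

Because a Bloom filter has no false negatives, records that are identical to the previous run are never selected for recomputation in this sketch. The trade-off is that a modified record can occasionally be mistaken for an unchanged one, which is why the check holds only with a certain probability.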
