Performance enhancement of Hadoop MapReduce framework for analyzing BigData

In the BigData era, processing and analyzing data is both important and laborious. Hadoop, an open-source implementation of MapReduce, provides an efficient platform for BigData analytics. The performance of Hadoop MapReduce depends largely on its configuration parameters, so tuning the job configuration parameters is an effective way to reduce execution time and disk utilization. Performance tuning is driven mainly by four resource components: CPU usage, disk I/O rate, memory usage, and network traffic. In this paper we discuss tuning methods that enhance the performance of MapReduce jobs. In our experiment, performance improved by 32.97% over the baseline system.
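As an illustration of what tuning job configuration parameters can look like, the following is a minimal Python sketch, not the configuration used in the paper. The property names are standard Hadoop 2.x-style MapReduce settings, but the choice of properties, their values, and the run times in the usage note are assumptions made purely for illustration. The improvement formula (baseline time minus tuned time, divided by baseline time) is likewise an assumed reading of how a figure such as 32.97% could be computed from execution times.

```python
import xml.etree.ElementTree as ET

# Illustrative values only (assumption): the abstract does not state which
# parameters were tuned or what values were chosen.
ILLUSTRATIVE_TUNING = {
    "mapreduce.task.io.sort.mb": 256,          # map-side sort buffer (memory vs. disk spills)
    "mapreduce.map.output.compress": "true",   # compress map output (CPU vs. network/disk)
    "mapreduce.reduce.shuffle.parallelcopies": 10,  # parallel shuffle fetches (network)
    "mapreduce.job.reduces": 8,                # number of reduce tasks
}


def to_hadoop_xml(params):
    """Render a dict of properties as a Hadoop-style <configuration> fragment."""
    root = ET.Element("configuration")
    for name, value in params.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


def percent_improvement(baseline_seconds, tuned_seconds):
    """Relative reduction in execution time, as a percentage of the baseline."""
    if baseline_seconds <= 0:
        raise ValueError("baseline_seconds must be positive")
    return (baseline_seconds - tuned_seconds) / baseline_seconds * 100.0


if __name__ == "__main__":
    print(to_hadoop_xml(ILLUSTRATIVE_TUNING))
    # Hypothetical run times, chosen only to show the arithmetic.
    print(round(percent_improvement(1000.0, 670.3), 2))
```

Under this reading, a hypothetical job that takes 1000 s on the baseline and 670.3 s after tuning would correspond to a 32.97% improvement; the actual run times are not given in the abstract.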
