A performance analysis of MapReduce applications on big data in cloud based Hadoop

MapReduce is one of the most popular programming models for big data analysis in distributed and parallel computing environments, where it is used to implement parallel applications. With the growth of the mobile Internet and cloud computing, issues related to big data have become a concern in both industry and academia. Several platforms, such as Hadoop, let users develop applications based on the MapReduce framework. Hadoop is a free, Java-based programming framework that supports the processing of large data sets in a distributed computing environment. This paper discusses several MapReduce applications (WordCount, Pi, TeraSort, and Grep) running on cloud-based Hadoop. We present experimental results for these applications on Amazon EC2 using two types of Ubuntu instances, and report their performance with respect to execution time and number of nodes. We find that as the number of nodes increases, execution time decreases and performance improves.
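To make the programming model concrete, the following is a minimal single-process sketch of the WordCount pattern in Python. It is not Hadoop's API and not the code used in the experiments; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen only for illustration. On a real cluster the map and reduce steps would run in parallel across nodes, and the framework would perform the shuffle.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit (word, 1) for every whitespace-separated token."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key (done by the framework on a cluster)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence in a single process."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data on Hadoop", "Big data in the cloud"]))
```

Because each call to `map_phase` depends only on its own input line, and each call to `reduce_phase` depends only on one key, both steps can be distributed across nodes, which is the property the scaling experiments rely on.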
