MapReduce Performance in Heterogeneous Environments: A Review

MapReduce has become an important distributed processing model for large-scale, data-intensive applications such as data mining and web indexing. Hadoop, an open-source implementation of MapReduce, is widely used for short jobs that require low response times. Neither MapReduce nor Hadoop fundamentally accounts for the heterogeneity of the nodes and workloads in a computer cluster: the current Hadoop implementation assumes that the computing nodes in the cluster are homogeneous. In this article, we survey some of the approaches that have been designed to improve MapReduce performance in heterogeneous environments.

Index Terms: MapReduce, Cloud Computing, Heterogeneous Environments, Hadoop, Distributed Computing, Data Locality, Fault Tolerance.
