A Task Scheduling Algorithm for Hadoop Platform

MapReduce is a software framework for easily writing applications that process vast amounts of data on large clusters of commodity hardware. To achieve better task allocation and load balancing, this paper analyzes the MapReduce working mode and the task scheduling algorithm of the Hadoop platform. Observing that jobs with smaller weights tend to have more tasks while jobs with larger weights have fewer, the paper introduces the idea of the weighted round-robin scheduling algorithm into Hadoop task scheduling and, by analyzing every case in which a weight can change, proposes a set of weight update rules. Experimental results indicate that the approach allocates tasks effectively and achieves good load balance when applied to a Hadoop platform that uses only JobTracker scheduling.
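The abstract does not state the paper's weight update rules, so the following is only a minimal sketch of plain weighted round-robin slot assignment, under hypothetical assumptions: each job has a fixed integer weight and a count of pending tasks, and one task is granted per free slot. It uses the "smooth" weighted round-robin variant (the one used by nginx for upstream selection), which is not necessarily the variant in the paper; the names `Job` and `assign_slots` are illustrative and are not part of Hadoop's API.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    weight: int          # hypothetical fixed weight; the paper's weight update rules are NOT modeled
    pending_tasks: int   # tasks still waiting for a slot
    current: int = 0     # running credit used by smooth weighted round-robin


def assign_slots(jobs, num_slots):
    """Grant up to num_slots task slots using smooth weighted round-robin.

    Jobs with no pending tasks are skipped, so their share passes to the
    remaining jobs. Returns job names in the order slots were granted.
    """
    order = []
    for _ in range(num_slots):
        active = [j for j in jobs if j.pending_tasks > 0]
        if not active:
            break  # nothing left to schedule
        total = sum(j.weight for j in active)
        # Every active job earns credit equal to its weight ...
        for j in active:
            j.current += j.weight
        # ... the job with the most credit gets the slot (ties go to the earlier job) ...
        chosen = max(active, key=lambda j: j.current)
        # ... and pays back the total weight, which spreads its turns out evenly.
        chosen.current -= total
        chosen.pending_tasks -= 1
        order.append(chosen.name)
    return order
```

For example, with weights 5, 1 and 1, `assign_slots([Job("A", 5, 100), Job("B", 1, 100), Job("C", 1, 100)], 7)` returns `["A", "A", "B", "A", "C", "A", "A"]`: over seven slots each job receives a share proportional to its weight, and the heavy job's turns are interleaved rather than bunched together. A scheduler following the paper would additionally adjust the weights as jobs progress.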
