Paper Information - Using Pattern Classification for Task Assignment in MapReduce

Using Pattern Classification for Task Assignment in MapReduce

MapReduce has become a popular paradigm for large-scale data processing in the cloud. The sheer scale of MapReduce deployments makes task assignment in MapReduce an interesting problem. The scale of MapReduce applications also presents a unique opportunity to use data-driven algorithms in resource management. We present a learning-based scheduler that uses pattern classification for utilization-oriented task assignment in MapReduce. We also present the application of our algorithm to the Hadoop platform. The scheduler assigns tasks by classifying them into two classes, good and bad. From the tasks labeled as good, it selects the task that is least likely to overload a worker node. We allow users to plug in their own policy schemes for prioritizing jobs. The scheduler learns the impact of different applications on utilization rather quickly and achieves a user-specified level of utilization. Our results show that our scheduler reduces the response times of jobs in certain cases by a factor of two.

Vasudeva Varma | Jaideep Dhok
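
The task-selection step described in the abstract (label candidate tasks as good or bad, then pick the good task least likely to overload the node) can be illustrated with a minimal sketch. The abstract does not say which classifier or features the scheduler uses, so everything here is an assumption made for illustration: a naive Bayes classifier with add-one smoothing, two-valued features such as a job name and a coarse node load level, and the hypothetical names `GoodBadClassifier` and `select_task`. It is not the paper's implementation.

```python
from collections import defaultdict


class GoodBadClassifier:
    """Hypothetical naive Bayes classifier over discrete features.

    Assumption: every feature takes one of two values, which is why the
    add-one smoothing below uses a denominator of (n + 2).
    """

    LABELS = ("good", "bad")

    def __init__(self):
        self.class_counts = {label: 0 for label in self.LABELS}
        self.feature_counts = {label: defaultdict(int) for label in self.LABELS}

    def update(self, features, label):
        """Record one observed outcome of running a task on a node."""
        self.class_counts[label] += 1
        for item in features.items():
            self.feature_counts[label][item] += 1

    def p_good(self, features):
        """Estimated probability that the assignment does not overload the node."""
        total = sum(self.class_counts.values())
        scores = {}
        for label in self.LABELS:
            n = self.class_counts[label]
            score = (n + 1) / (total + 2)  # smoothed class prior
            for item in features.items():
                count = self.feature_counts[label].get(item, 0)
                score *= (count + 1) / (n + 2)  # smoothed likelihood
            scores[label] = score
        return scores["good"] / (scores["good"] + scores["bad"])


def select_task(classifier, node_features, candidates, threshold=0.5):
    """Return the id of the 'good' task with the highest p_good, or None.

    candidates is a list of (task_id, task_features) pairs; a task counts
    as good when its p_good exceeds the (assumed) threshold.
    """
    best_id, best_p = None, threshold
    for task_id, task_features in candidates:
        p = classifier.p_good({**node_features, **task_features})
        if p > best_p:
            best_id, best_p = task_id, p
    return best_id


# Toy training data: 'sort' tasks overloaded a busy node, 'grep' tasks did not.
clf = GoodBadClassifier()
for _ in range(3):
    clf.update({"job": "sort", "load": "high"}, "bad")
    clf.update({"job": "grep", "load": "high"}, "good")

node = {"load": "high"}
chosen = select_task(clf, node, [("t1", {"job": "sort"}), ("t2", {"job": "grep"})])
print(chosen)
```

A user-supplied job priority policy, which the abstract mentions, could be layered on top by weighting each candidate's score before the comparison; that weighting is left out of the sketch to keep it short.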
