Performance evaluation of fair and capacity scheduling in Hadoop YARN

Big Data research can be divided broadly into two areas: the scheduling of jobs, and control of the rate at which jobs are generated and run. Hadoop YARN provides better resource management schemes for scheduling jobs, with a focus on reducing the total time required to complete them. This paper presents a study of scheduling algorithms in Hadoop YARN and evaluates the performance of two of them, fair scheduling and capacity scheduling, using the YARN Scheduler Load Simulator (SLS). The results of this evaluation can be used to further enhance the capabilities of scheduling algorithms for different types of data sets.
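To make the contrast between the two policies concrete, the following is a minimal sketch, not YARN's actual implementation. It assumes a single resource type, and the function names, queue names, and numbers are hypothetical. Fair scheduling is modeled as a max-min fair split of the cluster across demands; capacity scheduling is modeled as a fixed guaranteed percentage per queue, deliberately ignoring the elasticity that lets a real capacity scheduler lend idle resources to busy queues.

```python
def fair_shares(total, demands):
    """Max-min fair split of `total` units across jobs (simplified model)."""
    alloc = {job: 0.0 for job in demands}
    remaining = float(total)
    active = {job for job, d in demands.items() if d > 0}
    while active and remaining > 1e-9:
        share = remaining / len(active)
        # Jobs whose unmet demand fits inside an equal share are fully served.
        satisfied = {j for j in active if demands[j] - alloc[j] <= share}
        if not satisfied:
            # Nobody can be fully served: split what is left equally.
            for j in active:
                alloc[j] += share
            remaining = 0.0
            break
        for j in satisfied:
            remaining -= demands[j] - alloc[j]
            alloc[j] = float(demands[j])
        active -= satisfied
    return alloc


def capacity_shares(total, capacity_pct, demands):
    """Each queue gets at most its guaranteed percentage (no elasticity)."""
    return {
        q: min(float(demands[q]), total * capacity_pct[q] / 100)
        for q in demands
    }


# Hypothetical workload: three queues competing for 100 resource units.
demands = {"a": 20, "b": 50, "c": 80}
fair = fair_shares(100, demands)
capped = capacity_shares(100, {"a": 50, "b": 30, "c": 20}, demands)
print("fair:    ", fair)
print("capacity:", capped)
```

In this toy model the fair policy hands out all 100 units, while the fixed-capacity policy leaves part of the cluster idle because queue "a" does not use its guarantee. That utilization gap is the kind of difference a simulator-based comparison can quantify.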
