Performance Analysis of Schedulers to Handle Multi Jobs in Hadoop Cluster

MapReduce is a programming model for processing large data sets. Apache Hadoop, an implementation of MapReduce, was developed to process Big Data. Sharing a Hadoop cluster introduces several challenges, such as job scheduling, data locality, efficient resource usage, fair resource allocation, and fault tolerance. Accordingly, we focus on the job scheduling system in Hadoop in order to improve efficiency. Schedulers are responsible for task assignment. When a user submits a job, it is placed in a job queue. From the job queue, the job is divided into tasks, which are distributed to different nodes. Proper assignment of tasks reduces job completion time and thereby improves job performance. By default, Hadoop uses the FIFO scheduler. In our experiments, we discuss and compare the job execution time of the FIFO scheduler with that of the Fair scheduler and the Capacity scheduler.
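To make the difference between queue-order and fair task assignment concrete, the following is a minimal toy sketch, not Hadoop's actual scheduler code. It assumes every task takes one time step, ignores data locality and capacity queues, and uses hypothetical names (`simulate`, `slots`, `policy`) and a made-up workload that do not come from the paper.

```python
def simulate(jobs, slots, policy):
    """Return {job name: completion step} for a toy cluster.

    jobs   -- dict of job name -> number of unit-length tasks,
              in submission order (hypothetical workload)
    slots  -- number of tasks the cluster can run per time step
    policy -- "fifo" (earliest job gets all free slots first) or
              "fair" (free slots are handed out round-robin across jobs)
    """
    if slots <= 0:
        raise ValueError("slots must be positive")
    remaining = dict(jobs)
    finish = {}
    step = 0
    while remaining:
        step += 1
        free = slots
        if policy == "fifo":
            # Serve jobs strictly in submission order.
            for name in list(remaining):
                run = min(free, remaining[name])
                remaining[name] -= run
                free -= run
        elif policy == "fair":
            # Give one slot at a time to each job that still has tasks.
            while free > 0 and any(remaining.values()):
                for name in remaining:
                    if free > 0 and remaining[name] > 0:
                        remaining[name] -= 1
                        free -= 1
        else:
            raise ValueError("unknown policy: " + policy)
        # Record jobs whose last task ran in this step.
        for name in [n for n, left in remaining.items() if left == 0]:
            finish[name] = step
            del remaining[name]
    return finish


if __name__ == "__main__":
    workload = {"A": 8, "B": 2, "C": 2}  # one long job, two short jobs
    for policy in ("fifo", "fair"):
        print(policy, simulate(workload, slots=2, policy=policy))
```

In this toy workload, the short jobs B and C finish earlier under the fair policy, while the long job A finishes later; the time at which the last job completes is the same under both policies. Real Hadoop schedulers also account for factors such as data locality and queue capacities, which this sketch deliberately omits.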
