Evaluation and Analysis of Capacity Scheduler and Fair Scheduler in Hadoop Framework on Big Data Technology

Apache Hadoop is an open-source framework that implements MapReduce. It is scalable, reliable, and fault tolerant. Scheduling is an important process in Hadoop MapReduce because the scheduler is responsible for allocating resources to running applications based on resource capacity, queues, running tasks, and the number of users. Moving from a single-node to a multi-node Hadoop cluster can optimize HDFS, but it is quite costly. The scheduler makes its decisions based on resource requirements such as memory, CPU, disk, and network. The most general goal of a scheduling algorithm is to minimize the time needed to complete a task. Scheduling in Hadoop is an independent module, so users can design their own scheduler based on the actual needs of their applications and thereby meet specific business requirements. This research analyzes the characteristics of the Capacity Scheduler and the Fair Scheduler.
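
To make the contrast between the two policies concrete, the following is a minimal Python sketch and not Hadoop's actual implementation. It assumes a single abstract resource (for example, memory in GB) and queue-level demands. The function names `capacity_allocation` and `fair_allocation`, the queue names, and all numbers are hypothetical. The capacity-style function caps each queue at a configured fraction of the cluster and deliberately ignores the elasticity that the real Capacity Scheduler offers, which lets a queue borrow idle capacity. The fair-style function uses max-min fairness as a simplified stand-in for fair sharing.

```python
from typing import Dict


def capacity_allocation(total: float,
                        capacities: Dict[str, float],
                        demands: Dict[str, float]) -> Dict[str, float]:
    """Capacity-style policy (simplified assumption): each queue receives at
    most its configured fraction of the cluster; idle capacity is NOT lent
    to other queues in this sketch."""
    return {q: min(demands[q], total * capacities[q]) for q in capacities}


def fair_allocation(total: float,
                    demands: Dict[str, float]) -> Dict[str, float]:
    """Fair-style policy (simplified assumption): max-min fairness.
    Resources are split evenly among queues that still need more; whatever a
    small queue does not need is redistributed to the others."""
    alloc = {q: 0.0 for q in demands}
    active = {q for q, d in demands.items() if d > 0}
    remaining = float(total)
    while active and remaining > 1e-9:
        share = remaining / len(active)
        # Queues whose outstanding demand fits within an equal share.
        satisfied = {q for q in active if demands[q] - alloc[q] <= share}
        if not satisfied:
            # Nobody can be fully satisfied: split what is left evenly.
            for q in active:
                alloc[q] += share
            remaining = 0.0
            break
        for q in satisfied:
            need = demands[q] - alloc[q]
            alloc[q] += need
            remaining -= need
        active -= satisfied
    return alloc


if __name__ == "__main__":
    # Hypothetical cluster of 100 units and three queues.
    demands = {"a": 20, "b": 50, "c": 80}
    capacities = {"a": 0.5, "b": 0.3, "c": 0.2}
    print("capacity:", capacity_allocation(100, capacities, demands))
    print("fair:    ", fair_allocation(100, demands))
```

With these hypothetical inputs, the capacity-style policy gives queues a, b, and c about 20, 30, and 20 units and leaves 30 units idle, while the fair-style policy gives them 20, 40, and 40 units and uses the whole cluster. The gap in the first case is an artifact of leaving out elasticity in this sketch; it is meant only to show how fixed guarantees differ from demand-driven sharing.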
