Performance Analysis of Scheduling Algorithms in Apache Hadoop

Big data applications need large amounts of memory to load their data and high processing power to execute. A single traditional computing system is not sufficient to run such applications, but many such systems used together can meet these needs. This combined capacity can be obtained by running the applications on a distributed system using the MapReduce model under the Apache Hadoop framework. Merely implementing an application on a distributed system, however, may not make optimal use of the available resources; tuning the scheduling algorithm can further improve resource utilization. This paper discusses various scheduling algorithms implemented in the Hadoop environment. It also discusses how fine-tuning scheduling policies can improve the performance of different applications, which have been implemented and tested in Apache Hadoop.