How Much Can Solid State Drives Improve the Performance of a Hadoop Cluster? Performance Evaluation of Hadoop on SSD and HDD

Hadoop and MapReduce now handle very large volumes of data and are becoming ubiquitous for big data storage and processing. This makes it essential to evaluate and characterize the Hadoop file system and its deployments through extensive benchmarking. Other benchmarking tools that can analyze the performance of a Hadoop system are widely available today, but they are either designed to run on a single-node system or intended to assess only the basic characteristics of the attached storage device, such as its peak speed and other hardware or manufacturer details. We therefore use HiBench, a comprehensive benchmark suite for Hadoop that consists of a set of Hadoop programs, including micro-benchmarks and real-world applications, to benchmark the performance of Hadoop on each available type of storage device (i.e., HDD and SSD) and machine configuration. The results help to optimize performance and to address the limitations of the Hadoop system. In this paper we also show that the external sorting algorithm in Hadoop (MapReduce) performs better on SSDs than on hard disks. In addition, we demonstrate that power consumption can be drastically reduced when SSDs are used.