Take me to SSD: a hybrid block-selection method on HDFS based on storage type

As the era of big data has arrived, big data technologies have grown more important every day. In particular, Hadoop has become a critical part of many big data systems because it can store, process, and analyze thousands of terabytes of data. A major issue in achieving high performance on Hadoop is managing the growth of data while satisfying heavy storage I/O demand, because Hadoop's overall performance is largely determined by storage input/output (I/O). However, storage I/O technologies are still quite limited. Therefore, studies on improving storage I/O in the Hadoop Distributed File System (HDFS) have been gaining attention. To this end, a recent trend in storage systems is to use hybrid storage devices. However, HDFS cannot yet take full advantage of heterogeneous storage devices, because it does not use storage type information when reading data. In this paper, we propose a hybrid block-selection method for HDFS that considers the storage type, such as SSD or HDD, when reading data. With this method, the Hadoop ecosystem preferentially uses the higher bandwidth of SSDs, which improves its overall performance. In our experiments, the new method reduced the execution time of a select count(*) query and the TPC-H benchmark by up to 22% and 30% on average.
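To make the idea of storage-type-aware block selection concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical `Replica` record holding a DataNode name, a storage type, and a network distance from the reader (0 for the local node, 2 for the same rack, 4 for off-rack, following Hadoop's usual topology convention). The storage type names mirror HDFS's `StorageType` values, but the `STORAGE_RANK` ordering and the helpers `order_replicas` and `select_replica` are illustrative assumptions: replicas on faster media are tried first, and ties are broken by network distance.

```python
from dataclasses import dataclass

# Hypothetical preference order (lower = tried first). The names mirror HDFS
# storage types; the ranking itself is an assumption for illustration.
STORAGE_RANK = {"RAM_DISK": 0, "SSD": 1, "DISK": 2, "ARCHIVE": 3}


@dataclass(frozen=True)
class Replica:
    datanode: str      # hypothetical DataNode identifier
    storage_type: str  # e.g. "SSD" or "DISK"
    distance: int      # network distance from the reader (0 = local node)


def order_replicas(replicas: list[Replica]) -> list[Replica]:
    """Sort replicas so faster storage comes first; ties broken by distance.

    Storage types missing from STORAGE_RANK are ranked after all known ones.
    """
    worst = max(STORAGE_RANK.values()) + 1
    return sorted(
        replicas,
        key=lambda r: (STORAGE_RANK.get(r.storage_type, worst), r.distance),
    )


def select_replica(replicas: list[Replica]) -> Replica:
    """Return the replica a reader would try first under this policy."""
    if not replicas:
        raise ValueError("block has no available replicas")
    return order_replicas(replicas)[0]


if __name__ == "__main__":
    block = [
        Replica("dn1", "DISK", 0),  # local HDD replica
        Replica("dn2", "SSD", 2),   # SSD replica on another node in the rack
        Replica("dn3", "DISK", 4),  # off-rack HDD replica
    ]
    print(select_replica(block).datanode)  # dn2
```

Here the same-rack SSD replica is chosen over the local HDD replica. Whether that is the right call depends on the relative bandwidth of the network and the disks; encoding the policy as a single sort key keeps that trade-off in one place, so it can be changed (for example, by sorting on distance first) without touching the selection logic.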
