Prefetching-based metadata management in Advanced Multitenant Hadoop

Metadata management is an essential part of Apache Hadoop. Optimizing metadata access improves the storage, processing, and analysis of big data, especially in multitenant environments. However, as environments grow more complex, metadata management becomes more challenging and costly because of the performance overhead it incurs. In this paper, we propose a novel prefetching-based approach to improve the performance of metadata management for Hadoop in multitenant environments. We build metadata access graphs from historical access values, define access patterns, and then prefetch the items most likely to be needed by near-future requests in order to minimize latency. We present a formal algorithm for applying the prefetching mechanism to Hadoop and implement it on a recent Hadoop release. Experimental results show that the proposed approach delivers high-performance metadata management while maintaining advanced multitenancy features.
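
The general idea of graph-based prefetching can be illustrated with a minimal sketch. This is not the paper's formal algorithm, which is not reproduced in this abstract. It assumes the access history is a simple ordered list of metadata keys, that an edge weight counts how often one key was accessed immediately after another, and that a key is worth prefetching once its edge weight reaches a threshold. The function names, the example paths, and the parameters `k` and `min_count` are all hypothetical.

```python
from collections import defaultdict


def build_access_graph(history):
    """Build a weighted directed graph from an ordered access history.

    graph[a][b] is the number of times key b was accessed immediately
    after key a (an assumed, simplified notion of an access pattern).
    """
    graph = defaultdict(lambda: defaultdict(int))
    for prev, nxt in zip(history, history[1:]):
        graph[prev][nxt] += 1
    return graph


def prefetch_candidates(graph, current, k=2, min_count=2):
    """Return up to k keys that most often followed `current`.

    Successors seen fewer than min_count times are ignored so that
    one-off transitions do not trigger a prefetch.
    """
    successors = graph.get(current, {})
    # Sort by descending count, then by key for a deterministic order.
    ranked = sorted(successors.items(), key=lambda kv: (-kv[1], kv[0]))
    return [key for key, count in ranked if count >= min_count][:k]


# Hypothetical access history for one tenant's metadata keys.
history = [
    "/t1/a", "/t1/b", "/t1/c",
    "/t1/a", "/t1/b", "/t1/d",
    "/t1/a", "/t1/b", "/t1/c",
]
graph = build_access_graph(history)

print(prefetch_candidates(graph, "/t1/a"))  # ['/t1/b']
print(prefetch_candidates(graph, "/t1/b"))  # ['/t1/c']
```

In this sketch, a request for `/t1/a` would trigger a prefetch of `/t1/b`, because that transition appears three times in the history, while the single transition from `/t1/b` to `/t1/d` falls below the threshold and is skipped. A multitenant variant could plausibly keep one such graph per tenant, though the abstract does not say how the proposed system handles this.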

[1]  Xiaohua Jia,et al.  Embedding complete binary trees into parity cubes , 2014, The Journal of Supercomputing.

[2]  Liang Chen,et al.  The Dynamically Efficient Mechanism of HDFS Data Prefetching , 2013, 2013 IEEE International Conference on Green Computing and Communications and IEEE Internet of Things and IEEE Cyber, Physical and Social Computing.

[3]  Desta Haileselassie Hagos Software-Defined Networking for Scalable Cloud-based Services to Improve System Performance of Hadoop-based Big Data Applications , 2016, Int. J. Grid High Perform. Comput..

[4]  Paresh Wankhede,et al.  Secure and multi-tenant Hadoop cluster - an experience , 2016, 2016 2nd International Conference on Green High Performance Computing (ICGHPC).

[5]  Ebin Deni Raj,et al.  A scalable cloud computing deployment framework for efficient MapReduce operations using Apache YARN , 2014, International Conference on Information Communication and Embedded Systems (ICICES2014).

[6]  Hao Wu,et al.  Enhancing Throughput of Hadoop Distributed File System for Interaction-Intensive Tasks , 2014, 2014 22nd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing.

[7]  Tom White,et al.  Hadoop: The Definitive Guide , 2009 .

[8]  Hairong Kuang,et al.  The Hadoop Distributed File System , 2010, 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST).

[9]  Kyounghyun Park,et al.  Web-based collaborative big data analytics on big data as a service platform , 2015, 2015 17th International Conference on Advanced Communication Technology (ICACT).

[10]  Rajiv Ranjan,et al.  G-Hadoop: MapReduce across distributed data centers for data-intensive computing , 2013, Future Gener. Comput. Syst..

[11]  원희선,et al.  Multitenant hadoop with advanced resource management = 향상된 자원관리를 지원하는 멀티테넌트 Hadoop , 2016 .

[12]  Myeong-Seon Gil,et al.  Moving metadata from ad hoc files to database tables for robust, highly available, and scalable HDFS , 2016, The Journal of Supercomputing.

[13]  Jun Wang,et al.  Improving metadata management for small files in HDFS , 2009, 2009 IEEE International Conference on Cluster Computing and Workshops.

[14]  Bing Zhang,et al.  DLS: a cloud-hosted data caching and prefetching service for distributed metadata access , 2015, Int. J. Big Data Intell..

[15]  Viktor Mayer-Schnberger,et al.  Big Data: A Revolution That Will Transform How We Live, Work, and Think , 2013 .

[16]  Eric Gossett,et al.  Big Data: A Revolution That Will Transform How We Live, Work, and Think , 2015 .

[17]  Carlo Curino,et al.  Apache Hadoop YARN: yet another resource negotiator , 2013, SoCC.

[18]  Xubin He,et al.  Implementing WebGIS on Hadoop: A case study of improving small file I/O performance on HDFS , 2009, 2009 IEEE International Conference on Cluster Computing and Workshops.

[19]  M. Rajasekhara Babu,et al.  Exploring Vectorization and Prefetching Techniques on Scientific Kernels and Inferring the Cache Performance Metrics , 2015, Int. J. Grid High Perform. Comput..

[20]  Myeong-Seon Gil,et al.  Advanced resource management with access control for multitenant Hadoop , 2015, Journal of Communications and Networks.

[21]  Yao Sun,et al.  A Distributed Cache Framework for Metadata Service of Distributed File Systems , 2013, 2013 International Conference on Parallel and Distributed Systems.

[22]  Yang-Sae Moon,et al.  A formal framework for prefetching based on the type-level access pattern in object-relational DBMSs , 2005, IEEE Transactions on Knowledge and Data Engineering.

[23]  Hui He,et al.  Optimization strategy of Hadoop small file storage for big data in healthcare , 2015, The Journal of Supercomputing.

[24]  Theodore Y. Ts'o,et al.  Kerberos: an authentication service for computer networks , 1994, IEEE Communications Magazine.

[25]  Bo Dong,et al.  Hadoop high availability through metadata replication , 2009, CloudDB@CIKM.

[26]  Jian Liu,et al.  Correlation Based File Prefetching Approach for Hadoop , 2010, 2010 IEEE Second International Conference on Cloud Computing Technology and Science.