A Survey of Machine Learning Techniques for Self-tuning Hadoop Performance

The Apache Hadoop framework is an open-source implementation of MapReduce for processing and storing big data. However, getting the best performance from it is a major challenge because of its large number of configuration parameters. This paper highlights the critical issues of the Hadoop system, big data, and machine learning, and presents an analysis of machine learning techniques that have been applied so far to improve Hadoop performance. It then proposes a promising machine learning technique based on a deep learning algorithm for improving Hadoop system performance.
