ATCS: Auto-Tuning Configurations of Big Data Frameworks Based on Generative Adversarial Nets

Big data processing frameworks (e.g., Spark, Storm) are extensively used for massive data processing in industry. To improve the performance and robustness of these frameworks, developers expose a large number of configurable parameters to users. Because of the high-dimensional parameter space and the complicated interactions among parameters, manual tuning is time-consuming and ineffective. Building performance prediction models for big data frameworks is also challenging, for two reasons: (1) collecting training data takes significant time, and (2) prediction models are inaccurate when training data are limited. To address these challenges, we propose the Auto-Tuning Configuration System (ATCS), a new auto-tuning approach based on Generative Adversarial Nets (GAN). ATCS builds a performance prediction model from less training data without sacrificing model accuracy. Moreover, ATCS uses an optimized Genetic Algorithm (GA) to explore the parameter space for optimal configurations. To demonstrate the effectiveness of ATCS, we select five frequently-used Spark workloads, each running on five data sets of different sizes. The results show that ATCS improves the performance of these workloads over the default configurations by 3.5× on average, with a maximum speedup of 6.9×. Experimental results also show that, to reach comparable model accuracy, ATCS needs only 6% of the training data required by a Deep Neural Network (DNN), 13% of that required by a Support Vector Machine (SVM), and 18% of that required by a Decision Tree (DT). Moreover, compared to these machine learning models, the average performance improvement achieved by ATCS is 1.7× that of DNN, 1.6× that of SVM, and 1.7× that of DT on the five Spark programs.
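To make the two moving parts of the approach concrete, the following Python sketch illustrates (a) a GAN that augments a small set of measured (configuration, runtime) samples with synthetic ones, and (b) a GA that searches the normalized configuration space using a runtime predictor. This is a minimal illustration under stated assumptions, not the authors' implementation: the parameter count, network sizes, GA operators, and all hyperparameters below are assumptions, and the paper's actual GAN variant and GA optimizations differ in detail.

```python
# Minimal sketch of an ATCS-style pipeline (illustrative, not the authors' code).
# Assumptions: 8 tuned Spark parameters, each normalized to [0, 1]; runtime is
# normalized to [0, 1] as well, so a Sigmoid output layer is valid.
import torch
import torch.nn as nn
import numpy as np

N_PARAMS = 8      # assumed number of tuned configuration parameters
NOISE_DIM = 16    # assumed latent noise dimension

# --- GAN that augments scarce (configuration, runtime) training samples ----
G = nn.Sequential(nn.Linear(NOISE_DIM, 64), nn.ReLU(),
                  nn.Linear(64, N_PARAMS + 1), nn.Sigmoid())  # config + runtime
D = nn.Sequential(nn.Linear(N_PARAMS + 1, 64), nn.ReLU(),
                  nn.Linear(64, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

def train_gan(real, steps=2000):
    """real: (n, N_PARAMS + 1) tensor of measured (config, runtime) rows."""
    n = real.size(0)
    for _ in range(steps):
        fake = G(torch.randn(n, NOISE_DIM))
        # Discriminator step: push real rows toward 1, generated rows toward 0.
        opt_d.zero_grad()
        loss_d = (bce(D(real), torch.ones(n, 1)) +
                  bce(D(fake.detach()), torch.zeros(n, 1)))
        loss_d.backward()
        opt_d.step()
        # Generator step: make generated rows look real to the discriminator.
        opt_g.zero_grad()
        loss_g = bce(D(fake), torch.ones(n, 1))
        loss_g.backward()
        opt_g.step()

# --- GA search over configurations, guided by a runtime predictor ----------
def ga_search(predict_runtime, pop=40, gens=50, mut=0.1):
    """predict_runtime: maps a (pop, N_PARAMS) array to predicted runtimes."""
    P = np.random.rand(pop, N_PARAMS)
    for _ in range(gens):
        order = np.argsort(predict_runtime(P))     # lower runtime = fitter
        parents = P[order[: pop // 2]]             # truncation selection
        kids = parents.copy()
        cuts = np.random.randint(1, N_PARAMS, size=len(parents))
        for i, c in enumerate(cuts):               # one-point crossover
            kids[i, c:] = parents[(i + 1) % len(parents), c:]
        kids += mut * np.random.randn(*kids.shape) # Gaussian mutation
        P = np.clip(np.vstack([parents, kids]), 0.0, 1.0)
    return P[np.argmin(predict_runtime(P))]        # best configuration found
```

In this sketch the GAN generates whole (configuration, runtime) rows, so synthetic rows can be pooled with measured ones to train an accurate predictor from few real samples, which is the data-efficiency claim above; the GA then only queries the cheap predictor rather than running Spark jobs, making the search over the high-dimensional parameter space tractable.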
