A Comparison of AutoML Tools for Machine Learning, Deep Learning and XGBoost

This paper presents a benchmark of supervised Automated Machine Learning (AutoML) tools. First, we analyze the characteristics of eight recent open-source AutoML tools (Auto-Keras, Auto-PyTorch, Auto-Sklearn, AutoGluon, H2O AutoML, rminer, TPOT and TransmogrifAI) and describe the twelve popular OpenML datasets used in the benchmark (divided into regression, binary classification and multi-class classification tasks). We then perform a comparison study with hundreds of computational experiments based on three scenarios: General Machine Learning (GML), Deep Learning (DL) and XGBoost (XGB). To select the best tool, we used a lexicographic approach, considering first the average prediction score for each task and then the computational effort. The best predictive results were achieved in the GML scenario, and these were further compared with the best public OpenML results. Overall, the best GML AutoML tools obtained competitive results, outperforming the best OpenML models on five datasets. These results confirm the potential of general-purpose AutoML tools to fully automate Machine Learning (ML) algorithm selection and tuning.
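The lexicographic selection described above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the procedure used in the paper: the `ToolResult` record, the `select_best` helper, the `score_tolerance` parameter, the tool names and all numbers are hypothetical, and the score is assumed to be a "higher is better" metric with effort measured in seconds.

```python
from typing import List, NamedTuple


class ToolResult(NamedTuple):
    """Hypothetical per-tool summary for one task (not data from the paper)."""
    tool: str
    avg_score: float       # average prediction score, assumed higher is better
    effort_seconds: float  # computational effort, lower is better


def select_best(results: List[ToolResult], score_tolerance: float = 0.0) -> ToolResult:
    """Lexicographic choice: best average score first, effort as tie-breaker.

    Tools whose score is within `score_tolerance` of the best score are
    treated as tied on the first criterion (the tolerance is an assumption
    of this sketch).
    """
    best_score = max(r.avg_score for r in results)
    tied = [r for r in results if best_score - r.avg_score <= score_tolerance]
    # Among the tied tools, prefer the one with the lowest computational effort.
    return min(tied, key=lambda r: r.effort_seconds)


# Made-up example values, for illustration only.
results = [
    ToolResult("tool_a", 0.90, 120.0),
    ToolResult("tool_b", 0.90, 45.0),
    ToolResult("tool_c", 0.85, 10.0),
]

best = select_best(results)
print(best.tool)
```

In this sketch the second criterion only matters when tools tie on the first, so the fastest tool (`tool_c`) is not selected because its score is lower; widening `score_tolerance` would let effort decide among near-equal scores.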
