AutoWeka4MCPS-AVATAR: Accelerating Automated Machine Learning Pipeline Composition and Optimisation

Automated machine learning (ML) pipeline composition and optimisation aim to automate the process of finding the most promising ML pipelines within allocated resources (i.e., time, CPU and memory). Existing methods, such as the Bayesian-based and genetic-based optimisation implemented in Auto-Weka, Auto-sklearn and TPOT, evaluate pipelines by executing them. Consequently, pipeline composition and optimisation with these methods often take so long that they cannot explore complex pipelines to find better predictive models. To investigate this research challenge, we conducted experiments showing that many of the generated pipelines are invalid in the first place, so attempting to execute them wastes time and resources. To address this issue, we propose AVATAR, a novel method that evaluates the validity of ML pipelines using a surrogate model, without executing them. AVATAR generates a knowledge base by automatically learning the capabilities of ML algorithms and their effects on datasets' characteristics. This knowledge base is used for a simplified mapping from an original ML pipeline to its surrogate model, which is a Petri net-based pipeline. Instead of executing the original ML pipeline to evaluate its validity, AVATAR evaluates the surrogate model, which is constructed from the capabilities and effects of the pipeline's components together with simplified input/output mappings. Evaluating this surrogate model is less resource-intensive than executing the original pipeline. As a result, AVATAR enables pipeline composition and optimisation methods to evaluate more pipelines by quickly rejecting invalid ones. We integrate AVATAR into sequential model-based algorithm configuration (SMAC). Our experiments show that SMAC finds better solutions when it employs AVATAR than when it runs on its own.
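
The idea of checking validity from capabilities and effects, rather than by execution, can be illustrated with a minimal Python sketch. It is not the AVATAR implementation and it does not build an actual Petri net: the `Component` class, the `is_valid` function, the characteristic names and the two example components are all hypothetical, chosen only to show how a set of dataset characteristics (playing the role of a token) could be propagated through a pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """Hypothetical knowledge-base entry for one pipeline component."""
    name: str
    capabilities: frozenset          # data characteristics the component accepts
    removes: frozenset = frozenset() # characteristics its output no longer has
    adds: frozenset = frozenset()    # characteristics its output gains


def is_valid(pipeline, characteristics):
    """Propagate dataset characteristics through the pipeline without running it.

    Returns (True, "ok") if every component can accept the characteristics
    that reach it, otherwise (False, reason) at the first failure.
    """
    token = set(characteristics)
    for comp in pipeline:
        unsupported = token - comp.capabilities
        if unsupported:
            return False, f"{comp.name} cannot handle {sorted(unsupported)}"
        # Apply the component's effects to obtain the next token.
        token = (token - comp.removes) | comp.adds
    return True, "ok"


# Illustrative (assumed) components and dataset characteristics.
imputer = Component(
    name="imputer",
    capabilities=frozenset({"numeric_attributes", "missing_values", "nominal_class"}),
    removes=frozenset({"missing_values"}),
)
classifier = Component(
    name="classifier",
    capabilities=frozenset({"numeric_attributes", "nominal_class"}),
)
dataset = {"numeric_attributes", "missing_values", "nominal_class"}

valid_result = is_valid([imputer, classifier], dataset)
invalid_result = is_valid([classifier], dataset)
```

In this sketch the pipeline `[imputer, classifier]` is accepted because the imputer's effect removes `missing_values` before the classifier sees the data, whereas `[classifier]` alone is rejected immediately. The check involves only set operations, which is why a surrogate of this kind can be much cheaper than training the pipeline.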
