AVATAR - Machine Learning Pipeline Evaluation Using Surrogate Model

Evaluating machine learning (ML) pipelines is essential during automatic ML pipeline composition and optimisation. Previous methods, such as the Bayesian-based and genetic-based optimisation implemented in Auto-WEKA, Auto-sklearn and TPOT, evaluate pipelines by executing them. Pipeline composition and optimisation with these methods therefore requires a tremendous amount of time, which prevents them from exploring complex pipelines to find better predictive models. To further explore this research challenge, we have conducted experiments showing that many of the generated pipelines are invalid, and that it is unnecessary to execute them to find out whether they are good pipelines. To address this issue, we propose a novel method to evaluate the validity of ML pipelines using a surrogate model (AVATAR). AVATAR accelerates automatic ML pipeline composition and optimisation by quickly ignoring invalid pipelines. Our experiments show that AVATAR evaluates complex pipelines more efficiently than traditional evaluation approaches that require pipeline execution.
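The abstract does not spell out the form of the surrogate model, so the following is only a minimal sketch of the general idea of checking pipeline validity without execution. It assumes, purely for illustration, that each component declares which data properties it cannot handle and how it changes the data's properties; the names `Component`, `rejects`, `removes`, `adds` and `is_valid` are hypothetical and are not taken from AVATAR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """Hypothetical description of a pipeline component (an assumption, not AVATAR's model)."""
    name: str
    rejects: frozenset = frozenset()  # data properties the component cannot handle
    removes: frozenset = frozenset()  # data properties the component eliminates
    adds: frozenset = frozenset()     # data properties the component introduces


def is_valid(pipeline, data_properties):
    """Propagate symbolic data properties through the pipeline without running it."""
    props = set(data_properties)
    for component in pipeline:
        if props & component.rejects:
            # The component would fail on this data, so the pipeline is invalid.
            return False
        props = (props - component.removes) | component.adds
    return True


# Illustrative components: an imputer that removes missing values and
# a classifier that cannot handle them.
imputer = Component("imputer", removes=frozenset({"missing_values"}))
classifier = Component("classifier", rejects=frozenset({"missing_values"}))

dataset = {"missing_values", "numeric"}
valid_with_imputer = is_valid([imputer, classifier], dataset)
valid_without_imputer = is_valid([classifier], dataset)
```

In this sketch the check costs one set operation per component, whereas executing the pipeline would require training every component on the data; that gap is the kind of saving the abstract attributes to skipping invalid pipelines.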
