Online AutoML: An adaptive AutoML framework for online learning

Automated Machine Learning (AutoML) has been used successfully in settings where the learning task is assumed to be static. In many real-world scenarios, however, the data distribution evolves over time, and it remains to be shown whether AutoML techniques can effectively design online pipelines in dynamic environments. This study aims to automate pipeline design for online learning while continuously adapting to data drift. To this end, we design an adaptive Online Automated Machine Learning (OAML) system that searches the complete pipeline configuration space of online learners, including preprocessing algorithms and ensembling techniques. The system combines the inherent adaptation capabilities of online learners with the fast automated pipeline (re)optimization of AutoML. Focusing on optimization techniques that can adapt to evolving objectives, we evaluate asynchronous genetic programming and asynchronous successive halving for continually optimizing these pipelines. We experiment on real and artificial data streams with varying types of concept drift to test the performance and adaptation capabilities of the proposed system. The results confirm the utility of OAML over popular online learning algorithms and underscore the benefits of continuous pipeline redesign in the presence of data drift.
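The loop the abstract describes, test-then-train (prequential) evaluation, drift monitoring, and automated pipeline re-optimization when drift is detected, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the paper's implementation: `ThresholdLearner`, the fixed-threshold drift check, and the two-candidate `reoptimize` search are hypothetical stand-ins for the online pipelines, drift detectors, and asynchronous GP / successive-halving search the actual system uses.

```python
import random
from collections import deque

def make_stream(n=3000, drift_at=1500, seed=0):
    # Synthetic binary stream with an abrupt concept drift at `drift_at`:
    # the label rule flips from y = (x > 0) to y = (x < 0).
    rng = random.Random(seed)
    for t in range(n):
        x = rng.uniform(-1, 1)
        y = int(x > 0) if t < drift_at else int(x < 0)
        yield x, y

class ThresholdLearner:
    # Toy online learner (stand-in for a real online pipeline):
    # predicts from the sign of x, with a fixed polarity.
    def __init__(self, polarity=1):
        self.polarity = polarity
    def predict(self, x):
        return int(x * self.polarity > 0)
    def learn(self, x, y):
        pass  # a real online learner would update its state here

def reoptimize(recent):
    # Stand-in for the AutoML search step: evaluate a tiny candidate
    # space on a window of recent examples and keep the best pipeline.
    best, best_err = None, float("inf")
    for polarity in (1, -1):
        cand = ThresholdLearner(polarity)
        err = sum(cand.predict(x) != y for x, y in recent)
        if err < best_err:
            best, best_err = cand, err
    return best

model = ThresholdLearner()
window = deque(maxlen=100)     # recent examples, reused by the search
recent_ok = deque(maxlen=100)  # recent prediction correctness
drifts = correct = total = 0

for x, y in make_stream():
    pred = model.predict(x)    # test-then-train: predict first...
    correct += int(pred == y)
    total += 1
    model.learn(x, y)          # ...then learn from the labeled example
    window.append((x, y))
    recent_ok.append(pred == y)
    # Crude drift check: windowed accuracy collapses below 40%.
    if len(recent_ok) == recent_ok.maxlen and sum(recent_ok) / len(recent_ok) < 0.4:
        model = reoptimize(list(window))  # redesign the pipeline
        recent_ok.clear()
        drifts += 1

print(f"drifts detected: {drifts}, accuracy: {correct / total:.3f}")
```

With the single abrupt drift in this synthetic stream, the monitor fires once shortly after the concept flips, the search swaps in the opposite-polarity learner, and prequential accuracy recovers, mirroring the redesign-on-drift behavior the system is built around.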
