Towards AutoML in the presence of Drift: first results

Research progress in AutoML has led to state-of-the-art solutions that cope quite well with supervised learning tasks, e.g., classification with Auto-Sklearn. However, so far these systems do not take into account the changing nature of data that evolves over time (i.e., they still assume i.i.d. data), even though such domains are increasingly common in real applications (e.g., spam filtering, user preferences, etc.). We describe a first attempt to develop an AutoML solution for scenarios in which the data distribution changes relatively slowly over time and in which the problem is approached in a lifelong learning setting. We extend Auto-Sklearn with sound and intuitive mechanisms that allow it to cope with this sort of problem. The extended Auto-Sklearn is combined with concept drift detection techniques that allow it to automatically determine when the initial models have to be adapted. We report experimental results on benchmark data from AutoML competitions that adhere to this scenario. Results demonstrate the effectiveness of the proposed methodology.
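To make the idea of drift-triggered adaptation concrete, the following is a minimal sketch of a drift detector that monitors a model's stream of prediction errors and signals when the error rate has risen significantly. The abstract does not say which detector is used, so this is an assumption: the sketch follows the general idea of the Drift Detection Method (track the running error rate and its standard deviation, and flag drift when they exceed the best level seen so far by three standard deviations). The names `SimpleDDM` and `first_drift_index` are hypothetical, and the step that would refit or adapt the AutoML model after a signal is left out.

```python
import math


class SimpleDDM:
    """DDM-style detector over a stream of 0/1 prediction errors.

    Illustrative stand-in only; not the detector used in the paper.
    """

    def __init__(self, min_samples=30):
        self.min_samples = min_samples  # warm-up before any drift check
        self.n = 0                      # number of errors indicators seen
        self.p = 0.0                    # running error rate
        self.p_min = math.inf           # error rate at the best point so far
        self.s_min = math.inf           # its standard deviation

    def update(self, error):
        """Feed one indicator (1 = misclassified); return True on drift."""
        self.n += 1
        self.p += (error - self.p) / self.n
        s = math.sqrt(self.p * (1.0 - self.p) / self.n)
        if self.n < self.min_samples:
            return False
        # Remember the lowest (error rate + std) observed so far.
        if self.p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, s
        # Drift: current level exceeds the best level by 3 std deviations.
        return self.p + s > self.p_min + 3.0 * self.s_min


def first_drift_index(errors):
    """Return the index of the first drift signal, or None if there is none."""
    detector = SimpleDDM()
    for i, e in enumerate(errors):
        if detector.update(e):
            return i
    return None


# Synthetic stream: 10% errors for 500 steps, then 50% errors.
stable = [1 if i % 10 == 0 else 0 for i in range(500)]
drifted = [1 if i % 2 == 0 else 0 for i in range(500)]
print(first_drift_index(stable + drifted))
```

In a lifelong learning loop, a drift signal would be the point at which the initial models are adapted on recent data and the detector is re-initialised; how that adaptation is done is specific to the proposed method and is not shown here.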
