AutoML Pipeline Selection: Efficiently Navigating the Combinatorial Space

Data scientists seeking a good supervised learning model on a new dataset have many choices to make: they must preprocess the data, select features, possibly reduce the dimension, select an estimation algorithm, and choose hyperparameters for each of these pipeline components. Every new pipeline component multiplies the number of possible pipelines, so the space of choices grows combinatorially. In this work, we design TensorOboe, a new AutoML system that addresses this challenge by automatically designing a supervised learning pipeline. TensorOboe uses low-rank tensor decomposition as a surrogate model for efficient pipeline search. We also develop a new greedy experiment design protocol to gather information about a new dataset efficiently. Experiments on a large corpus of real-world classification problems demonstrate the effectiveness of our approach.
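To make the surrogate model concrete, the sketch below (not the paper's implementation) fits a low-rank CP (PARAFAC) decomposition to a toy pipeline-performance tensor using the tensorly library; all names, tensor shapes, and the rank are illustrative assumptions.

```python
# Minimal sketch: a low-rank CP decomposition as a surrogate model for
# pipeline performance. Shapes, rank, and the synthetic data are assumptions
# for illustration only.
import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac

rng = np.random.default_rng(0)

# Toy "error tensor": cross-validation error of each (preprocessor, estimator)
# pipeline on each dataset. In the real system most entries are unobserved
# and are predicted from the low-rank factors rather than computed directly.
n_datasets, n_preprocessors, n_estimators = 20, 5, 8
true_rank = 3
factors = [rng.random((dim, true_rank))
           for dim in (n_datasets, n_preprocessors, n_estimators)]
errors = tl.cp_to_tensor((np.ones(true_rank), factors))

# Fit a rank-3 CP (PARAFAC) decomposition of the observed tensor.
cp = parafac(tl.tensor(errors), rank=3)

# The reconstruction predicts the error of every pipeline on every dataset,
# including pipeline-dataset combinations that were never actually run.
predicted = tl.cp_to_tensor(cp)
print("max reconstruction error:", float(np.abs(predicted - errors).max()))
```

The full system additionally chooses which pipelines to actually run on a new dataset, within a time budget, via a greedy experiment design step; that component is omitted from this sketch.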
