Meta-Learning and the Full Model Selection Problem

As a data analyst, one of my daily tasks is to select appropriate tools for a given data project from the techniques in my toolbox, including data preprocessing, outlier detection, feature selection, learning algorithms, and evaluation methods. This was an enjoyable job at first, because finding patterns and valuable information in data has always been fun for me. Things became tricky when several projects had to be completed in a relatively short time. Naturally, as a computer science graduate, I started to ask myself, “What can be automated here?”, because part of my work is essentially a loop that can be programmed: choose, run, test, and choose again, until some criterion or goal is met. In other words, I use my experience and knowledge of machine learning and data mining to guide and speed up the process of selecting and applying techniques, in order to build a reasonably good predictive model for a given dataset and purpose. This raises the following questions: is it possible to design and implement a system that helps a data analyst choose from a set of data mining tools, or at least one that provides useful recommendations and saves some of the analyst’s time? To answer these questions, I decided to undertake a long-term study of this topic: to think about, define, research, and simulate the problem before coding my dream system.

This thesis presents research results, including new methods, algorithms, and theoretical and empirical analyses, from two directions, both of which propose systematic and efficient solutions to the questions above under different resource requirements: a meta-learning-based algorithm/parameter ranking approach and a meta-heuristic search-based full model selection approach. Some of the results have been published in research papers; this thesis therefore also serves as a coherent collection of those results in a single volume.
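To make the “choose, run, test, and choose again” loop concrete, the sketch below shows a minimal version of it in Python using scikit-learn. It is only an illustration of the loop itself, not the system developed in this thesis; the candidate algorithms, dataset, and stopping criterion (exhausting a small fixed candidate set) are assumptions chosen for brevity.

```python
# A minimal sketch of the "choose, run, test, choose again" loop.
# Illustrative only: the candidate set, dataset, and stopping criterion
# are assumptions, not the thesis's actual full model selection system.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# "Choose": a small toolbox of candidate learning algorithms.
candidates = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "logistic regression": LogisticRegression(max_iter=1000),
    "k-nearest neighbours": KNeighborsClassifier(n_neighbors=5),
}

best_name, best_score = None, -1.0
for name, model in candidates.items():
    # "Run" and "test": estimate performance with 5-fold cross-validation.
    score = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {score:.3f}")
    # "Choose again": keep the best candidate seen so far.
    if score > best_score:
        best_name, best_score = name, score

print(f"selected: {best_name} ({best_score:.3f})")
```

A human analyst runs essentially this loop by hand, over a far larger space that also includes preprocessing, feature selection, and parameter settings; the two approaches in this thesis aim to search or shortcut that space automatically.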
