A Perspective View and Survey of Meta-Learning

Different researchers hold different views of what the term meta-learning exactly means. The first part of this paper provides our own perspective view, in which the goal is to build self-adaptive learners (i.e. learning algorithms that improve their bias dynamically through experience by accumulating meta-knowledge). The second part provides a survey of meta-learning as reported by the machine-learning literature. We find that, despite different views and research lines, one question remains constant: how can we exploit knowledge about learning (i.e. meta-knowledge) to improve the performance of learning algorithms? Clearly, the answer to this question is key to the advancement of the field and continues to be the subject of intensive research.
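
To make the notion of a self-adaptive learner more concrete, here is a minimal Python sketch built entirely on our own assumptions. It does not reproduce any specific system from the literature. The hypothetical `SelfAdaptiveLearner` treats the choice of k in a k-nearest-neighbor classifier as its bias and estimates each candidate's accuracy by leave-one-out. It then stores the winning k, together with a simple characterization of the task (number of examples, number of features, class entropy), as meta-knowledge. For a new task, `recommend` reuses the bias of the most similar past task instead of experimenting again. The meta-features, the candidate biases and the similarity measure are all illustrative choices.

```python
import math
from collections import Counter

# Hypothetical sketch: the meta-features, the candidate biases (values of k)
# and the task-similarity measure are illustrative assumptions.

def meta_features(X, y):
    """Characterize a task by (number of examples, number of features, class entropy)."""
    n = len(y)
    counts = Counter(y)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return (float(n), float(len(X[0])), entropy)

def knn_predict(X, y, query, k):
    """Majority vote among the k training points closest to query (squared Euclidean distance)."""
    order = sorted(range(len(X)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(X[i], query)))
    votes = Counter(y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def loo_accuracy(X, y, k):
    """Leave-one-out accuracy of k-NN: the base-level performance estimate for bias k."""
    hits = 0
    for i in range(len(X)):
        X_rest, y_rest = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        hits += knn_predict(X_rest, y_rest, X[i], k) == y[i]
    return hits / len(X)

class SelfAdaptiveLearner:
    def __init__(self, candidate_ks=(1, 3, 5)):
        self.candidate_ks = candidate_ks
        self.meta_knowledge = []  # accumulated (meta-feature vector, best k) pairs

    def learn_task(self, X, y):
        """Try every candidate bias, keep the best, and record the outcome as meta-knowledge."""
        # max() keeps the first candidate on ties, so smaller k wins ties.
        best_k = max(self.candidate_ks, key=lambda k: loo_accuracy(X, y, k))
        self.meta_knowledge.append((meta_features(X, y), best_k))
        return best_k

    def recommend(self, X, y):
        """Choose a bias for a new task from the most similar past task, without experimentation."""
        if not self.meta_knowledge:
            return self.candidate_ks[0]  # no experience yet: fall back to a default bias
        mf = meta_features(X, y)
        _, k = min(self.meta_knowledge,
                   key=lambda entry: sum((a - b) ** 2 for a, b in zip(entry[0], mf)))
        return k
```

Suppose `learn_task` has been called on a small, clean two-cluster task, where k = 1 wins, and on a larger task with a mislabeled example, where k = 3 wins. `recommend` then returns 1 for a new task close in size to the first and 3 for one close in size to the second. Because the meta-features are not normalized, the number of examples dominates the similarity here. A more careful design would scale each meta-feature, or replace the single nearest past task with a learned meta-model.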
