Computational Machine Learning in Theory and Praxis

In the last few decades a computational approach to machine learning has emerged, based on paradigms from recursion theory and the theory of computation. Such ideas include learning in the limit, learning by enumeration, and probably approximately correct (pac) learning. These models are usually not suitable for practical situations. In contrast, statistics-based inference methods have enjoyed a long and distinguished career. Currently, Bayesian reasoning in its various forms, notably minimum message length (MML) and minimum description length (MDL), is a widely applied approach. These are the tools of choice in machine learning practice, used together with techniques such as simulated annealing, genetic algorithms, genetic programming, artificial neural networks, and the like. These statistical inference methods select the hypothesis that minimizes the sum of the length of the description of the hypothesis (also called 'model') and the length of the description of the data relative to the hypothesis. It appears to us that the future of computational machine learning will combine the approaches above with guarantees on the time and memory resources used. Computational learning theory will move closer to practice, and the application of principles such as MDL requires further justification. Here we survey some of the actors in this dichotomy between theory and praxis, justify MDL via the Bayesian approach, and compare pac learning with MDL learning of decision trees.
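As a worked illustration of the selection rule described above (our notation, not the paper's: H a hypothesis, D the data, L(.) a prefix code length in bits), the MDL principle selects

  H_MDL = \arg\min_{H} \bigl[\, L(H) + L(D \mid H) \,\bigr].

A sketch of its Bayesian justification: identifying ideal code lengths with negative log-probabilities, L(H) = -\log_2 P(H) and L(D \mid H) = -\log_2 P(D \mid H), gives

  \arg\min_{H} \bigl[ -\log_2 P(H) - \log_2 P(D \mid H) \bigr] = \arg\max_{H} P(H)\,P(D \mid H) = \arg\max_{H} P(H \mid D),

so, under these assumptions, minimizing the total description length coincides with selecting the maximum a posteriori hypothesis.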
