On the impact of forgetting on learning machines

People tend not to have perfect memories when it comes to learning, or to anything else for that matter. Most formal studies of learning, however, assume a perfect memory; some restrict only the number of items that can be retained. We introduce a complexity-theoretic accounting of memory utilization by learning machines. In our new model, memory is measured in bits as a function of the size of the input. There is a hierarchy of learnability based on increasing memory allotments. The lower bound results are proved using an unusual combination of pumping and mutual recursion theorem arguments. For technical reasons, it was necessary to consider two types of memory: long term and short term.
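To make the model concrete, the following is a minimal illustrative sketch (not the paper's construction) of a learner whose persistent "long-term" state is capped at an assumed bit bound, while each incoming data item lives only in "short-term" memory and is discarded after the step. The function names, the bit bound, and the toy concept class (constant functions) are all assumptions for illustration.

```python
MAX_STATE_BITS = 64  # assumed long-term memory allotment, in bits


def bits(state: int) -> int:
    """Number of bits needed to store the integer state."""
    return state.bit_length()


def learn_constant(stream):
    """Identify a constant function f(x) = c from its value stream.

    Long-term memory holds only the single candidate constant, so the
    learner stays within the bit bound as long as c itself fits.
    """
    state = None          # long-term memory: the current hypothesis
    hypotheses = []
    for value in stream:  # short-term memory: one value, then forgotten
        if state is None:
            state = value
        assert bits(state) <= MAX_STATE_BITS, "memory allotment exceeded"
        hypotheses.append(state)  # emit hypothesis after each datum
    return hypotheses


print(learn_constant([7, 7, 7, 7]))  # converges immediately: [7, 7, 7, 7]
```

The point of the sketch is only that memory is charged in bits against the stored state, rather than counted in retained items; the paper's hierarchy results concern how learnable classes grow as this allotment increases.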
