Vacillatory Learning of Nearly Minimal Size Grammars

In Gold's influential language learning paradigm, a learning machine converges in the limit to one correct grammar. In an attempt to generalize Gold's paradigm, Case considered the question of whether people might converge to vacillating between up to (some integer) n > 1 distinct, but equivalent, correct grammars. He showed that larger classes of languages can be algorithmically learned (in the limit) by converging to up to n + 1 rather than up to n correct grammars. He also argued that, for "small" n > 1, it is plausible that people might sometimes converge to vacillating between up to n grammars. The insistence on small n was motivated by the consideration that, for "large" n, at least one of the n grammars would be too large to fit in people's heads. Of course, even for Gold's n = 1 case, the single grammar converged to in the limit may be infeasibly large. An interesting complexity restriction to place on the final grammar(s) converged to in the limit, then, is that they all be of small size. In this paper we study some of the trade-offs in learning power involved in making a well-defined version of this restriction. As a tool, we show and exploit the desirable property that the learning power under our size-restricted criteria (for successful learning) is independent of the choice of underlying acceptable programming system. We characterize the power of our size-restricted criteria and use this characterization to prove that some classes of languages that can be learned by converging in the limit to up to n + 1 nearly minimal size correct grammars cannot be learned by converging to up to n unrestricted grammars, even if these latter grammars are allowed to have a finite number of anomalies (i.e., mistakes) per grammar. We also show that there is no loss of learning power in demanding that the final grammars be nearly minimal size iff one is willing to tolerate an unbounded, finite number of anomalies in the final grammars and there is a constant bound on the number of different grammars converged to in the limit. Hence, if we allow an unbounded, finite number of anomalies in the final grammars but the number of different grammars converged to in the limit is unbounded (though finite), or if there is a constant bound on the number of anomalies allowed in the final grammars, then there is a loss of learning power in requiring that the final grammars be nearly minimal size. These results do not always match what might be expected from the cases, previously examined by Freivalds, Kinber, and Chen, of learning nearly minimal size programs for functions.
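For orientation, the vacillatory success criterion discussed above is commonly formalized in this literature roughly as follows; the notation used here (texts T for L, initial segments T[m], W_p for the language generated by grammar p, and =^a for equality up to a anomalies) is assumed from Case's earlier work rather than taken from this abstract, so the display is only a sketch:

\[
\mathbf{M} \text{ learns } L \text{ with up to } n \text{ final grammars, each with up to } a \text{ anomalies} \iff
(\forall \text{ texts } T \text{ for } L)\,(\exists D)\,\bigl[\, \emptyset \neq D,\ \operatorname{card}(D) \le n,\ (\forall p \in D)\, W_p =^{a} L,\ \text{and } \mathbf{M}(T[m]) \in D \text{ for all but finitely many } m \,\bigr].
\]

The size-restricted variants studied in the paper additionally require every grammar in D to be of nearly minimal size for L.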

[1] Paul Young, et al. An Introduction to the General Theory of Algorithms, 1978.

[2] Kenneth Wexler, et al. Formal Principles of Language Acquisition, 1980.

[3] John Case, et al. Complexity Issues for Vacillatory Function Identification, Inf. Comput., 1995.

[4] E. Mark Gold. Language Identification in the Limit, Inf. Control, 1967.

[5] John Case, et al. Machine Inductive Inference and Language Identification, ICALP, 1982.

[6] R. V. Freivald. Minimal Gödel Numbers and Their Identification in the Limit, MFCS, 1975.

[7] K. Wexler. On Extensional Learnability, Cognition, 1982.

[8] Hartley Rogers, Jr. Theory of Recursive Functions and Effective Computability, 1969.

[9] Efim B. Kinber, et al. On a Theory of Inductive Inference, FCT, 1977.

[10] John Case, et al. Convergence to Nearly Minimal Size Grammars by Vacillating Learning Machines, COLT, 1989.

[11] Arun Sharma, et al. Program Size Restrictions in Computational Learning, Theor. Comput. Sci., 1994.

[12] Jeffrey D. Ullman, et al. Introduction to Automata Theory, Languages and Computation, 1979.

[13] Manuel Blum. A Machine-Independent Theory of the Complexity of Recursive Functions, JACM, 1967.

[14] Daniel N. Osherson, et al. Criteria of Language Learning, Inf. Control, 1982.

[15] Mark A. Fulk. Prudence and Other Conditions on Formal Language Learning, Inf. Comput., 1990.

[16] D. Osherson, et al. A Note on Formal Learning Theory, Cognition, 1982.

[17] D. Osherson, et al. Note on a Central Lemma for Learning Theory, 1983.

[18] Daniel N. Osherson, et al. Systems That Learn: An Introduction to Learning Theory for Cognitive and Computer Scientists, 1990.

[19] Ya. M. Barzdin, et al. Towards a Theory of Inductive Inference (in Russian), MFCS, 1973.

[20] John Case. The Power of Vacillation, COLT, 1988.

[21] John Case, et al. Comparison of Identification Criteria for Machine Inductive Inference, Theor. Comput. Sci., 1983.

[22] D. Osherson, et al. Learning Theory and Natural Language, Cognition, 1984.

[23] Manuel Blum, et al. Toward a Mathematical Theory of Inductive Inference, Inf. Control, 1975.

[24] Keh-Jiann Chen, et al. Tradeoffs in Machine Inductive Inference, 1981.

[25] Keh-Jiann Chen. Tradeoffs in the Inductive Inference of Nearly Minimal Size Programs, Inf. Control, 1982.

[26] Mark A. Fulk. A Study of Inductive Inference Machines, 1986.

[27] S. Pinker. Formal Models of Language Learning, Cognition, 1979.