Learning trees from strings: a strong learning algorithm for some context-free grammars

Standard models of language learning are concerned with weak learning: the learner, receiving as input only information about the strings in the language, must learn to generalise and to generate the correct, potentially infinite, set of strings generated by some target grammar. Here we define the corresponding notion of strong learning: the learner, again only receiving strings as input, must learn a grammar that generates the correct set of structures or parse trees. We formalise this using a modification of Gold's identification in the limit model, requiring convergence to a grammar that is isomorphic to the target grammar. We take as our starting point a simple learning algorithm for substitutable context-free languages, based on principles of distributional learning, and modify it so that it will converge to a canonical grammar for each language. We prove a corresponding strong learning result for a subclass of context-free grammars.
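The distributional principle behind substitutable languages can be illustrated with a minimal sketch: if two substrings occur in one shared context in the positive sample, a learner for a substitutable language may treat them as congruent. The Python below is only an illustration under that assumption. It is not the algorithm of this paper, which additionally converges to a canonical grammar, and the function names `substring_contexts` and `congruence_classes` are hypothetical.

```python
from collections import defaultdict


def substring_contexts(sample):
    """Map each non-empty substring u to the contexts (l, r) such that l + u + r is in the sample."""
    contexts = defaultdict(set)
    for w in sample:
        n = len(w)
        for i in range(n):
            for j in range(i + 1, n + 1):
                contexts[w[i:j]].add((w[:i], w[j:]))
    return contexts


def congruence_classes(sample):
    """Group substrings that share at least one context, closed under transitivity."""
    contexts = substring_contexts(sample)
    parent = {u: u for u in contexts}

    def find(u):
        # Union-find lookup with path halving.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    # Index substrings by context, then merge all substrings seen in the same context.
    by_context = defaultdict(list)
    for u, ctxs in contexts.items():
        for c in ctxs:
            by_context[c].append(u)
    for group in by_context.values():
        for v in group[1:]:
            parent[find(v)] = find(group[0])

    classes = defaultdict(set)
    for u in contexts:
        classes[find(u)].add(u)
    return list(classes.values())


if __name__ == "__main__":
    # Positive sample drawn from { a^n c b^n : n >= 0 }, with characters as terminals.
    for cls in congruence_classes(["c", "acb"]):
        print(sorted(cls))
```

In this sample, `c` and `acb` both occur in the empty context, so they fall into one class. In a grammar built from such classes, each class would play the role of a nonterminal. This yields only a weak learner; recovering the correct trees is the additional problem the abstract describes.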
