A survey of grammatical inference methods for natural language learning

The high complexity of natural language and the large amount of human effort and time required to produce grammars have led several researchers in Natural Language Processing to investigate ways of automating grammar generation and updating. Many algorithms for context-free grammar inference have been proposed in the literature. This paper surveys the methodologies for inferring context-free grammars from examples developed over the last decade. After introducing preliminary definitions and notation concerning learning and inductive inference, it describes some of the most relevant grammatical inference methods for natural language and classifies them according to the kind of presentation (text or informant) and the type of information (supervised, unsupervised, or semi-supervised). It also presents the state of the art in strategies for evaluating and comparing grammar inference methods. The goal of the paper is to provide the reader with an introduction to the major concepts and current approaches in Natural Language Learning research.
