Unsupervised grammar inference systems for natural language

In recent years there have been significant advances in the field of Unsupervised Grammar Inference (UGI) for natural languages such as English or Dutch. This paper surveys a broad range of UGI implementations, showing how the theory has been put into practice. Several mature systems are emerging; they are built on complex models and can derive grammatical phenomena of natural language. The systems reviewed fall into four groups: models based on Categorial Grammar (GraSp, CLL, EMILE); Memory-Based Learning models (FAMBL, RISE); evolutionary computing models (ILM, LAgts); and string-pattern searches (ABL, GB). An objective statistical comparison of the systems' performance is not yet feasible. However, we discuss their merits and shortcomings and consider what the future holds for UGI.
