Model Merging versus Model Splitting Context-Free Grammar Induction

Comparing grammatical inference algorithms shows that the same generic techniques recur across different systems. Several finite-state learning algorithms use state merging as their underlying technique, and a collection of grammatical inference algorithms that aim to learn context-free grammars build on the concept of substitutability to identify potential grammar rules. When learning context-free grammars, there are essentially two approaches: model merging, which generalizes with more data, and model splitting, which specializes with more data. Both approaches can be combined sequentially in a generic framework. In this article, we investigate how different approaches within the first phase of the framework affect system performance.
