Unsupervised Grammar Induction Using a Parent Based Constituent Context Model

Grammar induction is one of the attractive research areas of natural language processing. Because supervised and, to some extent, semi-supervised grammar induction methods require large treebanks, and such treebanks do not currently exist for many languages, we focus on unsupervised approaches. The Constituent Context Model (CCM) appears to be the state of the art in unsupervised grammar induction. In this paper, we show that CCM performs worse on free word order languages (FWOLs) such as Persian than on fixed word order languages such as English. We also introduce a novel approach, called the parent-based constituent context model (PCCM), and show that using a notion of history, namely the context and constituent information of each span's parent, significantly improves the performance of CCM, especially on FWOLs.
