Semi-supervised Constituent Grammar Induction Based on Text Chunking Information

There is growing interest in unsupervised grammar induction, which requires no syntactic annotations but yields less accurate results than the supervised approach. To improve the accuracy of the unsupervised approach, we resort to additional information that is easier to obtain. Shallow parsing, or chunking, identifies the constituents of a sentence (noun phrases, verb phrases, etc.) without specifying their internal structure. Highly accurate systems exist for this task, so this information is available even for languages that lack large syntactically annotated corpora. In this work we investigate how the results of a pattern-based unsupervised grammar induction system change as data on new kinds of phrases are added, and find that this leads to a significant improvement in performance. We analyze the results for three different languages. We also show that the system significantly improves on the results of the unsupervised system when it uses chunks produced by automatic chunkers.
