Paper information - Large-Scale Supervised Models for Noun Phrase Bracketing

Large-Scale Supervised Models for Noun Phrase Bracketing

Interpreting the structure of noun phrases (NPs) is important for many Natural Language Processing (NLP) tasks. This work extends the state of the art in NP bracketing by creating supervised models trained on a large annotated corpus; applying these to longer, more complex NPs; and using the resulting system to improve the output of the Bikel (2004) parser. Using a large corpus of manually annotated Penn Treebank NPs, we have developed a supervised model that brackets simple NPs with 93.01% F-score. We extend the evaluation to include longer, more complex NPs that are rarely dealt with in the literature, attaining 91.44% F-score. Finally, we implement a post-processing module that brackets NPs identified by the Bikel (2004) parser, which outperforms the parser itself by 8.13% F-score.

David Vadas

[1] Mitchell P. Marcus, et al. A theory of syntactic recognition for natural language, 1979.

[2] Beatrice Santorini, et al. Building a Large Annotated Corpus of English: The Penn Treebank, 1993, CL.

[3] Usama M. Fayyad, et al. Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning, 1993, IJCAI.

[4] James Pustejovsky, et al. Lexical Semantic Techniques for Corpus Analysis, 1993, CL.

[5] P. Resnik. Selection and information: a class-based approach to lexical relationships, 1993.

[6] Mitchell P. Marcus, et al. Text Chunking using Transformation-Based Learning, 1995, VLC@ACL.

[7] Ken Barker, et al. A Trainable Bracketer for Noun Modifiers, 1998, Canadian Conference on AI.

[8] Christiane Fellbaum, et al. Book Reviews: WordNet: An Electronic Lexical Database, 1999, CL.

[9] Michael Collins, et al. Head-Driven Statistical Models for Natural Language Parsing, 2003, CL.

[10] Mitchell P. Marcus, et al. On the parameter space of generative lexicalized statistical parsing models, 2004.

[11] Frank Keller, et al. The Web as a Baseline: Evaluating the Performance of Unsupervised Web-based Models for a Range of NLP Tasks, 2004, NAACL.

[12] Daniel Marcu, et al. NP Bracketing by Maximum Entropy Tagging and SVM Reranking, 2004, EMNLP.

[13] Preslav Nakov, et al. A study of using search engine page hits as a proxy for n-gram frequencies, 2005.

[14] Dan I. Moldovan, et al. On the semantics of noun compounds, 2005, Comput. Speech Lang.

[15] Preslav Nakov, et al. Search Engine Statistics Beyond the n-Gram: Application to Noun Compound Bracketing, 2005, CoNLL.

[16] James R. Curran, et al. Adding Noun Phrase Structure to the Penn Treebank, 2007, ACL.

[17] Hal Daumé. Notes on CG and LM-BFGS Optimization of Logistic Regression, 2008.