Linguistic Theory Based Contextual Evidence Mining for Statistical Chinese Co-Reference Resolution

Within a statistical learning framework, this paper investigates how traditional linguistic findings on anaphora resolution can guide the mining and organization of contextual features for Chinese co-reference resolution. The main contributions are as follows. (1) To simulate the “syntactic and semantic parallelism” factor, we extract a “bag of word forms and POS tags” feature and a “bag of semes” feature from the contexts of the entity mentions and add them to the baseline feature set. (2) Because bags of word forms, POS tags, and semes are too coarse to determine the syntactic and semantic parallelism between two entity mentions, we propose a method for reconstructing contextual features based on semantic similarity computation, so that the reconstructed features better approximate the “syntactic and semantic parallelism preferences” factor. (3) We replace the isolated-word-based contextual feature representation with an entity-mention-based one and also enlarge the contextual windows, in order to approximate the “selectional restriction” factor for anaphora resolution. Experiments show that the multi-level contextual features are useful for co-reference resolution and that the statistical system incorporating these features performs well on the standard ACE datasets.
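
As a rough illustration of contribution (1), the sketch below collects a bag of word forms and POS tags from a fixed window around each entity mention and compares the two bags. It is not the paper's implementation: the function names `context_bag` and `bag_overlap`, the window size of 2, the use of Jaccard overlap as a parallelism score, and the toy POS tags are all assumptions made for this example. The seme bag and the similarity-based feature reconstruction are not covered.

```python
from typing import List, Set, Tuple

Token = Tuple[str, str]  # (word form, POS tag); the tag set here is illustrative


def context_bag(tokens: List[Token], start: int, end: int, window: int = 2) -> Set[str]:
    """Bag of word-form and POS features from `window` tokens on each side
    of the mention tokens[start:end] (the mention itself is excluded)."""
    left = tokens[max(0, start - window):start]
    right = tokens[end:end + window]
    bag: Set[str] = set()
    for word, pos in left + right:
        bag.add(f"w={word}")
        bag.add(f"pos={pos}")
    return bag


def bag_overlap(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap of two feature bags (an assumed, very coarse
    stand-in for a contextual parallelism score)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Toy sentence: "小明 喜欢 音乐 ， 他 喜欢 唱歌"
tokens = [
    ("小明", "NR"), ("喜欢", "VV"), ("音乐", "NN"), ("，", "PU"),
    ("他", "PN"), ("喜欢", "VV"), ("唱歌", "VV"),
]
antecedent_bag = context_bag(tokens, 0, 1)  # context of the mention "小明"
anaphor_bag = context_bag(tokens, 4, 5)     # context of the mention "他"
score = bag_overlap(antecedent_bag, anaphor_bag)
```

In a pairwise classifier, a score of this kind (or the bags themselves) would be added to the baseline feature vector for each candidate mention pair. Exact string matching is what makes this representation coarse, which is the motivation the abstract gives for reconstructing the features with semantic similarity.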
