Disambiguation and Filtering Methods in Using Web Knowledge for Coreference Resolution

We investigate two publicly available web knowledge bases, Wikipedia and Yago, in an attempt to leverage semantic information and improve the performance of a state-of-the-art coreference resolution (CR) engine. We extract semantic compatibility and aliasing information from Wikipedia and Yago and incorporate it into a CR system. We show that using such knowledge without disambiguation and filtering brings no improvement over the baseline, mirroring previous findings. We therefore propose a number of solutions to reduce the amount of noise coming from web resources: using disambiguation tools for Wikipedia, pruning Yago to eliminate the most generic categories, and imposing additional constraints on affected mentions. Our evaluation experiments on the ACE-02 corpus show that the knowledge extracted from Wikipedia and Yago improves our system's performance by 2-3 percentage points.
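The pruning idea can be illustrated with a minimal sketch. It is not the system's actual implementation: the toy category table, the `max_share` threshold, and the function names `prune_generic` and `compatible` are all assumptions made for illustration. The sketch drops categories that are attached to too large a share of entries, then tests whether two mentions still share a category.

```python
from collections import Counter

# Hypothetical toy data: mention string -> set of Yago-style categories.
# Real Yago categories and their coverage are not reproduced here.
TYPES = {
    "Microsoft": {"entity", "organization", "company"},
    "IBM": {"entity", "organization", "company"},
    "Bill Gates": {"entity", "person", "entrepreneur"},
    "Seattle": {"entity", "location", "city"},
}


def prune_generic(types, max_share=0.5):
    """Drop categories attached to more than max_share of all entries.

    max_share is an assumed cut-off chosen for this example only.
    """
    counts = Counter(c for cats in types.values() for c in cats)
    limit = max_share * len(types)
    return {
        mention: {c for c in cats if counts[c] <= limit}
        for mention, cats in types.items()
    }


def compatible(a, b, types):
    """Return True if both mentions are known and share a category."""
    return bool(types.get(a, set()) & types.get(b, set()))


pruned = prune_generic(TYPES)

# Without pruning, the catch-all category makes every pair look compatible.
print(compatible("Microsoft", "Bill Gates", TYPES))
# After pruning, only pairs sharing a more specific category remain compatible.
print(compatible("Microsoft", "Bill Gates", pruned))
print(compatible("Microsoft", "IBM", pruned))
```

In a CR system such a compatibility test would typically serve as one feature among many rather than as a coreference decision on its own, since two distinct entities can share a category.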
