Paper information - The role of named entities in Web People Search

The ambiguity of person names on the Web has become a new area of interest for NLP researchers. This challenging problem has been formulated as the task of clustering Web search results (returned in response to a person name query) according to the individual they mention. In this paper, we compare the coverage, reliability and independence of a number of features that are potential information sources for this clustering task, paying special attention to the role of named entities in the texts to be clustered. Although named entities are used in most approaches, our results show that, independently of the Machine Learning or Clustering algorithm used, named entity recognition and classification per se make only a small contribution to solving the problem.

Julio Gonzalo | Enrique Amigó | Javier Artiles
