The role of named entities in Web People Search

The ambiguity of person names on the Web has become a new area of interest for NLP researchers. This challenging problem has been formulated as the task of clustering Web search results (returned in response to a person name query) according to the individual they mention. In this paper we compare the coverage, reliability and independence of a number of features that are potential information sources for this clustering task, paying special attention to the role of named entities in the texts to be clustered. Although named entities are used in most approaches, our results show that, independently of the machine learning or clustering algorithm used, named entity recognition and classification per se make only a small contribution to solving the problem.
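To make the task formulation concrete, the following is a minimal sketch of clustering search results by feature overlap. It is not the method evaluated in this paper: the bag-of-features representation, the cosine similarity, the single-link grouping, the threshold value, and the names `cosine` and `cluster_pages` are all assumptions chosen for illustration, and the page features are invented.

```python
from collections import Counter
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-features vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster_pages(pages: list[list[str]], threshold: float = 0.3) -> list[set[int]]:
    """Single-link grouping (an assumed strategy): any two pages whose
    similarity reaches the threshold are placed in the same cluster,
    computed as connected components with union-find."""
    vectors = [Counter(p) for p in pages]
    parent = list(range(len(pages)))

    def find(i: int) -> int:
        # Follow parent links to the cluster representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(pages)):
        for j in range(i + 1, len(pages)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(i)] = find(j)

    clusters: dict[int, set[int]] = {}
    for i in range(len(pages)):
        clusters.setdefault(find(i), set()).add(i)
    return list(clusters.values())


# Hypothetical features extracted from four pages returned for one name query.
pages = [
    ["stanford", "professor", "linguistics"],
    ["stanford", "linguistics", "syntax"],
    ["guitarist", "band", "tour"],
    ["band", "tour", "album"],
]
result = cluster_pages(pages)
```

Each resulting cluster is a set of page indices intended to correspond to one individual. Swapping the feature lists (for example, named entities only versus all tokens) while holding the clustering step fixed is one simple way to compare how much each information source contributes.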
