Personal Name Resolution of Web People Search

Disambiguating personal names in a set of documents (such as the set of web pages returned in response to a person-name query) is a difficult task. In this paper, we examine the extent to which the "cluster hypothesis" holds for this task (i.e., that similar documents tend to represent the same person). We investigate two clustering techniques that use either (1) term-based matching (single pass clustering) or (2) semantic-based matching (Probabilistic Latent Semantic Analysis). We compare and contrast these strategies and provide strong evidence to suggest that the hypothesis holds for the former. Indeed, on the new evaluation platform of the SemEval 2007 Web People Search task, we show that single pass clustering with a standard IR document representation fits the assumptions about the data and the task well, and yields state-of-the-art performance.
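To make the term-based strategy concrete, here is a minimal sketch of single pass clustering over a standard IR representation. It is an illustration under stated assumptions, not the authors' implementation: pages are assumed to be whitespace-tokenized, terms are weighted with a smoothed TF-IDF (log(1 + N/df)), each page is compared by cosine similarity with every existing cluster's summed term vector, and `threshold` is a hypothetical parameter whose value is not given here. The helper names `tfidf_vectors`, `cosine`, and `single_pass_cluster` are likewise illustrative.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (term -> weight) for tokenized docs.

    Assumption: smoothed idf = log(1 + N/df), so terms shared by every
    page (e.g. the queried name itself) keep a small nonzero weight.
    """
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    return [{t: c * math.log(1 + n / df[t]) for t, c in Counter(doc).items()}
            for doc in docs]


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return 0.0 if na == 0 or nb == 0 else dot / (na * nb)


def single_pass_cluster(vectors, threshold):
    """Assign each document, in order, to its most similar cluster.

    A document joins the best-matching cluster if the similarity reaches
    the (hypothetical) threshold; otherwise it starts a new cluster,
    i.e. a new presumed person.
    """
    clusters = []   # lists of document indices
    centroids = []  # summed term vectors; cosine is scale-invariant, so the sum acts as the mean
    for i, vec in enumerate(vectors):
        best, best_sim = -1, 0.0
        for c, cen in enumerate(centroids):
            sim = cosine(vec, cen)
            if sim > best_sim:
                best, best_sim = c, sim
        if best >= 0 and best_sim >= threshold:
            clusters[best].append(i)
            for t, w in vec.items():
                centroids[best][t] = centroids[best].get(t, 0.0) + w
        else:
            clusters.append([i])
            centroids.append(dict(vec))
    return clusters


if __name__ == "__main__":
    # Toy pages for one ambiguous name (made-up example data).
    pages = [
        "john smith professor computer science university".split(),
        "john smith computer science research university".split(),
        "john smith football player team season".split(),
        "john smith football team goal season".split(),
    ]
    print(single_pass_cluster(tfidf_vectors(pages), threshold=0.3))  # [[0, 1], [2, 3]]
```

Because each page is assigned as soon as it is read, the output depends on document order, and the threshold effectively decides how many distinct people (clusters) are produced: a very high threshold leaves every page in its own cluster, while a threshold of zero merges all pages that share any term.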
