GRAPE: A Graph-Based Framework for Disambiguating People Appearances in Web Search

Finding information about people using search engines is one of the most common activities on the Web. However, search engines usually return a long list of Web pages that may refer to many different people sharing the queried name (namesakes), a problem aggravated by the explosive growth of Web data. To address name ambiguity in Web people search, this paper proposes GRAPE (a Graph-based fRamework for disAmbiguating People appEarances in Web search), a novel graph-based framework. GRAPE first extracts people tag information (e.g., person names, organizations, and email addresses) surrounding the queried name from the search results. It then clusters the extracted tags with a graph-based unsupervised algorithm, in which a new measure, Cohesion, quantifies how important a tag is for clustering; each final cluster of tags represents a unique person. Experimental results show that GRAPE outperforms state-of-the-art approaches to Web people name disambiguation.
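To make the tag-clustering idea concrete, here is a minimal Python sketch under stated assumptions. It is not GRAPE's algorithm. The abstract does not define Cohesion, so the sketch substitutes the local clustering coefficient as a stand-in importance measure. It links tags that co-occur on the same result page, drops tags that score below a threshold, and treats each remaining connected component as one person. The function names (`build_tag_graph`, `cohesion`, `cluster_tags`), the threshold `min_cohesion`, and the sample tags are hypothetical.

```python
from itertools import combinations


def build_tag_graph(pages):
    """Undirected co-occurrence graph: two tags are linked if they were
    extracted from the same search-result page."""
    adj = {}
    for tags in pages:
        tags = set(tags)
        for t in tags:
            adj.setdefault(t, set())  # keep tags even if they have no neighbours
        for a, b in combinations(tags, 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def cohesion(tag, adj):
    """Hypothetical stand-in for GRAPE's Cohesion: the local clustering
    coefficient, i.e. the fraction of a tag's neighbour pairs that are
    themselves linked. Generic tags bridging unrelated contexts score low."""
    nbrs = adj[tag]
    k = len(nbrs)
    if k < 2:
        return 1.0  # assumption: a tag with fewer than two neighbours is kept
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return links / (k * (k - 1) / 2)


def cluster_tags(pages, min_cohesion=0.5):
    """Drop low-cohesion tags, then return the connected components of the
    remaining graph; each component stands for one person."""
    adj = build_tag_graph(pages)
    keep = {t for t in adj if cohesion(t, adj) >= min_cohesion}
    clusters, seen = [], set()
    for start in sorted(keep):
        if start in seen:
            continue
        comp, stack = set(), [start]
        seen.add(start)
        while stack:  # iterative depth-first search over kept tags only
            t = stack.pop()
            comp.add(t)
            for n in adj[t]:
                if n in keep and n not in seen:
                    seen.add(n)
                    stack.append(n)
        clusters.append(comp)
    return clusters


if __name__ == "__main__":
    # Hypothetical tags extracted from four result pages for one ambiguous name.
    pages = [
        {"MIT", "jsmith@mit.edu", "Alice Lee"},
        {"MIT", "Alice Lee", "Wikipedia"},
        {"Acme Corp", "john@acme.com", "Bob King"},
        {"Acme Corp", "Bob King", "Wikipedia"},
    ]
    for cluster in cluster_tags(pages):
        print(sorted(cluster))
    # prints ['Acme Corp', 'Bob King', 'john@acme.com']
    # then   ['Alice Lee', 'MIT', 'jsmith@mit.edu']
```

In this toy input the generic tag "Wikipedia" co-occurs with both namesakes and, if kept, would merge them into a single component; its low score (1/3) removes it and the two people separate. Connected components are used here only for simplicity; any graph clustering method could take their place.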
