Exploiting Web querying for Web people search

Searching for people on the Web is one of the most common query types submitted to Web search engines today. However, when a person name is queried, the returned Webpages often relate to several distinct namesakes who share the queried name. The task of disambiguating the results and finding the Webpages related to the specific person of interest is left to the user. Many Web People Search (WePS) approaches have been developed recently that attempt to automate this disambiguation process. Nevertheless, the disambiguation quality of these techniques leaves considerable room for improvement. In this article, we present a new WePS approach. It is based on issuing auxiliary queries to the Web to gain additional knowledge about the Webpages that need to be disambiguated. Thus, the approach uses the Web as an external data source, issuing queries to collect co-occurrence statistics. These statistics are used to assess the overlap of the contextual entities extracted from the Webpages. The article also proposes a methodology to make this Web querying technique efficient. Further, the article proposes an approach that is capable of combining various types of disambiguating information, including other common types of similarities, by applying a correlation clustering approach followed by after-clustering of singleton clusters. Together, these properties give the framework an advantage in result quality over other state-of-the-art WePS techniques.
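
To make the idea of Web co-occurrence statistics concrete, the following is a minimal sketch, not the formula used in the article. It assumes that a search engine reports page counts for two context entities queried separately and together, and it turns those counts into a Dice-style overlap score. The function name `dice_cooccurrence` and the counts in the example are hypothetical.

```python
def dice_cooccurrence(hits_a: int, hits_b: int, hits_ab: int) -> float:
    """Dice-style overlap of two context entities from Web page counts.

    hits_a, hits_b: page counts for each entity queried alone (assumed inputs).
    hits_ab: page count for the query containing both entities.
    Returns a value in [0, 1]; 0.0 when either entity has no hits.
    """
    if hits_a <= 0 or hits_b <= 0:
        return 0.0
    # Reported counts are often estimates, so clamp the score to 1.0.
    return min(1.0, 2.0 * hits_ab / (hits_a + hits_b))


# Hypothetical counts for two entities extracted from two Webpages.
score = dice_cooccurrence(hits_a=100, hits_b=300, hits_ab=50)
print(score)
```

A higher score suggests that the two entities tend to appear on the same pages, which is one plausible signal that the Webpages mentioning them refer to the same person.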
