A Graph-based Approach to Person Name Disambiguation in Web

This article presents a name disambiguation approach that resolves ambiguity among person names by grouping web pages according to the individuals they refer to. The approach exploits two sources of entity-centric semantic information extracted from web pages: personal attributes and social relationships. Its input is the set of web pages returned by a search for a person name. Each page is analyzed to extract personal attributes and social relationships, which are then mapped into an undirected weighted graph called the attribute-relationship graph. A graph-based clustering algorithm groups the nodes that represent web pages, each of which refers to a person entity. The outcome is a set of clusters in which all web pages in a cluster refer to the same person. We evaluate the approach on the large-scale WePS-1, WePS-2, and WePS-3 datasets. The experimental results are encouraging and show that the proposed method outperforms several baseline methods as well as comparable existing approaches.
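The pipeline described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: the abstract does not specify how edges are weighted or how clusters are formed, so the Jaccard-based weighting, the equal weights `w_attr` and `w_rel`, the `threshold` value, and the connected-components clustering below are all assumptions chosen for illustration, and the page data is invented.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def build_graph(pages, w_attr=0.5, w_rel=0.5):
    """Return {(i, j): weight} for page pairs with nonzero similarity.

    The weight mixes attribute overlap and relationship overlap.
    The mixing weights are hypothetical, not taken from the paper.
    """
    edges = {}
    for i, j in combinations(sorted(pages), 2):
        w = (w_attr * jaccard(pages[i]["attrs"], pages[j]["attrs"])
             + w_rel * jaccard(pages[i]["rels"], pages[j]["rels"]))
        if w > 0:
            edges[(i, j)] = w
    return edges

def cluster(pages, edges, threshold=0.3):
    """Group pages by connected components over edges >= threshold.

    A simple stand-in for the paper's graph-based clustering step.
    """
    parent = {p: p for p in pages}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (i, j), w in edges.items():
        if w >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for p in pages:
        groups.setdefault(find(p), set()).add(p)
    return sorted(sorted(g) for g in groups.values())

# Invented example: three result pages for the same person-name query.
pages = {
    "p1": {"attrs": {"occupation:professor", "affiliation:MIT"},
           "rels": {"Alice Smith"}},
    "p2": {"attrs": {"occupation:professor", "affiliation:MIT"},
           "rels": {"Alice Smith", "Bob Lee"}},
    "p3": {"attrs": {"occupation:painter"},
           "rels": {"Carol Diaz"}},
}
edges = build_graph(pages)
clusters = cluster(pages, edges)
print(clusters)
```

In this toy input, pages `p1` and `p2` share both attributes and one related person, so they end up in one cluster, while `p3` shares nothing with either and stays alone.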
