A Collective Approach to Scholar Name Disambiguation

Scholar name disambiguation remains a hard, unsolved problem that complicates bibliographic data analytics. Most existing methods disambiguate names separately, tackling one name at a time, and neglect the fact that disambiguating one name affects the others. Further, often only limited information is available in bibliographic data; for example, DBLP provides only basic paper and citation information. In this study, we propose a collective approach to name disambiguation that takes the connections between different ambiguous names into consideration. We reformulate bibliography data as a heterogeneous multipartite network that initially treats each author reference as a unique author entity, so that the disambiguation results of one name propagate to the other names in the network. To further deal with the sparsity caused by the limited available information, we also introduce word-word and venue-venue similarities, and we finally measure author similarities by assembling similarities from four perspectives. Using real-life data, we experimentally demonstrate that our approach is both effective and efficient.
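The collective idea can be illustrated with a small toy sketch. It is not the algorithm of the paper: it assumes only two similarity signals (shared coauthor entities and shared title words, rather than the four perspectives mentioned above), an arbitrary equal weighting, and a hypothetical threshold of 0.2. The function name `disambiguate`, the `papers` layout, and the example names are all invented for illustration. Each author reference starts as its own entity, and merges are repeated until nothing changes, so that merging the references of one name can unlock merges for another name.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def disambiguate(papers, threshold=0.2):
    """Toy collective merging (illustrative assumption, not the paper's method).

    papers maps paper_id -> (list of author names, set of title words).
    Returns a sorted list of clusters of (paper_id, name) references.
    """
    # Every author reference initially forms its own entity.
    refs = [(pid, name) for pid, (authors, _) in papers.items() for name in authors]
    parent = {r: r for r in refs}

    def find(r):
        # Union-find lookup with path halving.
        while parent[r] != r:
            parent[r] = parent[parent[r]]
            r = parent[r]
        return r

    def profile(root):
        # Coauthor entities and title words aggregated over the entity's papers.
        coauthors, words = set(), set()
        for pid, name in refs:
            if find((pid, name)) != root:
                continue
            authors, paper_words = papers[pid]
            coauthors |= {find((pid, other)) for other in authors if other != name}
            words |= paper_words
        return coauthors, words

    changed = True
    while changed:  # repeat so that merges of one name reach the other names
        changed = False
        for name in sorted({n for _, n in refs}):
            roots = sorted({find(r) for r in refs if r[1] == name})
            for a, b in combinations(roots, 2):
                ra, rb = find(a), find(b)
                if ra == rb:
                    continue
                (ca, wa), (cb, wb) = profile(ra), profile(rb)
                # Assumed equal weighting of the two signals.
                score = 0.5 * jaccard(ca, cb) + 0.5 * jaccard(wa, wb)
                if score >= threshold:
                    parent[ra] = rb
                    changed = True

    clusters = {}
    for r in refs:
        clusters.setdefault(find(r), []).append(r)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical toy data: three papers, two ambiguous names.
papers = {
    "p1": (["Wei Wang", "Lei Zhang"], {"graph", "clustering"}),
    "p2": (["Wei Wang"], {"graph", "clustering", "name", "disambiguation"}),
    "p3": (["Wei Wang", "Lei Zhang"], {"name", "disambiguation"}),
}
clusters = disambiguate(papers)
```

In this toy data, the two "Lei Zhang" references share no title words, so they cannot be merged on their own. They are merged only after the "Wei Wang" references on p1 and p3 have been linked through p2, which gives them a common coauthor entity. Removing p2 leaves all four remaining references as separate entities.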
