On Disambiguating Authors: Collaboration Network Reconstruction in a Bottom-up Manner

Author disambiguation, the task of distinguishing different authors who share the same name, is critical in digital libraries such as DBLP, CiteULike, and CiteSeerX. Although state-of-the-art approaches include various paper-embedding-based methods that operate in a top-down manner, they focus primarily on the ego-network of a target name and overlook the low-quality collaborative relations that exist in that ego-network. These methods can therefore be suboptimal for disambiguating authors. In this paper, we model author disambiguation as a collaboration network reconstruction problem and propose an incremental and unsupervised author disambiguation method, IUAD, which operates in a bottom-up manner. First, we build a stable collaboration network from stable collaborative relations. To further improve recall, we then build a probabilistic generative model to reconstruct the complete collaboration network. In addition, for newly published papers, we can incrementally determine who published them by computing only the posterior probabilities. We have conducted extensive experiments on a large-scale DBLP dataset to evaluate IUAD. The experimental results demonstrate that IUAD achieves promising performance and significantly outperforms comparable baselines. Code is available at https://github.com/papergitgit/IUAD.
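To make the bottom-up idea concrete, the following is a minimal sketch of the first stage only (building a stable collaboration network), and it is not the IUAD implementation. It assumes, purely for illustration, that a collaborative relation is "stable" when the same pair of names co-occurs on at least `min_support` papers, and that mentions linked by a stable pair belong to the same author. The function names, the threshold, and the union-find merging are all hypothetical choices; the probabilistic generative model used for the recall stage is not shown.

```python
from collections import Counter, defaultdict
from itertools import combinations


def stable_pairs(papers, min_support=2):
    """Return name pairs that co-occur on at least `min_support` papers.

    Assumption for this sketch: repeated co-occurrence is what makes a
    collaborative relation "stable".
    """
    counts = Counter()
    for names in papers:
        for pair in combinations(sorted(set(names)), 2):
            counts[pair] += 1
    return {pair for pair, count in counts.items() if count >= min_support}


def disambiguate(papers, min_support=2):
    """Cluster (paper_index, name) mentions bottom-up using stable pairs only."""
    stable = stable_pairs(papers, min_support)
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    # Every mention starts as its own author; record which papers carry each stable pair.
    papers_by_pair = defaultdict(list)
    for i, names in enumerate(papers):
        unique = sorted(set(names))
        for name in unique:
            find((i, name))
        for pair in combinations(unique, 2):
            if pair in stable:
                papers_by_pair[pair].append(i)

    # Merge mentions of the same name that share a stable collaborator.
    for (a, b), indices in papers_by_pair.items():
        first = indices[0]
        for i in indices[1:]:
            union((first, a), (i, a))
            union((first, b), (i, b))

    clusters = defaultdict(list)
    for mention in list(parent):
        clusters[find(mention)].append(mention)
    return sorted(sorted(members) for members in clusters.values())


if __name__ == "__main__":
    toy_papers = [
        ["Wei Wang", "Ann Lee"],
        ["Wei Wang", "Ann Lee"],
        ["Wei Wang", "Bo Chen"],
    ]
    for cluster in disambiguate(toy_papers):
        print(cluster)
```

In this toy input, the "Wei Wang" mentions on the first two papers are merged because they share a repeated collaborator, while the "Wei Wang" on the third paper stays in its own cluster. That conservative behaviour favours precision over recall, which is the gap the abstract's second stage is meant to close.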
