A Multi-Level Author Name Disambiguation Algorithm

With the rapid development of information technology, name ambiguity has become one of the primary problems in information retrieval, data mining, and scientific measurement. Name disambiguation supports computing and big data applications by mapping virtual relational networks to real social networks, resolving cases where the same name refers to multiple entities. Many literature search platforms have now launched their own scholar systems, and in these systems name ambiguity inevitably lowers the precision of downstream calculations, reduces the credibility of the system, and degrades the quality of its information and content. Most existing work addresses the problem with graph theory and clustering, but name disambiguation is still not well resolved. In this paper, we propose a multi-level name disambiguation algorithm. The algorithm is mainly unsupervised and combines hierarchical agglomerative clustering (HAC) with graph theory. Experimental results show that the proposed solution clearly outperforms several methods, including HAC and Graph (+17% to 25% in terms of F1-measure).
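The abstract does not spell out the individual levels, so the following is only a minimal sketch of how a graph step and an HAC step could be chained for author name disambiguation. It is not the paper's algorithm. The record fields (`coauthors`, `keywords`), the Jaccard similarity, the average linkage, the threshold of 0.3, and the toy records in `PAPERS` are all assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical toy records: four papers that all carry the same ambiguous author name.
PAPERS = [
    {"coauthors": {"A. Li", "B. Chen"}, "keywords": {"graph", "clustering"}},
    {"coauthors": {"B. Chen"}, "keywords": {"entity", "resolution"}},
    {"coauthors": {"C. Zhao"}, "keywords": {"graph", "clustering", "names"}},
    {"coauthors": {"D. Kim"}, "keywords": {"protein", "folding"}},
]


def coauthor_components(papers):
    """Graph level (assumed): connected components of the shared-co-author graph."""
    parent = list(range(len(papers)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(papers)), 2):
        if papers[i]["coauthors"] & papers[j]["coauthors"]:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(papers)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def hac_merge(papers, clusters, threshold=0.3):
    """HAC level (assumed): merge the most similar pair of clusters until
    the best average-linkage keyword similarity falls below the threshold."""
    clusters = [list(c) for c in clusters]

    def linkage(c1, c2):
        sims = [jaccard(papers[i]["keywords"], papers[j]["keywords"])
                for i in c1 for j in c2]
        return sum(sims) / len(sims)

    while len(clusters) > 1:
        best_sim, a, b = max(
            (linkage(clusters[a], clusters[b]), a, b)
            for a, b in combinations(range(len(clusters)), 2)
        )
        if best_sim < threshold:
            break
        clusters[a] += clusters[b]  # a < b, so deleting b keeps index a valid
        del clusters[b]
    return clusters


def disambiguate(papers, threshold=0.3):
    """Run the graph level first, then refine its output with HAC."""
    clusters = hac_merge(papers, coauthor_components(papers), threshold)
    return sorted(sorted(c) for c in clusters)


if __name__ == "__main__":
    print(disambiguate(PAPERS))
```

On the toy records, the graph step groups papers 0 and 1 through their shared co-author, and the HAC step then attaches paper 2 because its keywords overlap with paper 0, giving `[[0, 1, 2], [3]]`. Raising the threshold to 0.5 blocks that merge. Running the precise co-author signal before the noisier text similarity is one plausible ordering, chosen here only for the sketch.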
