A Novel Approach for Author Name Disambiguation Using Ranking Confidence

In digital libraries, author names become ambiguous when multiple authors share the same name or when one person appears under several name variations. In recent years, name disambiguation has become a major challenge when integrating data from multiple sources in bibliographic digital libraries. Most previous work addresses this problem using many attributes, such as coauthors, article titles, article topics, and publication years. In most cases, however, only the coauthor and title attributes are available. In this paper, we propose an approach based on Hierarchical Agglomerative Clustering (HAC) that uses only the coauthor and title attributes yet disambiguates authors more effectively. The algorithm is divided into two stages. In the first stage, we employ a pair-wise grouping algorithm based on coauthors' names to group records into clusters. In the second stage, we merge two clusters if the similarity between their article titles reaches a threshold. We use three similarity measures, Jaccard Similarity, Cosine Similarity, and Euclidean Distance, to compare the titles of two clusters. To reduce the risk of relying on a single similarity metric, we introduce the concept of ranking confidence, which measures the confidence of the different similarity measurements and decides which similarity measure to use when merging clusters. In the experiments, we use PairPrecision, PairRecall, and PairF1 scores to evaluate our method and compare it with other methods. Experimental results indicate that our method significantly outperforms the baseline methods HAC, K-means, and SACluster when only the coauthor and title attributes are used.
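As an illustration only, the three title similarity measures and the pairwise evaluation scores named in the abstract can be sketched as follows. The abstract does not specify tokenization, term weighting, or the exact metric definitions, so this sketch assumes whitespace-tokenized bag-of-words titles and the commonly used pairwise definitions (pairs of records placed in the same cluster). All function names are hypothetical, and the ranking confidence itself is not reproduced here because the abstract does not define it.

```python
import math
from collections import Counter
from itertools import combinations


def tokens(title):
    """Assumed preprocessing: lowercase and split on whitespace."""
    return title.lower().split()


def jaccard(a, b):
    """Jaccard similarity between the word sets of two titles."""
    sa, sb = set(tokens(a)), set(tokens(b))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def cosine(a, b):
    """Cosine similarity between raw term-count vectors of two titles."""
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[t] * cb[t] for t in ca)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def euclidean(a, b):
    """Euclidean distance between term-count vectors (smaller = more similar)."""
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    vocab = set(ca) | set(cb)
    return math.sqrt(sum((ca[t] - cb[t]) ** 2 for t in vocab))


def pairwise_scores(predicted, truth):
    """Pairwise precision, recall, and F1 for two cluster labelings.

    predicted[i] and truth[i] are the cluster labels of record i.
    A pair of records counts as positive when both share a label.
    """
    pairs = list(combinations(range(len(truth)), 2))
    pred_pairs = {(i, j) for i, j in pairs if predicted[i] == predicted[j]}
    true_pairs = {(i, j) for i, j in pairs if truth[i] == truth[j]}
    correct = len(pred_pairs & true_pairs)
    precision = correct / len(pred_pairs) if pred_pairs else 0.0
    recall = correct / len(true_pairs) if true_pairs else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

Note that Euclidean Distance is a dissimilarity, so a merge rule built on it would compare against an upper bound rather than a lower one; how the original method reconciles the three scales is not stated in the abstract.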
