Incremental author name disambiguation by exploiting domain‐specific heuristics

The vast majority of current author name disambiguation solutions are designed to disambiguate an entire digital library (DL) at once. However, besides being very expensive and scaling poorly, these solutions may not benefit from manual corrections, which can be lost whenever the entire repository is disambiguated again. In the real world, where repositories are updated daily, incremental solutions that disambiguate only the newly introduced citation records are likely to produce better results in the long run. Nevertheless, the problem of incremental author name disambiguation has been largely neglected in the literature. In this article, we present a new author name disambiguation method specifically designed for the incremental scenario. In our experiments, the new method largely outperforms recent incremental proposals reported in the literature as well as the current state-of-the-art non-incremental method.
