Author name disambiguation: What difference does it make in author-based citation analysis?

In this article, we explore how strongly author name disambiguation (AND) affects the results of an author-based citation analysis study, and identify conditions under which the traditional simplified approach of using surnames and first initials may suffice in practice. We compare author citation ranking and cocitation mapping results in the stem cell research field from 2004 to 2009 using two AND approaches: the traditional simplified approach of using author surname and first initial and a sophisticated algorithmic approach. We find that the traditional approach leads to extremely distorted rankings and substantially distorted mappings of authors in this field when based on first- or all-author citation counting, whereas last-author-based citation ranking and cocitation mapping both appear relatively immune to the author name ambiguity problem. This is largely because Romanized names of Chinese and Korean authors, who are very active in this field, are extremely ambiguous, but few of these researchers consistently publish as last authors in bylines. We conclude that a more earnest effort is required to deal with the author name ambiguity problem in both citation analysis and information retrieval, especially given the current trend toward globalization. In the stem cell research field, in which laboratory heads are traditionally listed as last authors in bylines, last-author-based citation ranking and cocitation mapping using the traditional approach to author name disambiguation may serve as a simple workaround, but likely at the price of largely filtering out Chinese and Korean contributions to the field as well as important contributions by young researchers. © 2012 Wiley Periodicals, Inc.
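To make the comparison above concrete, the following is a minimal sketch, not the authors' actual procedure, of how keying authors by surname and first initial can merge distinct people under first- or all-author counting while leaving last-author counts unchanged. All names, bylines, and citation counts are hypothetical, and the helpers `simplified_key` and `rank` are illustrative assumptions rather than anything described in the article.

```python
from collections import Counter

def simplified_key(full_name):
    """Reduce 'Given Surname' to 'Surname, G' (surname plus first initial)."""
    parts = full_name.split()
    surname, given = parts[-1], parts[0]
    return f"{surname}, {given[0]}"

# Hypothetical cited papers: (byline in order, citations received).
cited_papers = [
    (["Wei Zhang", "Anna Keller"], 10),
    (["Wen Zhang", "Anna Keller"], 7),
    (["Weimin Zhang", "Jin Park", "Anna Keller"], 5),
]

def rank(papers, counting, key=lambda name: name):
    """Sum citations per author under 'first', 'last', or 'all' counting."""
    totals = Counter()
    for byline, citations in papers:
        if counting == "first":
            credited = byline[:1]
        elif counting == "last":
            credited = byline[-1:]
        else:
            credited = byline
        for name in credited:
            totals[key(name)] += citations
    return totals.most_common()

# Full names stand in for perfectly disambiguated identities.
print(rank(cited_papers, "first"))
# Surname plus first initial merges three different first authors.
print(rank(cited_papers, "first", simplified_key))
# The last author is the same person throughout, so nothing is merged.
print(rank(cited_papers, "last", simplified_key))
```

In this toy data the three first authors collapse into a single "Zhang, W" entry with an inflated count, while the last-author ranking is the same with or without the simplification, which mirrors the pattern the abstract reports.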
