Does dirty data affect Google Scholar citations?

Google Scholar (GS) is a database that enables researchers to create scholarly profiles and keeps track of, among other things, their citation counts and their h-index and i10-index values. GS is increasingly being used for research evaluation purposes. Although rich in bibliometric data, GS indexes some duplicate publications and citations, and therefore tends to inflate citation counts to some extent. Based on a small sample of researchers' GS profiles, this paper studies the extent to which duplicates change citation counts and the metrics derived from them. Findings show that duplicates in the GS database somewhat inflate the citation metrics. The scale of the problem and the effect of dirty data on performance evaluations based on GS citation data need to be studied further using larger samples.
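To make the inflation mechanism concrete, the following minimal Python sketch compares the total citation count, h-index, and i10-index of a profile before and after duplicate records are merged. It is an illustration only, not the procedure used in the paper: the sample records are invented, and the `normalize` and `deduplicate` helpers (matching on a normalized title and keeping the highest citation count per title) are assumptions made for the example.

```python
import re


def h_index(citations):
    """Largest h such that h publications have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)


def i10_index(citations):
    """Number of publications with at least 10 citations."""
    return sum(1 for c in citations if c >= 10)


def normalize(title):
    """Hypothetical matching key: lowercase, punctuation collapsed to spaces."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def deduplicate(records):
    """Assumed merge rule: keep the highest citation count per normalized title."""
    best = {}
    for title, cites in records:
        key = normalize(title)
        best[key] = max(best.get(key, 0), cites)
    return list(best.values())


# Invented profile: "Paper A" and "Paper D" each appear twice.
profile = [
    ("Paper A", 25), ("paper a.", 12), ("Paper B", 11), ("Paper C", 10),
    ("Paper D", 3), ("Paper D", 3), ("Paper E", 1),
]

raw = [cites for _, cites in profile]
clean = deduplicate(profile)

for label, counts in (("with duplicates", raw), ("deduplicated", clean)):
    print(label, sum(counts), h_index(counts), i10_index(counts))
```

In this invented profile, merging the two duplicated records lowers the total from 65 to 50 citations, the h-index from 4 to 3, and the i10-index from 4 to 3. A different merge rule (for example, counting each distinct citing document once) would give different numbers.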
