Google Scholar as a source for scholarly evaluation: A bibliographic review of database errors

Google Scholar (GS) is an academic search engine and discovery tool launched by Google (now Alphabet) in November 2004. Because GS reports the number of citations each article receives from all other indexed articles, regardless of their source, it has come to be used in bibliometric analysis and academic assessment, especially in the social sciences and humanities. However, the existence of errors, sometimes of great magnitude, has provoked criticism from the academic community. The aim of this article is to carry out an exhaustive bibliographic review of all studies that provide either specific or incidental empirical evidence of the errors found in Google Scholar. The results indicate that the bibliographic corpus dedicated to errors in Google Scholar is still very limited (n = 49), excessively fragmented, and diffuse. The findings have not been based on any systematic methodology or on units that are comparable to each other, so the errors cannot be quantified, nor their impact analysed, with any precision. Certain limitations of the search engine itself (the time required for data cleaning, and the limits on citations per search result and on hits per query) may be the cause of this absence of empirical studies.
