Google Scholar to overshadow them all? Comparing the sizes of 12 academic search engines and bibliographic databases

Information on the size of academic search engines and bibliographic databases (ASEBDs) is often outdated or entirely unavailable. Hence, it is difficult to assess the scope of specific databases, such as Google Scholar. While scientometric studies have estimated ASEBD sizes before, the methods they employed could compare only a few databases. Consequently, there is no up-to-date comparative information on the sizes of popular ASEBDs. This study aims to fill this blind spot by providing a comparative picture of 12 of the most commonly used ASEBDs. In doing so, we build on and refine previous scientometric research by using query hit counts as an indicator of the number of accessible records. Iterative query optimization makes it possible to identify the maximum number of hits for most ASEBDs. The results were validated in terms of their capacity to assess database size by comparing them with official information on database sizes or with previous scientometric studies. The queries used here are replicable, so size information can be updated quickly. The findings provide the first size estimates for ProQuest and EBSCOhost and indicate that Google Scholar’s size might have been underestimated so far by more than 50%. By our estimation, Google Scholar, with 389 million records, is currently the most comprehensive academic search engine.
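The idea of iterative query optimization can be illustrated with a small hill-climbing sketch. This is not the study's actual procedure or query set. It assumes a hypothetical `hit_count` function that submits a query to an ASEBD and returns the total number of hits the engine reports. It also assumes a hypothetical `expand` function that proposes broader variants of a query. The loop keeps whichever variant reports the most hits and stops when no variant improves on the current best. That maximum is then treated as a proxy for the number of accessible records.

```python
from typing import Callable, Iterable, Tuple


def optimize_query(
    seed: str,
    expand: Callable[[str], Iterable[str]],  # hypothetical: proposes broader query variants
    hit_count: Callable[[str], int],         # hypothetical: reported hits for a query
    max_rounds: int = 10,
) -> Tuple[str, int]:
    """Greedily search for the query with the largest reported hit count."""
    best, best_hits = seed, hit_count(seed)
    for _ in range(max_rounds):
        improved = False
        # expand(best) is evaluated once, so all variants in a round derive
        # from the query that was best when the round started.
        for query in expand(best):
            hits = hit_count(query)
            if hits > best_hits:
                best, best_hits, improved = query, hits, True
        if not improved:
            break  # no variant reports more hits: treat as the maximum
    return best, best_hits


# Toy stand-in for a real ASEBD; these hit counts are invented for illustration.
fake_counts = {"a": 100, "a OR the": 150, "a OR the OR of": 180, "the": 90}


def hit_count(query: str) -> int:
    return fake_counts.get(query, 0)


def expand(query: str) -> list:
    # Broaden the query by OR-ing in common terms it does not yet contain.
    return [f"{query} OR {t}" for t in ("the", "of") if t not in query.split()]


print(optimize_query("a", expand, hit_count))
```

Greedy search is used here only because it is the simplest way to show "keep refining while the hit count grows." Real engines report estimated, sometimes unstable hit counts, so any such loop would need the kind of validation against official size figures that the abstract describes.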
