A two-sided academic landscape: snapshot of highly-cited documents in Google Scholar (1950-2013)

The main objective of this paper is to identify and define the core characteristics of highly-cited documents in Google Scholar (document type, language, free availability, source, and number of versions), on the hypothesis that the wide coverage of this search engine may yield a different portrait of these documents from that offered by traditional bibliographic databases. To this end, one query was run for each year from 1950 to 2013, and the top 1,000 documents retrieved from Google Scholar for each year were collected, giving a final sample of 64,000 documents, of which 40% provided a free link to the full text. The results show that the average highly-cited document is a journal or book article (62% of the top 1% most cited documents of the sample), written in English (92.5% of all documents) and available online in PDF format (86.0% of all documents). Errors in the data should be noted, however, especially in detecting duplicates and linking citations correctly. Nonetheless, the study's focus on highly cited papers minimizes the effects of these limitations. Given the strong presence of books and, to a lesser extent, of other document types (such as proceedings or reports), the present research concludes that Google Scholar data offer an original and different view of the most influential academic documents (measured by citation count), a set composed not only of strictly scientific material (journal articles) but also of academic material in its broadest sense.
