Screening the most highly cited papers in longitudinal bibliometric studies and systematic literature reviews of a research field or journal: Widespread used metrics vs a percentile citation-based approach

Abstract There is a gap in the literature regarding the period representativeness bias associated with sample selection in longitudinal bibliometric studies. The purpose of this paper is to analyse and compare, in terms of period representativeness, the common methods used for selecting a sample of the most impactful papers in a field or journal. Using 92,593 papers from the Information Science & Library Science area (1977–2016), we compared samples of the 100 most impactful papers obtained with different selection options, in terms of the number of papers per year. We repeated the analysis for Top500, Top2000, and Top20000. The study shows that the metrics frequently used to compare the impact of papers, and to select a sample of the "most impactful papers" published in each year and each field, may privilege specific periods while neglecting others. The main result is that the percentile citation-based method reduces this "year of publication" representativeness bias. This paper draws attention to the importance of sample selection in bibliometric studies and to the period representativeness bias associated with the different ways of selecting the "most impactful papers".
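The comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define the percentile calculation, so the sketch assumes a paper's percentile is the share of papers from the same publication year with strictly fewer citations. The function names and the toy data are hypothetical.

```python
from bisect import bisect_left
from collections import Counter, defaultdict


def top_by_citations(papers, n):
    """Select the n papers with the highest raw citation counts."""
    return sorted(papers, key=lambda p: p["citations"], reverse=True)[:n]


def add_year_percentiles(papers):
    """Attach a within-year citation percentile to each paper.

    Assumption: percentile = 100 * (same-year papers with strictly
    fewer citations) / (number of same-year papers).
    """
    counts_by_year = defaultdict(list)
    for p in papers:
        counts_by_year[p["year"]].append(p["citations"])
    for counts in counts_by_year.values():
        counts.sort()

    ranked = []
    for p in papers:
        counts = counts_by_year[p["year"]]
        below = bisect_left(counts, p["citations"])  # strictly fewer citations
        ranked.append({**p, "percentile": 100.0 * below / len(counts)})
    return ranked


def top_by_percentile(papers, n):
    """Select the n papers with the highest within-year percentile."""
    ranked = add_year_percentiles(papers)
    return sorted(ranked, key=lambda p: p["percentile"], reverse=True)[:n]


def papers_per_year(sample):
    """Count how many sampled papers fall in each publication year."""
    return dict(sorted(Counter(p["year"] for p in sample).items()))


# Toy data (hypothetical): older papers have had more time to collect citations.
papers = (
    [{"year": 1980, "citations": c} for c in (50, 40, 30, 20)]
    + [{"year": 2015, "citations": c} for c in (8, 6, 4, 2)]
)

raw_sample = top_by_citations(papers, 4)
pct_sample = top_by_percentile(papers, 4)

print(papers_per_year(raw_sample))  # all four papers come from 1980
print(papers_per_year(pct_sample))  # two papers from each year
```

In this toy case, ranking by raw citation counts draws the whole sample from the older year, while ranking by within-year percentile draws equally from both years, which is the kind of period representativeness effect the abstract reports.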
