On peer review in computer science: analysis of its effectiveness and suggestions for improvement

In this paper we analyse peer reviews and reviewers' behaviour across a number of different review processes. More specifically, we report on the development, definition and rationale of a theoretical model for peer review processes that supports the identification of appropriate metrics to assess the main characteristics of such processes, with the goal of making peer review more transparent and understandable. Alongside known metrics and techniques, we introduce new ones to assess the overall quality (i.e., reliability, fairness, validity) and efficiency of peer review processes, e.g., the robustness of the process, the degree of agreement or disagreement among reviewers, and positive or negative bias in reviewers' decision making. We also examine the ability of peer review to predict the impact of papers in subsequent years. We apply the proposed model and analysis framework to a large dataset of reviews from ten computer science conferences, comprising ca. 9,000 reviews of ca. 2,800 submitted contributions. We discuss the implications of the results and their potential use in improving the analysed peer review processes. Several interesting results emerged, in particular: (1) a low correlation between peer review outcome and the impact over time of the accepted contributions; (2) the influence of the assessment scale on how reviewers assign marks; (3) the effect of rating bias, i.e., reviewers who consistently give lower or higher marks than all other reviewers; and (4) the effectiveness of statistical approaches to optimize some process parameters (e.g., the number of papers per reviewer) so as to improve overall process quality while keeping the overall effort under control. Based on the lessons learned, we suggest ways to improve the overall quality of peer review through procedures that can be easily implemented in current editorial management systems.
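One of the quality dimensions discussed above is the degree of agreement among reviewers. As a minimal illustrative sketch (not the paper's own analysis code), chance-corrected agreement between two reviewers assigning categorical marks can be measured with Cohen's kappa; the reviewer labels below are hypothetical:

```python
from collections import Counter

def cohen_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two raters' categorical marks.

    kappa = (p_observed - p_expected) / (1 - p_expected),
    where p_expected is the agreement expected by chance given each
    rater's marginal label frequencies.
    """
    assert len(ratings_a) == len(ratings_b) and ratings_a
    n = len(ratings_a)
    # Observed proportion of papers where the two reviewers agree
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement from each reviewer's marginal label frequencies
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)

# Two hypothetical reviewers scoring the same eight submissions
a = ["accept", "reject", "accept", "accept", "reject", "reject", "accept", "reject"]
b = ["accept", "reject", "reject", "accept", "reject", "accept", "accept", "reject"]
print(cohen_kappa(a, b))  # 0.5: moderate agreement beyond chance
```

A kappa of 1 indicates perfect agreement, 0 indicates agreement no better than chance, and negative values indicate systematic disagreement; multi-rater generalizations (e.g., Fleiss' kappa or intraclass correlation) apply the same chance-correction idea to panels of more than two reviewers.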
