Testing for the fairness and predictive validity of research funding decisions: A multilevel multiple imputation for missing data approach using ex-ante and ex-post peer evaluation data from the Austrian Science Fund

Research funding organizations need to ensure both the validity and the fairness of their grant approval procedures. This study statistically analyzed the ex-ante peer evaluation (EXANTE) of N = 8,496 grant applications submitted to the Austrian Science Fund from 1999 to 2009. For 1,689 funded research projects, an ex-post peer evaluation (EXPOST) was also available; for the remaining grant applications, a multilevel multiple imputation approach for missing data was used to account for verification bias, for the first time in peer-review research. Without imputation, the predictive validity of EXANTE was low (r = .26), an underestimate caused by verification bias; with imputation it was r = .49. That is, the decision-making procedure is capable of selecting the best research proposals for funding. In EXANTE there were several potential biases (e.g., gender). With respect to EXPOST there was only one real bias (discipline-specific and year-specific differential prediction). The novelty of this contribution is, first, the combination of theoretical concepts of validity and fairness with a missing data imputation approach to correct for verification bias and, second, the use of multilevel modeling to test peer review-based funding decisions for both validity and fairness in terms of potential and real biases.
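The effect of verification bias on the validity coefficient can be illustrated with a toy simulation. The sketch below is not the authors' multilevel multiple imputation model; it makes several simplifying assumptions: standard-normal scores, a true EXANTE-EXPOST correlation of .49, strict top-down funding of the 1,689 highest EXANTE scores, and a single-level stochastic regression imputation that ignores parameter uncertainty. All function and variable names are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(n=8496, n_funded=1689, rho=0.49, n_imputations=20, seed=1):
    rng = random.Random(seed)

    # Assumed data-generating model: bivariate normal with correlation rho.
    ex_ante = [rng.gauss(0, 1) for _ in range(n)]
    ex_post = [rho * x + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
               for x in ex_ante]
    r_full = pearson(ex_ante, ex_post)  # unobservable in practice

    # Assumed selection rule: fund the n_funded best EXANTE scores, so
    # EXPOST is observed ("verified") only for those projects.
    order = sorted(range(n), key=lambda i: ex_ante[i], reverse=True)
    funded, rejected = order[:n_funded], order[n_funded:]
    fx = [ex_ante[i] for i in funded]
    fy = [ex_post[i] for i in funded]
    r_funded = pearson(fx, fy)  # attenuated by range restriction

    # Ordinary least squares of EXPOST on EXANTE among funded projects.
    mx, my = sum(fx) / len(fx), sum(fy) / len(fy)
    slope = (sum((x - mx) * (y - my) for x, y in zip(fx, fy))
             / sum((x - mx) ** 2 for x in fx))
    intercept = my - slope * mx
    resid_sd = math.sqrt(
        sum((y - intercept - slope * x) ** 2 for x, y in zip(fx, fy))
        / (len(fx) - 2)
    )

    # Stochastic regression imputation for rejected applications,
    # repeated n_imputations times; the correlations are simply averaged.
    rs = []
    for _ in range(n_imputations):
        completed = list(ex_post)
        for i in rejected:
            completed[i] = (intercept + slope * ex_ante[i]
                            + rng.gauss(0, resid_sd))
        rs.append(pearson(ex_ante, completed))
    r_imputed = sum(rs) / len(rs)

    return {"full": r_full, "funded_only": r_funded, "imputed": r_imputed}


result = simulate()
for name, value in result.items():
    print(f"{name}: r = {value:.2f}")
```

Under these assumptions the correlation among funded projects alone falls well below the full-population value, while the imputed estimate moves back toward it. A full analysis would also draw the regression parameters for each imputation, respect the nesting of applications, and pool estimates with Rubin's rules rather than a simple mean.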
