A Bayesian Approach to Ranking and Rater Evaluation

We develop a Bayesian hierarchical model for the analysis of ordinal data from multirater ranking studies. The model for a rater’s score includes four latent factors: a latent item trait that determines the true order of the items, and three rater performance characteristics, namely bias, discrimination, and measurement error in the ratings. The proposed approach has three goals. First, three Bayesian estimators of the item ranks are introduced; all three improve substantially on the widely used score sums by using information on the variable skill of the raters. Second, rater performance can be compared on the basis of bias, discrimination, and measurement error. Third, a simulation-based decision-theoretic approach is described for determining the number of raters to employ. A simulation study and an analysis of a grant review data set are presented.
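
To make the four latent factors concrete, the following is a minimal simulation sketch. The abstract does not state the model's functional form, so everything here is an assumption for illustration only: the linear latent score bias + discrimination * trait + noise, the normal noise, the fixed cutpoints, the prior ranges, and the function names simulate_ratings and score_sum_ranks. It generates ordinal scores and computes the score-sum ranking that serves as the baseline; it does not implement the Bayesian rank estimators.

```python
import random


def simulate_ratings(n_items=20, n_raters=5,
                     cutpoints=(-1.5, -0.5, 0.5, 1.5), seed=0):
    """Simulate ordinal scores from an ASSUMED latent-factor rating model.

    Hypothetical form (not taken from the paper):
        latent_ij = bias_j + disc_j * theta_i + Normal(0, sd_j)
        score_ij  = 1 + number of cutpoints below latent_ij
    """
    rng = random.Random(seed)
    # Latent item trait: its ordering defines the true ranking of the items.
    theta = [rng.gauss(0.0, 1.0) for _ in range(n_items)]
    # Three performance characteristics per rater (ranges are assumptions).
    raters = [
        {
            "bias": rng.gauss(0.0, 0.5),   # systematic leniency or severity
            "disc": rng.uniform(0.5, 1.5),  # sensitivity to the item trait
            "sd": rng.uniform(0.2, 1.0),    # measurement error scale
        }
        for _ in range(n_raters)
    ]
    scores = []
    for trait in theta:
        row = []
        for r in raters:
            latent = r["bias"] + r["disc"] * trait + rng.gauss(0.0, r["sd"])
            # Map the continuous latent score to an ordinal category.
            row.append(1 + sum(latent > c for c in cutpoints))
        scores.append(row)
    return theta, raters, scores


def score_sum_ranks(scores):
    """Rank items by total score across raters (rank 1 = highest sum).

    Ties are broken by item index, which is an arbitrary choice.
    """
    totals = [sum(row) for row in scores]
    order = sorted(range(len(totals)), key=lambda i: -totals[i])
    ranks = [0] * len(totals)
    for position, item in enumerate(order, start=1):
        ranks[item] = position
    return ranks


if __name__ == "__main__":
    theta, raters, scores = simulate_ratings()
    print(score_sum_ranks(scores))
```

Because score sums weight every rater equally, a noisy or weakly discriminating rater influences the ranking as much as a precise one in this sketch, which is the limitation the Bayesian estimators are meant to address.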
