Paper information - Comparing Elo, Glicko, IRT, and Bayesian IRT Statistical Models for Educational and Gaming Data

Statistical models used to estimate skill or ability often vary by field, but their underlying mathematical models can be very similar. Differences among them often stem from the need to accommodate data with different formats and structures. As models from different fields grow more complex, the range of data types they can be applied to may also grow. Models applied to educational or psychological data have advanced to accommodate a wide range of data formats and to estimate more accurately from sparsely populated data matrices. Meanwhile, over the last two decades the field of online gaming has adopted more complex statistical models to provide real-time matchmaking based on ability estimates. Comparing statistical models from the educational and gaming fields is useful because different datasets may benefit from different ability estimation procedures. This study compared statistical models typically used in game matchmaking systems (Elo, Glicko) with models used in psychometric modeling (item response theory, or IRT, and Bayesian IRT) using both simulated and real data under a variety of conditions. Results indicated that, in conditions with small numbers of items or matches, the Bayesian IRT one-parameter logistic (1PL) model produced the most accurate skill estimates, regardless of whether educational or gaming data were used. This held for all sample sizes with small numbers of items. However, the estimates from the Elo and non-Bayesian IRT 1PL models were close to those of the Bayesian IRT 1PL model for both gaming and educational data. Although the two-parameter logistic (2PL) models were not shown to be accurate in the gaming study conditions, the IRT 2PL and Bayesian IRT 2PL models outperformed the 1PL models when 2PL educational data were generated in the condition with the larger sample size and number of items. Overall, the Bayesian IRT 1PL model appeared to be the best choice across the smaller sample size and match count conditions.
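The abstract's remark that the underlying mathematical models can be very similar can be illustrated with a minimal sketch. It assumes the textbook Elo expected-score formula (base 10, scale 400) and the textbook 1PL (Rasch) response function. The abstract does not state the exact parameterizations used in the study, so the function names and constants below are illustrative assumptions, not the author's code.

```python
import math

# Assumption: textbook Elo convention (base 10, 400-point scale).
def elo_expected(r_a: float, r_b: float) -> float:
    """Expected score of player A against player B under standard Elo."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))

# Assumption: textbook 1PL (Rasch) model on the logit scale.
def irt_1pl(theta: float, b: float) -> float:
    """Probability that a person with ability theta answers an item of difficulty b correctly."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# Rescaling Elo points to logits makes the two curves identical.
ELO_TO_LOGIT = math.log(10.0) / 400.0

def elo_to_logit(rating: float) -> float:
    """Convert an Elo rating to the logit scale (hypothetical helper)."""
    return rating * ELO_TO_LOGIT

# A 100-point rating gap, read either as a match or as person-vs-item.
p_elo = elo_expected(1600.0, 1500.0)
p_irt = irt_1pl(elo_to_logit(1600.0), elo_to_logit(1500.0))
print(round(p_elo, 3), round(p_irt, 3))  # both about 0.64
```

Under these assumptions the two models share one logistic curve and differ only in scale, so the differences examined in a comparison like this one would come mainly from how the parameters are estimated (sequential rating updates versus likelihood-based or Bayesian fitting), not from the response function itself.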

Breanna A. Morrison
