A methodology for applying students' interactive task performance scores from a multimedia-based performance assessment in a Bayesian Network

Computer-based simulations are increasingly being used in educational assessment. In most cases, simulation-based assessment (SBA) is used for formative assessment, which can be defined as assessment for learning, but as research on the topic continues to grow, possibilities for summative assessment, which can be defined as assessment of learning, are also emerging. The current study contributes to research on the latter category of assessment. In this article, we present a methodology for scoring the interactive and complex behavior of students in a specific type of SBA, namely, a Multimedia-based Performance Assessment (MBPA), which is used for a summative assessment purpose. The MBPA is used to assess the knowledge, skills, and abilities of confined space guard (CSG) students; a CSG supervises operations that are carried out in a confined space (e.g., a tank or silo). We address two specific challenges in this article: the evidence identification challenge (i.e., scoring interactive task performance) and the evidence accumulation challenge (i.e., accumulating scores in a psychometric model). Using expert ratings of the essence and difficulty of actions in the MBPA, we answer the first challenge by demonstrating that interactive task performance in an MBPA can be scored. We answer the second challenge by recoding the expert ratings into conditional probability tables that can be used in a Bayesian Network, a psychometric model for reasoning under uncertainty and complexity. Finally, we validate and illustrate the presented methodology through an analysis of the response data of 57 confined space guard students who performed in the MBPA.

Highlights:
- We present a study on the analysis of performance data from a multimedia-based performance assessment.
- We aim to answer the evidence identification and evidence accumulation challenges.
- We present data from 57 students who completed the multimedia-based performance assessment.
- We use log file analyses.
- We use Bayes nets to evaluate student performance in the multimedia-based performance assessment.
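The evidence-accumulation step can be made concrete with a small numeric sketch. The Python fragment below is illustrative only and is not the authors' implementation or data: the latent skill node, the two action names, and all probability values are hypothetical placeholders standing in for the expert-derived conditional probability tables. It shows how actions observed in a log file would update the posterior belief about a latent skill, assuming the actions are conditionally independent given that skill.

```python
import numpy as np

# Illustrative sketch only; values are placeholders, not the study's expert ratings.

# Prior over the latent skill node (index 0 = non-mastery, 1 = mastery).
p_skill = np.array([0.5, 0.5])

# Conditional probability tables P(action performed correctly | skill state).
# In the described methodology these rows would be derived from expert ratings
# of each action's essence and difficulty.
cpt_actions = {
    "close_valve":  np.array([0.30, 0.85]),  # [P(correct | non-mastery), P(correct | mastery)]
    "check_permit": np.array([0.20, 0.90]),
}

def posterior_skill(observed):
    """Accumulate evidence: P(skill | observed actions) via Bayes' rule,
    assuming conditional independence of actions given the skill."""
    likelihood = np.ones(2)
    for action, correct in observed.items():
        p_correct = cpt_actions[action]
        likelihood *= p_correct if correct else (1.0 - p_correct)
    unnormalized = p_skill * likelihood
    return unnormalized / unnormalized.sum()

# Example: a student performs one action correctly and misses the other.
print(posterior_skill({"close_valve": True, "check_permit": False}))
```

In a full Bayesian Network the same updating would be carried out over multiple skill and observable nodes with a propagation algorithm, rather than this single-node application of Bayes' rule.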
