Measuring the Reliability of Diagnostic Mastery Classifications at Multiple Levels of Reporting

ABSTRACT As the use of diagnostic assessment systems transitions from research applications to large-scale assessments for accountability purposes, reliability methods that provide evidence at each level of reporting are needed. The purpose of this paper is to summarize one simulation-based method for estimating and reporting reliability for an operational, large-scale, diagnostic assessment system. This assessment system reports results and associated reliability evidence at the individual skill level for each academic content standard and for broader content strands. The system also summarizes results for the overall subject using achievement levels, which are often included in state accountability metrics. Results are summarized as measures of association between true and estimated mastery status at each level of reporting.
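The core idea of the simulation-based approach can be illustrated with a minimal sketch: simulate true mastery statuses for examinees on a single skill, simulate estimated (classified) statuses that agree with truth at some rate, and summarize the association between the two. The sketch below uses Cohen's kappa as one common chance-corrected agreement index for binary classifications; the base rate, agreement rate, and sample size are illustrative assumptions, not values from the paper.

```python
import random

def cohens_kappa(true, est):
    """Chance-corrected agreement between two binary label sequences.

    kappa = (p_obs - p_chance) / (1 - p_chance), where p_chance is the
    agreement expected from the marginal label proportions alone.
    """
    n = len(true)
    p_obs = sum(t == e for t, e in zip(true, est)) / n
    p_true1 = sum(true) / n
    p_est1 = sum(est) / n
    p_chance = p_true1 * p_est1 + (1 - p_true1) * (1 - p_est1)
    return (p_obs - p_chance) / (1 - p_chance)

random.seed(42)
# Simulate true mastery for 1000 examinees on one skill (base rate 0.6),
# then estimated mastery that matches the true status 90% of the time.
true_mastery = [int(random.random() < 0.6) for _ in range(1000)]
est_mastery = [t if random.random() < 0.9 else 1 - t for t in true_mastery]

print(round(cohens_kappa(true_mastery, est_mastery), 3))
```

In an operational setting this comparison would be repeated at each level of reporting (skill, strand, and overall achievement level), with multi-category levels calling for a weighted kappa or polychoric correlation rather than the binary version shown here.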
