A Systematic Exploration of Judge Scoring Designs and Judge Analysis Methods in Performance Assessment
[1] Constructing an Item Bank Using Partial Credit Scoring , 1984 .
[2] Matthew S. Johnson,et al. A Hierarchical Rater Model for Constructed Responses, with a Signal Detection Rater Model , 2011 .
[3] Richard R. Sudweeks,et al. A comparison of generalizability theory and many-facet Rasch measurement in an analysis of college sophomore writing , 2004 .
[4] Brian W. Junker,et al. The Hierarchical Rater Model for Rated Test Items and its Application to Large-Scale Educational Assessment Data , 2002 .
[5] Ron I. Thomson,et al. Rater Experience, Rating Scale Length, and Judgments of L2 Pronunciation: Revisiting Research Conventions , 2013 .
[6] Alija Kulenović,et al. Standards for Educational and Psychological Testing , 1999 .
[7] Mary E. Lunz,et al. Measuring the Impact of Judge Severity on Examination Scores , 1990 .
[8] Machteld Hoskens,et al. The Rater Bundle Model , 2001 .
[9] J. Neyman,et al. Consistent Estimates Based on Partially Consistent Observations , 1948 .
[10] B. Wright,et al. Construction of measures from many-facet data , 2002, Journal of applied measurement.
[11] Wen-Chung Wang,et al. Using SAS PROC NLMIXED to fit item response theory models , 2005, Behavior research methods.
[12] R. E. Schumacker. Many-facet Rasch analysis with crossed, nested, and mixed designs , 1999, Journal of outcome measurement.
[13] Anthony S. Bryk,et al. Hierarchical Linear Models: Applications and Data Analysis Methods , 1992 .
[14] E. Muraki. A Generalized Partial Credit Model: Application of an EM Algorithm , 1992 .
[15] Gad S. Lim. The development and maintenance of rating quality in performance writing assessment: A longitudinal study of new and experienced raters , 2011 .
[16] F. Baker,et al. Item response theory: parameter estimation techniques , 1993 .
[17] H. M. Taherbhai,et al. The impact of rater effects on weighted composite scores under nested and spiraled scoring designs, using the multifaceted Rasch model , 2001, Journal of outcome measurement.
[18] Xiaoming Xi,et al. How Do Raters from India Perform in Scoring the TOEFL iBT™ Speaking Section and What Kind of Training Helps? , 2009 .
[19] M. Miller,et al. Measurement and Assessment in Teaching , 1994 .
[20] Judit Kormos,et al. The Effect of Mode of Response on a Semidirect Test of Oral Proficiency , 2011 .
[21] G. Karabatsos,et al. Hierarchical Generalized Linear Models for the Analysis of Judge Ratings , 2009 .
[22] S. Embretson,et al. Item response theory for psychologists , 2000 .
[23] R. Hambleton,et al. Item Response Theory: Principles and Applications , 1984 .
[24] Xiaoming Xi,et al. Evaluating analytic scoring for the TOEFL® Academic Speaking Test (TAST) for operational use , 2007 .