Generalizability of Large-Scale Performance Assessments in Science: Promises and Problems

This study provides empirical evidence about the sampling variability and generalizability (reliability) of a statewide science performance assessment. Results at both the individual and school levels indicate that task-sampling variability was the major source of measurement error in the performance assessment; rater-sampling variability was negligible. Adding more tasks therefore improves the generalizability of the measurement. For the school-level assessment, the variation in performance among students within a school was larger than the variation among schools, so increasing the number of students tested within a school also increases the generalizability of the assessment. Finally, the allocation of students in a matrix-sampling design is compared to a students-crossed-with-tasks design. The former would require fewer tasks per student than the latter to build a generalizable measure of school performance.
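The claim that adding tasks helps more than adding raters can be illustrated with a small decision-study calculation. The sketch below assumes a fully crossed persons x tasks x raters design and uses the standard relative-error generalizability coefficient from generalizability theory. The abstract does not report the study's design details or variance estimates, so the function name `g_coefficient` and all variance-component values are hypothetical, chosen only to mimic a large person-by-task component and a negligible rater component.

```python
def g_coefficient(var_p, var_pt, var_pr, var_ptr_e, n_tasks, n_raters):
    """Relative generalizability coefficient for a crossed p x t x r design.

    var_p      : universe-score (person) variance
    var_pt     : person-by-task interaction variance
    var_pr     : person-by-rater interaction variance
    var_ptr_e  : residual variance (p x t x r interaction confounded with error)
    n_tasks    : number of tasks in the decision study
    n_raters   : number of raters in the decision study
    """
    # Relative error variance: each component is divided by the number of
    # conditions it is averaged over.
    rel_error = (
        var_pt / n_tasks
        + var_pr / n_raters
        + var_ptr_e / (n_tasks * n_raters)
    )
    return var_p / (var_p + rel_error)


# Hypothetical variance components (NOT taken from the study):
# large person-by-task variance, negligible person-by-rater variance.
HYPOTHETICAL = dict(var_p=0.30, var_pt=0.50, var_pr=0.01, var_ptr_e=0.20)

if __name__ == "__main__":
    for n_tasks, n_raters in [(1, 1), (1, 10), (10, 1), (10, 2)]:
        g = g_coefficient(**HYPOTHETICAL, n_tasks=n_tasks, n_raters=n_raters)
        print(f"tasks={n_tasks:2d} raters={n_raters:2d} -> G = {g:.3f}")
```

Under these assumed values, moving from one task to ten raises the coefficient far more than moving from one rater to ten, which is the pattern the abstract describes. The same logic extends to the school level, where the number of students sampled per school divides the within-school student variance.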
