Abstract
This study examined the exchangeability of alternative methods for measuring science achievement at the elementary level. Observation of a student performing a hands-on investigation was taken as the "benchmark" method for science performance assessment. Four less costly methods, or possible "surrogates" for the benchmark, were, in order of decreasing verisimilitude: (a) a notebook report of an investigation, (b) a computer simulation of an investigation, (c) short-answer questions about an investigation, and (d) multiple-choice questions about an investigation. Exchangeability of each of the four surrogates for the benchmark was examined using three investigations: "Electric Mysteries," "Paper Towels," and "Bugs." One hundred ninety-seven fifth- and sixth-grade students were given (a) all three investigations with each method, (b) a multiple-choice science achievement test, and (c) an aptitude test. Exchangeability analyses indicated that only the notebook provided a reasonable surrogate for the benchmark, a finding that replicated across the three investigations. Moreover, combinations of surrogates, including the multiple-choice science achievement test, failed to approximate the information gained from direct observation of student performance over and above the information provided by the notebook surrogate.