Abstract
This study examined the exchangeability of alternative methods for measuring science achievement at the elementary level. Observation of a student performing a hands-on investigation was taken as the "benchmark" method for science performance assessment. Four less costly methods, or possible "surrogates" for the benchmark, were, in order of decreasing verisimilitude: (a) a notebook report of an investigation, (b) a computer simulation of an investigation, (c) short-answer questions about an investigation, and (d) multiple-choice questions about an investigation. Exchangeability of each of the four surrogates for the benchmark was examined using three investigations: "Electric Mysteries," "Paper Towels," and "Bugs." One hundred ninety-seven fifth- and sixth-grade students were given (a) all three investigations with each method, (b) a multiple-choice science achievement test, and (c) an aptitude test. Exchangeability analyses indicated that only the notebook provided a reasonable surrogate for the benchmark, a finding that replicated across the three investigations. Moreover, combinations of surrogates, including the multiple-choice science achievement test, failed to approximate the information gained from direct observation of student performance over and above the information provided by the notebook surrogate.