A framework for validation of the use of performance assessment in science

The assessment of learning in school science matters to students, educators, policy makers, and the general public. Changes in science curriculum and instruction have led to greater emphasis on alternative modes of assessment. The most significant of these newer approaches is “performance assessment”, in which students manipulate materials in experimental situations. Only recently has the development of performance assessment procedures, and of appropriate strategies for interpreting their results, received substantial research attention. In this study, educational measurement and science education perspectives are synthesized into an integrated analysis of the validity of the procedures, inferences, and consequences arising from the use of performance assessment. The Student Performance Component of the 1991 B.C. Science Assessment is offered as an example. A framework for the design, implementation, and interpretation of hands-on assessment in school science is presented, with validity and feasibility considered at every stage. Particular attention is given to the influence of construct labels on assessment design. A model for describing performance assessment tasks is proposed; its advantage is that it captures both the science content and the science skill demands of each task. The model is then expanded to show how representing multiple tasks simultaneously improves the ability to ensure adequate sampling from the appropriate content domains. The main conclusion of this validation inquiry is that every aspect of performance assessment in science is shaped by the perspective on learning in science that permeates the assessment, and that this influence must be considered at all times. Recommendations are made for those carrying out practical assessments, along with suggestions for areas that invite further research.
