Trusting teachers’ judgement: research evidence of the reliability and validity of teachers’ assessment used for summative purposes

This paper summarizes the findings of a systematic review of research on the reliability and validity of teachers’ assessment used for summative purposes. In addition to the main question, the review also addressed the question ‘What conditions affect the reliability and validity of teachers’ summative assessment?’ The initial search for studies meeting the explicit inclusion criteria of relevance found 431 potentially relevant studies. This number was gradually reduced, through the systematic review procedures, to 30 studies, which specifically addressed the review questions. These studies were subject to in‐depth data extraction conducted independently by two researchers, followed by reconciliation of any differences of interpretation. This procedure was also used to judge the weight of evidence provided for the review by each study so that greater emphasis could be given to findings from the most relevant and methodologically sound research. The findings of the review by no means constitute a ringing endorsement of teachers’ assessment; there was evidence of low reliability and bias in teachers’ judgements made in certain circumstances. However, this has to be considered against the low validity and lower than generally assumed reliability of external tests. The findings also point to ways of overcoming the deficiencies of teachers’ assessment and lead to implications for assessment policy, practice and research, which are proposed in the final section of the paper.
