How reliable are students’ evaluations of teaching quality? A variance components approach

Abstract: The inter-rater reliability of university students’ evaluations of teaching quality was examined with cross-classified multilevel models. Students (N = 480) evaluated lectures and seminars over three years with a standardised evaluation questionnaire, yielding 4224 data points. The total variance of these student evaluations was separated into the variance components of courses, teachers, students and the student/teacher interaction. The substantial variance components of teachers and courses suggest reliability. However, a similar proportion of variance was due to students, and the interaction of students and teachers was the strongest source of variance. Students’ individual perceptions of teaching and the fit of these perceptions with the particular teacher greatly influence their evaluations. This casts some doubt on the validity of student evaluations as indicators of teaching quality and suggests that aggregated evaluation scores should be used with caution.
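
As an illustration of the variance decomposition described in the abstract, a cross-classified random-intercept model could be written roughly as follows. This is a sketch under assumptions, not the authors' exact specification: the symbols ($\mu$, the $u$ terms, $e_i$, $\rho_k$) and the separate residual term are introduced here for illustration only.

```latex
% Rating i, given by student s(i) to teacher t(i) in course c(i).
% All random effects are assumed independent with mean zero.
\begin{align}
  y_i &= \mu + u_{c(i)} + u_{t(i)} + u_{s(i)} + u_{s(i)\,t(i)} + e_i \\
  \operatorname{Var}(y_i) &= \sigma^2_{c} + \sigma^2_{t} + \sigma^2_{s} + \sigma^2_{st} + \sigma^2_{e} \\
  \rho_k &= \frac{\sigma^2_{k}}{\sigma^2_{c} + \sigma^2_{t} + \sigma^2_{s} + \sigma^2_{st} + \sigma^2_{e}},
  \qquad k \in \{c,\, t,\, s,\, st,\, e\}
\end{align}
```

Under this reading, $\rho_t$ and $\rho_c$ are the shares of variance attributable to teachers and courses, while $\rho_s$ and $\rho_{st}$ capture rater-specific variance: the students' general rating tendencies and their fit with a particular teacher.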
