Improving Teacher Selection: The Effect of Inter-Rater Reliability in the Screening Process. CEDR Working Paper. WP #2015-7.

Inter-rater reliability, commonly assessed by the intra-class correlation coefficient (ICC), is an important index of the extent to which two or more raters assign consistent measures. In organizational research, the data structure is often hierarchical, and designs deviate substantially from the ideal of a balanced (fully crossed or nested) design. It is also often necessary to include covariates in the model, which rules out traditional correlation-based or analysis of variance (ANOVA)-based methods for estimating inter-rater reliability. We advocate the use of hierarchical (mixed-effects) model-based methods, in which variance components can be estimated by restricted maximum likelihood or Bayesian approaches. In this work, we use data from teacher hiring in Spokane Public Schools to demonstrate how hierarchical (mixed-effects) models can be used to estimate inter-rater reliability, including in settings with more complex data structures. We generally find low levels of inter-rater reliability, though this overall reliability varies according to whether the measure is assessed across schools, within schools, or even within job openings. We also find evidence that the inter-rater reliability of some subcomponents of the hiring rubric varies with the type of position for which applicants are applying, and with whether the applicant is internal or from outside the district. We demonstrate the direct effect of reliability on the predictive power of the selection instrument and discuss policy implications for public school hiring.
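
As an illustration of the variance-components view of the ICC described above, the following is a minimal sketch of a generic mixed-effects formulation, not necessarily the specification used in the paper. The symbols are assumptions introduced here: $y_{ij}$ is the score that rater $j$ assigns to applicant $i$, $\mathbf{x}_{ij}$ is a covariate vector, and $a_i$, $r_j$, and $e_{ij}$ are applicant, rater, and residual effects.

```latex
% Assumed generic model: applicant and rater random effects plus covariates
y_{ij} = \mu + \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + a_i + r_j + e_{ij},
\qquad
a_i \sim N(0, \sigma_a^2),\quad
r_j \sim N(0, \sigma_r^2),\quad
e_{ij} \sim N(0, \sigma_e^2)

% Covariate-adjusted reliability of a single rating:
% share of the remaining variance that is attributable to applicants
\mathrm{ICC} = \frac{\sigma_a^2}{\sigma_a^2 + \sigma_r^2 + \sigma_e^2}

% Reliability of the mean of k ratings by different raters
% (exact only for a balanced design; an approximation otherwise)
\mathrm{ICC}_k = \frac{\sigma_a^2}{\sigma_a^2 + (\sigma_r^2 + \sigma_e^2)/k}
```

Under this sketch, the variance components would be estimated by restricted maximum likelihood or a Bayesian fit and then substituted into the ratios. Further levels, such as school or job opening, would enter as additional random effects, and whether their variances belong in the denominator depends on whether reliability is assessed across or within those levels.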
