The Reliability of Observational Measures.

Researchers in the area of classroom observation have been greatly troubled by questions concerning the reliability of the measures that they obtain. Until recently, this concern was frequently assuaged by the routine computation of one or more coefficients of observer agreement (see Frick & Semmel, 1974). However, the work of Medley and Mitzel (1963) and of McGaw, Wardrop and Bunda (1972) has clearly established the inadequacy of observer agreement alone as an index of reliability. The variance components approach which they propose enables the researcher to pinpoint multiple sources of error and to compute a number of different reliability coefficients for different purposes.

Unfortunately, the literature does not indicate that these methods have gained wide acceptance, at least not in practice. The most likely reason would appear to be the inference from both papers that the proper estimation of reliability requires a full-fledged reliability study, using multiple observers fully crossed with classrooms, and (following McGaw et al., 1972) crossed also with situations. The magnitude of such a study is far beyond the resources of most researchers, and such an undertaking does not relate very closely to the purposes of their own studies (typically, to make some statement about teacher or pupil behavior, and possibly its relationship with educational outcomes). Consequently, it has been common to avoid the question of reliability altogether, or else to report a coefficient of observer agreement, knowing full well its inadequacy for that purpose.

It has been urged (Herbert & Attridge, 1975) that users and developers of observation systems ought to provide data pertaining to reliability and, as well, "... a discussion of which reliability measures were selected, and why" (p. 14). A full-scale reliability study, along the lines of McGaw et al. (1972) or Medley and Mitzel (1963), can be, and probably ought to be, demanded of the developer of an