Reliability in evidence-based clinical practice: a primer for allied health professionals

Abstract

The aim of this paper is to provide a tutorial on reliability in research and clinical practice. Reliability is defined as the quality of a measure that produces reproducible scores on repeat administrations of a test. Reliability is thus a prerequisite for test validity. All measurements are subject to measurement error. Systematic bias is a non-random change between trials in a test-retest situation. Random error is the 'noise' in the measurement or test. Systematic bias should be evaluated separately from estimates of random error. For variables measured on an interval or ratio scale, the most appropriate estimates of random error are the typical error, the percent coefficient of variation, and the 95% limits of agreement. These can be derived via analysis of variance procedures. Estimates of relative, rather than absolute, reliability may be obtained from the intraclass correlation coefficient. For variables measured on a categorical scale, the kappa coefficient is recommended. Irrespective of the statistic chosen, 95% confidence intervals should be reported to define the range of values within which the true population value is likely to reside. Small random error implies greater precision for single trials. More precise tests and measurements facilitate more sensitive monitoring of the effects of treatment interventions in research or practice settings.
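
As a concrete illustration of the absolute and relative reliability statistics named above, the following is a minimal sketch in Python (using NumPy) applied to hypothetical test-retest data. The participant scores, variable names, and the choice of ICC(2,1) as the intraclass correlation form are assumptions made for illustration, not values or procedures taken from the paper.

    import numpy as np

    # Hypothetical test-retest scores for eight participants on two administrations
    # of an interval/ratio-scaled clinical test; the numbers are illustrative only.
    trial1 = np.array([12.1, 10.4, 15.3, 11.8, 13.0, 14.2, 12.7, 10.9])
    trial2 = np.array([12.5, 10.1, 15.9, 11.6, 13.4, 14.0, 13.1, 11.2])

    diff = trial2 - trial1
    sd_diff = diff.std(ddof=1)

    # Systematic bias: the mean change between trials.
    bias = diff.mean()

    # Typical error (standard error of measurement): SD of the differences / sqrt(2).
    typical_error = sd_diff / np.sqrt(2)

    # Percent coefficient of variation: typical error relative to the grand mean.
    grand_mean = np.concatenate([trial1, trial2]).mean()
    cv_percent = 100 * typical_error / grand_mean

    # 95% limits of agreement: bias +/- 1.96 * SD of the differences.
    loa = (bias - 1.96 * sd_diff, bias + 1.96 * sd_diff)

    # Relative reliability via two-way ANOVA mean squares and ICC(2,1).
    Y = np.column_stack([trial1, trial2])          # subjects x trials
    n, k = Y.shape
    ss_rows = k * ((Y.mean(axis=1) - grand_mean) ** 2).sum()
    ss_cols = n * ((Y.mean(axis=0) - grand_mean) ** 2).sum()
    ss_error = ((Y - grand_mean) ** 2).sum() - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    icc_2_1 = (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )

    print(f"bias = {bias:.2f}")
    print(f"typical error = {typical_error:.2f}  (CV = {cv_percent:.1f}%)")
    print(f"95% limits of agreement = {loa[0]:.2f} to {loa[1]:.2f}")
    print(f"ICC(2,1) = {icc_2_1:.2f}")

In this sketch the typical error and coefficient of variation describe absolute random error in the units of the test, the limits of agreement bound the expected test-retest difference for an individual, and the intraclass correlation expresses relative reliability of the measure within this sample.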
