Pitfalls and Important Issues in Testing Reliability Using Intraclass Correlation Coefficients in Orthopaedic Research

Background: Intraclass correlation coefficients (ICCs) provide a statistical means of testing reliability. However, their interpretation is not well documented in the orthopedic field. The purpose of this study was to investigate the use of ICCs in the orthopedic literature and to demonstrate pitfalls regarding their use.

Methods: First, orthopedic articles that used ICCs were retrieved from the PubMed database, and the journal demography, the ICC models, and the concurrent statistics used were evaluated. Second, reliability testing was performed on three common physical examinations in cerebral palsy, namely the Thomas test, the Staheli test, and popliteal angle measurement. Thirty patients were assessed by three orthopedic surgeons to explore the statistical methods for testing reliability. Third, the factors affecting ICC values were examined by simulating data sets based on the physical examination data, in which the ranges, slopes, and interobserver variability were modified.

Results: Of the 92 orthopedic articles identified, 58 (63%) did not clarify the ICC model used, and only 5 (5%) described the model, type, and measure used. In reliability testing, the popliteal angle showed a larger mean absolute difference than the Thomas test and the Staheli test, yet its ICC was higher, a result that was believed to be contrary to the context of measurement. In addition, the ICC values were affected by the model, type, and measure used. In the simulated data sets, the ICC was higher when the range of the data was larger, the slopes of the data sets were parallel, and the interobserver variability was smaller.

Conclusions: Care should be taken when interpreting absolute ICC values: a higher ICC does not necessarily mean less variability, because ICC values are also affected by other factors such as the range of the data and the ICC model chosen. The authors recommend that researchers state the ICC model used and interpret ICC values in the context of measurement.
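The range and model effects described above can be illustrated with a small sketch. It is not the authors' analysis: it uses the single-measure ICC formulas of Shrout and Fleiss (ICC(1,1), ICC(2,1), ICC(3,1)) and made-up data. The function names (icc_single, mean_abs_diff), the 30-subject by 3-rater layout, the noise level, and the angle ranges are all assumptions chosen for illustration. The same rater noise is added to a narrow and a wide set of "true" angles, so the mean absolute difference between raters is identical in both sets while the ICC differs. A third data set adds a constant offset per rater to show how the consistency and absolute-agreement forms can diverge.

```python
import random
from itertools import combinations


def icc_single(ratings):
    """Single-measure ICCs (Shrout & Fleiss forms) for an n-subjects x k-raters table."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares: subjects (rows), raters (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_err = sum(
        (ratings[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )

    msr = ss_rows / (n - 1)                 # between-subjects mean square
    msc = ss_cols / (k - 1)                 # between-raters mean square
    mse = ss_err / ((n - 1) * (k - 1))      # residual mean square
    msw = (ss_cols + ss_err) / (n * (k - 1))  # within-subjects mean square

    return {
        "ICC(1,1)": (msr - msw) / (msr + (k - 1) * msw),
        "ICC(2,1)": (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n),
        "ICC(3,1)": (msr - mse) / (msr + (k - 1) * mse),
    }


def mean_abs_diff(ratings):
    """Mean absolute difference over all rater pairs and all subjects."""
    k = len(ratings[0])
    diffs = [abs(row[a] - row[b]) for row in ratings for a, b in combinations(range(k), 2)]
    return sum(diffs) / len(diffs)


# Hypothetical data: 30 subjects, 3 raters, identical rater noise in both sets.
rng = random.Random(0)
n_subjects, n_raters = 30, 3
noise = [[rng.gauss(0, 3) for _ in range(n_raters)] for _ in range(n_subjects)]

narrow_truth = [10 + 10 * i / (n_subjects - 1) for i in range(n_subjects)]  # 10 to 20 degrees
wide_truth = [60 * i / (n_subjects - 1) for i in range(n_subjects)]         # 0 to 60 degrees

narrow = [[t + e for e in errs] for t, errs in zip(narrow_truth, noise)]
wide = [[t + e for e in errs] for t, errs in zip(wide_truth, noise)]

# Raters differ only by a constant offset (0, 5, 10 degrees): perfectly parallel scores.
biased = [[t, t + 5, t + 10] for t in wide_truth]

for name, data in [("narrow", narrow), ("wide", wide), ("biased", biased)]:
    iccs = icc_single(data)
    summary = ", ".join(f"{key}={value:.3f}" for key, value in iccs.items())
    print(f"{name}: MAD={mean_abs_diff(data):.2f}, {summary}")
```

In this sketch the narrow and wide sets share exactly the same between-rater differences, so any gap in their ICCs comes from the spread of the subjects alone. The offset data set gives a consistency ICC(3,1) of 1 while the absolute-agreement ICC(2,1) stays below 1, which is why reporting the model and type matters.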
