Intra- and interobserver variability in CT measurements in oncology.

PURPOSE: To assess variability of computed tomographic (CT) measurements of lesions of various sizes and margin sharpness in several organs taken by readers with different levels of experience, as would be found in routine clinical practice.

MATERIALS AND METHODS: In this institutional review board-approved, HIPAA-compliant retrospective study, 17 radiologists with varying levels of experience independently obtained bidimensional orthogonal axial measurements of 80 lymph nodes, 120 pulmonary lesions, and 120 hepatic lesions, categorized by size and margin sharpness. Repeat measurements were performed 2 or more weeks later. Intraclass correlation coefficients and Bland-Altman plots were used to assess intra- and interobserver variability.

RESULTS: For long- and short-axis measurements, respectively, overall intraobserver agreement rates were 0.957 (95% confidence interval [CI]: 0.947, 0.966) and 0.945 (95% CI: 0.933, 0.955); interobserver agreement rates were 0.954 (95% CI: 0.943, 0.963) and 0.941 (95% CI: 0.929, 0.951). Both intra- and interobserver agreement differed by lesion size, margin sharpness, location, and reader experience. Agreement ranged from 0.847 to 0.886 for lesions 20 mm or larger versus 0.745 to 0.785 for lesions smaller than 10 mm, 0.961 to 0.975 for smooth margins versus 0.924 to 0.942 for irregular margins, 0.955 to 0.97 for lung lesions versus 0.884 to 0.94 for lymph nodes, and 0.95 to 0.97 for attending radiologists versus 0.928 to 0.945 for fellows. Measurement variability decreased with increasing lesion size; 95% limits of agreement for short-axis measurements were -11.6% to 6.7% for lesions smaller than 10 mm versus -6.2% to 4.7% for lesions 20 mm or larger.

CONCLUSION: Overall intra- and interobserver variability rates were similar; in clinical practice, serial CT measurements can be safely performed by different radiologists. Smooth margins, larger lesion size, and greater reader experience resulted in a higher consistency of measurements. Depending on lesion size, increases of 4%-6% or greater in long axis and 5%-7% or greater in short axis and decreases of -6% to -10% or greater in long axis and -6% to -12% or greater in short axis at CT can be considered true changes rather than measurement variation, with 95% confidence.
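
The 95% limits of agreement quoted in the abstract come from a Bland-Altman analysis. As a rough illustration only, the Python sketch below computes such limits from paired repeat measurements and then checks whether an observed size change falls outside them. The measurement values, the function names, and the choice to express each difference as a percentage of the pair mean are all assumptions made for this example; the abstract does not state how the study normalized its differences.

```python
from statistics import mean, stdev


def percent_differences(first, second):
    """Difference of each pair as a percentage of the pair mean (an assumed convention)."""
    return [100.0 * (b - a) / ((a + b) / 2.0) for a, b in zip(first, second)]


def limits_of_agreement(diffs, z=1.96):
    """Bland-Altman bias and approximate 95% limits of agreement: bias -/+ z * SD."""
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    return bias, bias - z * sd, bias + z * sd


def exceeds_variability(change_pct, lower, upper):
    """True if a percent change lies outside the limits of agreement."""
    return change_pct < lower or change_pct > upper


# Hypothetical short-axis measurements in mm (not data from the study).
first_read = [8.0, 9.5, 12.0, 15.5, 21.0, 24.5]
second_read = [7.6, 9.9, 11.7, 15.8, 20.6, 24.9]

diffs = percent_differences(first_read, second_read)
bias, lower, upper = limits_of_agreement(diffs)
print(f"bias {bias:.2f}%, limits of agreement {lower:.2f}% to {upper:.2f}%")
```

Applied to the limits the abstract reports for short-axis measurements of lesions smaller than 10 mm (-11.6% to 6.7%), `exceeds_variability(8.0, -11.6, 6.7)` returns `True`, while a 5% increase stays inside the limits and would be treated as indistinguishable from measurement variation.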
