Assessment of Interobserver Reproducibility in Quantitative 18F-FDG PET and CT Measurements of Tumor Response to Therapy

Our goal was to estimate and compare across different readers the reproducibility of 18F-FDG PET standardized uptake value (SUV) and CT size measurements, and of changes in those measurements, in malignant tumors before and after therapy.

Methods: Fifty-two tumors in 25 patients were evaluated on 18F-FDG PET/CT scans. Maximum SUVs (SUVbw max) and CT size measurements were determined for each tumor independently on pre- and posttreatment scans by 8 different readers (4 PET, 4 CT) using routine nonautomated clinical methods. Percentage changes in SUVbw max and CT size between pre- and posttreatment scans were calculated. Interobserver reproducibility of SUVbw max, CT size, and changes in these values was described by intraclass correlation coefficients (ICCs) and estimates of variance.

Results: The ICC was higher for pretreatment SUVbw max, posttreatment SUVbw max, and percentage change in SUVbw max than for the longest CT size and the 2-dimensional CT size (before treatment, 0.93, 0.72, and 0.61, respectively; after treatment, 0.91, 0.85, and 0.45, respectively; and percentage change, 0.94, 0.70, and 0.33, respectively). The variability of SUVbw max was significantly lower than the variability of the longest CT size and the 2-dimensional CT size (mean ± SD before treatment, 6.3% ± 14.2%, 16.2% ± 17.8%, and 27.5% ± 26.7%, respectively, P ≤ 0.001; and after treatment, 18.4% ± 26.8%, 35.1% ± 47.5%, and 50.9% ± 51.4%, respectively, P ≤ 0.02). The variability of percentage change in SUVbw max (16.7% ± 36.2%) was significantly lower than that for percentage change in the longest CT size (156.3% ± 157.3%, P ≤ 0.0001) and the 2-dimensional CT size (178.4% ± 546.5%, P < 0.0001).

Conclusion: The interobserver reproducibility of SUVbw max for both untreated and treated tumors, and of percentage change in SUVbw max, is substantially higher than that of CT size measurements and percentage change in CT size. Measurements of tumor metabolism by PET should be included in trials to assess response to therapy. Although PET reproducibility was high, the variability observed in analyses of identical image sets by 4 readers indicates that automated analytic tools to assess response might be helpful to further enhance reproducibility.
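The two quantities named in the Methods, percentage change between pre- and posttreatment values and the ICC across readers, can be illustrated with a minimal sketch. The abstract does not state which ICC form was used, so the code below assumes a two-way random-effects, absolute-agreement, single-measure ICC (often written ICC(2,1)); the function names `percent_change` and `icc_2_1` and all numeric values are hypothetical and are not taken from the study.

```python
import numpy as np


def percent_change(pre, post):
    """Percentage change from pretreatment to posttreatment values."""
    pre = np.asarray(pre, dtype=float)
    post = np.asarray(post, dtype=float)
    return 100.0 * (post - pre) / pre


def icc_2_1(x):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    x is an (n_tumors, n_readers) array with one measurement per cell.
    This ICC form is an assumption; the source does not specify it.
    """
    x = np.asarray(x, dtype=float)
    n, k = x.shape
    grand = x.mean()
    row_means = x.mean(axis=1)  # per-tumor mean across readers
    col_means = x.mean(axis=0)  # per-reader mean across tumors

    # Two-way ANOVA sums of squares (no replication).
    ss_rows = k * np.sum((row_means - grand) ** 2)
    ss_cols = n * np.sum((col_means - grand) ** 2)
    ss_err = np.sum((x - grand) ** 2) - ss_rows - ss_cols

    msr = ss_rows / (n - 1)             # between-tumor mean square
    msc = ss_cols / (k - 1)             # between-reader mean square
    mse = ss_err / ((n - 1) * (k - 1))  # residual mean square

    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


if __name__ == "__main__":
    # Made-up SUV readings: 5 tumors (rows) by 4 readers (columns).
    pre = np.array([
        [8.1, 8.0, 8.3, 8.1],
        [4.2, 4.4, 4.1, 4.2],
        [12.5, 12.6, 12.4, 12.9],
        [6.0, 5.8, 6.1, 6.0],
        [9.7, 9.9, 9.6, 9.7],
    ])
    post = np.array([
        [4.0, 4.1, 4.3, 4.0],
        [3.9, 4.0, 3.8, 4.1],
        [5.2, 5.5, 5.1, 5.3],
        [5.9, 5.7, 6.0, 5.8],
        [2.1, 2.3, 2.0, 2.2],
    ])
    change = percent_change(pre, post)
    print("ICC pre:     ", icc_2_1(pre))
    print("ICC post:    ", icc_2_1(post))
    print("ICC % change:", icc_2_1(change))
```

Computing the ICC on the percentage-change matrix, not only on the raw values, mirrors the comparison in the Results: the denominator of a percentage change can amplify small reader disagreements, which is one reason change scores may be less reproducible than the underlying measurements.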
