Guided undersampling classification for automated radiation therapy quality assurance of prostate cancer treatment

PURPOSE: To test well-studied, widely used classification methods alongside newly developed data-filtering techniques designed specifically for imbalanced-data classification, in order to demonstrate proof of principle for an automated radiation therapy (RT) quality assurance process for prostate cancer treatment.

METHODS: A set of acceptable (majority class, n = 61) and erroneous (minority class, n = 12) RT plans, together with a disjoint set of acceptable plans used to develop features (n = 273), was used to build a dataset for testing. Five widely used imbalanced-data classification algorithms were tested with a modularized guided undersampling procedure that includes ensemble-outlier filtering and normalized-cut sampling.

RESULTS: Hybrid methods that included either ensemble-outlier filtering alone or both filtering and normalized-cut sampling yielded the strongest performance in identifying unacceptable treatment plans. Specifically, five methods demonstrated superior performance in both area under the receiver operating characteristic curve and false positive rate when the true positive rate is equal to one. Furthermore, ensemble-outlier filtering significantly improved results in all but one hybrid method (p < 0.01). Finally, ensemble-outlier filtering methods identified four minority instances that were considered outliers in over 96% of cross-validation iterations. Such instances may be considered distinct planning errors and merit additional inspection, providing potential areas of improvement for the planning process.

CONCLUSIONS: Traditional imbalanced-data classification methods combined with ensemble-outlier filtering and normalized-cut sampling provide a powerful framework for identifying erroneous RT treatment plans. The proposed methodology yielded strong classification performance and identified problematic instances with high accuracy.
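The two evaluation measures named in the results, area under the receiver operating characteristic curve and false positive rate at a true positive rate of one, can be illustrated with a minimal sketch. This is not the authors' code: the function names, the toy labels, and the toy scores below are all hypothetical, and the sketch assumes that a higher score means a plan is more likely to be erroneous (label 1).

```python
def fpr_at_full_recall(labels, scores):
    """False positive rate at the strictest threshold that still flags
    every erroneous plan (true positive rate equal to one)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    threshold = min(pos)  # lowest score among erroneous plans
    false_pos = sum(1 for s in neg if s >= threshold)
    return false_pos / len(neg)


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison: the fraction of
    (erroneous, acceptable) pairs ranked correctly, ties counted as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = erroneous plan, 0 = acceptable plan.
toy_labels = [1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.3, 0.2, 0.1]

toy_fpr = fpr_at_full_recall(toy_labels, toy_scores)
toy_auc = auc(toy_labels, toy_scores)
print(toy_fpr, toy_auc)
```

In a quality assurance setting, the false positive rate at full recall is a natural companion to AUC because it measures how many acceptable plans would need manual review if no erroneous plan is allowed to pass unflagged.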
