Machine learning for improved pathological staging of prostate cancer: A performance comparison on a range of classifiers

OBJECTIVES Predicting the pathological stage of prostate cancer is an essential step in a patient's pathway, because it determines the treatment that follows. In current practice, urologists support their decisions with the pathological stage predictions provided in Partin tables. However, Partin tables are based on logistic regression (LR) and were built from US data. Our objective is to investigate a range of predictive methods and predictive variables for pathological stage prediction, and to assess their predictive quality on UK data.

METHODS AND MATERIAL The latest version of Partin tables was applied to a large-scale British dataset to measure its performance by means of the concordance index (c-index). The data were collected by the British Association of Urological Surgeons (BAUS) and comprise records of over 1700 patients treated with prostatectomy in 57 centers across the UK. The original methodology was replicated on the BAUS dataset and also evaluated with the c-index. In addition, a selection of classifiers, including LR, artificial neural networks and Bayesian networks (BNs), was applied to the same data and compared using the area under the ROC curve (AUC). Subsets of the data were created to observe how the classifiers perform when extra variables are included. Finally, a local dataset prepared by the Aberdeen Royal Infirmary was used to study the effect of different variables on predictive performance.

RESULTS When applied to UK data, Partin tables have low predictive quality (c-index = 0.602) in discriminating between patients with organ-confined disease and patients with extraprostatic extension, the two most frequently observed pathological stages. Replicated lookup tables built from British data improve the classification, but the overall predictive quality remains low (c-index = 0.610).
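For a binary outcome such as organ-confined (OC) disease versus extraprostatic extension (EPE), the c-index is the probability that a randomly chosen OC patient receives a higher predicted probability of OC than a randomly chosen EPE patient, with tied predictions counted as one half; in this binary setting it coincides with the AUC. Because a lookup table assigns the same probability to every patient in a cell, ties are common, so their handling matters. The following is a minimal pure-Python sketch; the function name `c_index`, the label coding (1 = OC, 0 = EPE) and the toy numbers are illustrative assumptions, not the study's code or data.

```python
from itertools import product
from typing import Sequence


def c_index(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Harrell's concordance index for a binary outcome.

    labels[i] is 1 for the event of interest (assumed here: organ-confined
    disease) and 0 otherwise (assumed: extraprostatic extension); scores[i] is
    the predicted probability of the event. Every (positive, negative) pair is
    compared: it is concordant when the positive case has the higher score,
    and a tie counts as half a concordant pair.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case of each class")
    concordant = 0.0
    for p, n in product(pos, neg):
        if p > n:
            concordant += 1.0
        elif p == n:
            concordant += 0.5
    return concordant / (len(pos) * len(neg))


# Toy example with lookup-table-style tied scores (not BAUS data).
print(c_index([0.7, 0.7, 0.5, 0.4, 0.4], [1, 0, 1, 0, 1]))  # → 0.5
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect ranking, which is the scale on which the reported values of 0.602 and 0.610 should be read.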
Comparing a range of classifiers shows that BNs generally outperform the other methods. Using the four variables from Partin tables, naive Bayes is the best classifier for the prediction of each class label (AUC = 0.662 for organ-confined (OC) disease). When two additional variables are added, the results of LR (AUC = 0.675), artificial neural networks (0.656) and BN methods (0.679) improve overall. BNs achieve higher AUCs than the other methods as the number of variables increases.

CONCLUSION The predictive quality of Partin tables on UK data can be described as low to moderate. This means that, if the predictions generated by Partin tables were followed, many patients would receive an inappropriate treatment, which is generally associated with a deterioration in their quality of life. In addition to demographic differences between the UK population and the original US population, the methodology, and LR in particular, has limitations. BNs represent a promising alternative to LR from which prostate cancer staging can benefit. Heuristic search for structure learning and the inclusion of more variables further improve the quality of BN models.
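Naive Bayes, reported above as the best classifier on the Partin-table variables, is the simplest Bayesian network classifier: the class node is the only parent of every feature node, so the features are assumed conditionally independent given the stage. The following is a minimal pure-Python sketch for discrete inputs, assuming add-one (Laplace) smoothing. The class name `CategoricalNaiveBayes`, the three categorical inputs and the toy records are hypothetical; they do not reproduce the study's variables, data or software.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Hashable, Sequence


class CategoricalNaiveBayes:
    """Naive Bayes over discrete features with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha

    def fit(self, rows: Sequence[Sequence[Hashable]],
            labels: Sequence[Hashable]) -> "CategoricalNaiveBayes":
        self.class_counts = Counter(labels)
        n_features = len(rows[0])
        # counts[j][c][v] = number of class-c rows in which feature j equals v
        self.counts = [defaultdict(Counter) for _ in range(n_features)]
        # distinct values observed per feature (used in the smoothing denominator)
        self.values = [set() for _ in range(n_features)]
        for row, y in zip(rows, labels):
            for j, v in enumerate(row):
                self.counts[j][y][v] += 1
                self.values[j].add(v)
        return self

    def predict_proba(self, row: Sequence[Hashable]) -> Dict[Hashable, float]:
        n = sum(self.class_counts.values())
        log_post = {}
        for c, n_c in self.class_counts.items():
            lp = math.log(n_c / n)  # log prior P(class = c)
            for j, v in enumerate(row):
                # alpha is added to every count, so a value never seen with
                # class c still gets a non-zero conditional probability
                num = self.counts[j][c][v] + self.alpha
                den = n_c + self.alpha * len(self.values[j])
                lp += math.log(num / den)
            log_post[c] = lp
        # normalise in log space (log-sum-exp) to avoid numerical underflow
        top = max(log_post.values())
        z = sum(math.exp(lp - top) for lp in log_post.values())
        return {c: math.exp(lp - top) / z for c, lp in log_post.items()}


# Hypothetical inputs: (PSA band, clinical T stage, biopsy Gleason).
# Toy records only; the category boundaries are illustrative, not Partin's.
rows = [
    ("<4", "T1c", "<=6"), ("4-10", "T1c", "<=6"),
    ("<4", "T2a", "<=6"), ("4-10", "T2a", "3+4"),
    (">10", "T2b", "4+3"), ("4-10", "T2b", "4+3"),
    (">10", "T2a", "3+4"), (">10", "T1c", "4+3"),
]
labels = ["OC", "OC", "OC", "OC", "EPE", "EPE", "EPE", "EPE"]

model = CategoricalNaiveBayes(alpha=1.0).fit(rows, labels)
proba = model.predict_proba(("<4", "T1c", "<=6"))
print(round(proba["OC"], 3))  # → 0.947
```

scikit-learn's `sklearn.naive_bayes.CategoricalNB` provides a maintained implementation of the same model. To compare classifiers as in the abstract, the predicted P(OC) scores would be ranked with the AUC, ideally on held-out data (for example via cross-validation) rather than on the records used for fitting.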
