Supervised Multi-View Canonical Correlation Analysis (sMVCCA): Integrating Histologic and Proteomic Features for Predicting Recurrent Prostate Cancer

In this work, we present a new methodology for predicting recurrent prostate cancer (CaP) following radical prostatectomy (RP) by integrating quantitative image features with protein expression measured in the excised prostate. Creating a fused predictor from high-dimensional data streams is challenging because the classifier must 1) account for the “curse of dimensionality,” which degrades classifier performance when the number of features exceeds the number of patient studies, and 2) balance mismatches in the number of features across channels, to avoid bias towards channels with more features. Our new data integration methodology, supervised Multi-view Canonical Correlation Analysis (sMVCCA), is designed to integrate an arbitrary number of views of high-dimensional data and to provide data representations that are more amenable to disease classification. Additionally, we demonstrate sMVCCA using Spearman's rank correlation which, unlike Pearson's correlation, can capture monotonic nonlinear relationships and is less sensitive to outliers. Forty CaP patients with pathological Gleason scores of 6-8 were considered for this study. Twenty-one of these men experienced biochemical recurrence (BCR) following RP, while 19 did not. For each patient, 189 quantitative histomorphometric attributes and 650 protein expression levels were extracted from the primary tumor nodule. The fused histomorphometric/proteomic representation obtained via sMVCCA, combined with a random forest classifier, predicted BCR with a mean AUC of 0.74 and a maximum AUC of 0.9286. sMVCCA performed statistically significantly better (p < 0.05) than comparative state-of-the-art data fusion strategies for predicting BCR. Furthermore, Kaplan-Meier analysis demonstrated improved BCR-free survival prediction for the sMVCCA-fused classifier compared with histologic or proteomic features alone.
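The contrast between Spearman's and Pearson's correlation mentioned above can be illustrated with a small sketch. This is not the sMVCCA implementation: the synthetic data, the helper names (`pearson`, `ordinal_ranks`, `spearman`), and the choice of an exponential relationship are assumptions made purely for illustration. Spearman's coefficient is computed here as Pearson's coefficient on ranks, which holds when there are no ties.

```python
import numpy as np

def pearson(x, y):
    """Pearson's linear correlation coefficient between two 1-D arrays."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.corrcoef(x, y)[0, 1])

def ordinal_ranks(x):
    """Ranks 1..n of the values in x (illustrative helper; assumes no ties)."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x)
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    return ranks

def spearman(x, y):
    """Spearman's rank correlation: Pearson's correlation of the ranks."""
    return pearson(ordinal_ranks(x), ordinal_ranks(y))

# Synthetic, hypothetical data: a monotonic but strongly nonlinear relationship.
x = np.linspace(0.1, 5.0, 40)
y = np.exp(x)

pearson_clean = pearson(x, y)    # below 1 because the relationship is not linear
spearman_clean = spearman(x, y)  # 1 (up to rounding) because the ordering is preserved

# Corrupt a single observation with an extreme value.
y_outlier = y.copy()
y_outlier[0] = 1e6

pearson_outlier = pearson(x, y_outlier)    # dominated by the single outlier
spearman_outlier = spearman(x, y_outlier)  # only one rank is displaced

print(f"clean:   Pearson={pearson_clean:.3f}  Spearman={spearman_clean:.3f}")
print(f"outlier: Pearson={pearson_outlier:.3f}  Spearman={spearman_outlier:.3f}")
```

In this toy setting the rank-based coefficient stays high after the corruption, while the linear coefficient changes sign. Whether a similar effect matters in practice would depend on the actual feature distributions.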
