Supervised Regularized Canonical Correlation Analysis: integrating histologic and proteomic measurements for predicting biochemical recurrence following prostate surgery

Background

Multimodal data, especially imaging and non-imaging data, are routinely acquired in the context of disease diagnostics; however, computational challenges have limited the ability to quantitatively integrate imaging and non-imaging data channels with different dimensionalities and scales. To the best of our knowledge, relatively few attempts have been made to quantitatively fuse such data to construct classifiers, and none have attempted to quantitatively combine histology (imaging) and proteomic (non-imaging) measurements for making diagnostic and prognostic predictions. The objective of this work is to create a common subspace, called a metaspace, that simultaneously accommodates both the imaging and non-imaging data (and hence data at different scales and dimensionalities). This metaspace can be used to build a meta-classifier that produces better classification results than a classifier based on a single modality alone. Canonical Correlation Analysis (CCA) and Regularized CCA (RCCA) are statistical techniques that find maximally correlated linear projections of two sets of variables, yielding a homogeneous, uniform representation of heterogeneous data channels. In this paper, we present a novel modification to CCA and RCCA, Supervised Regularized Canonical Correlation Analysis (SRCCA), that (1) enables the quantitative integration of data from multiple modalities using a feature selection scheme, (2) is regularized, and (3) is computationally cheap. We leverage this SRCCA framework to fuse proteomic and histologic image signatures for identifying prostate cancer patients at risk of 5-year biochemical recurrence following radical prostatectomy.

Results

A cohort of 19 grade- and stage-matched prostate cancer patients, all of whom had undergone radical prostatectomy, was considered in this study; 10 of them had biochemical recurrence within 5 years of surgery and 9 did not. The aim was to construct a fused, lower-dimensional metaspace comprising both the histological and proteomic measurements obtained from the site of the dominant nodule on the surgical specimen. In conjunction with SRCCA, a random forest classifier identified prostate cancer patients who developed biochemical recurrence within 5 years with a maximum classification accuracy of 93%.

Conclusions

Classifier performance in the SRCCA space was statistically significantly higher than in the fused data representations obtained not only from CCA and RCCA but also from two other statistical techniques, Principal Component Analysis (PCA) and Partial Least Squares (PLS) Regression. These results suggest that SRCCA is a computationally efficient and highly accurate scheme for representing multimodal (histologic and proteomic) data in a metaspace, and that it could be used to construct fused biomarkers for predicting disease recurrence and prognosis.
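For reference, the CCA and RCCA objectives named in the Background can be written in their standard textbook form. The notation here is assumed for illustration and is not taken from the paper: for column-centered data matrices with within-set covariances $C_{xx}$, $C_{yy}$ and cross-covariance $C_{xy}$, CCA seeks projection vectors $\mathbf{w}_x$, $\mathbf{w}_y$ that maximize the correlation of the projected data, and RCCA adds ridge terms $\lambda_x$, $\lambda_y$ to the within-set covariances. SRCCA's supervised feature-selection step is not represented here.

```latex
\rho = \max_{\mathbf{w}_x,\,\mathbf{w}_y}
  \frac{\mathbf{w}_x^{\top} C_{xy}\,\mathbf{w}_y}
       {\sqrt{\mathbf{w}_x^{\top} C_{xx}\,\mathbf{w}_x}\;\sqrt{\mathbf{w}_y^{\top} C_{yy}\,\mathbf{w}_y}},
\qquad C_{yx} = C_{xy}^{\top}

% RCCA: ridge terms on the within-set covariances
C_{xx} \;\to\; C_{xx} + \lambda_x I, \qquad C_{yy} \;\to\; C_{yy} + \lambda_y I, \qquad \lambda_x,\lambda_y > 0

% Stationary points of the regularized objective: a generalized eigenproblem
C_{xy}\,(C_{yy} + \lambda_y I)^{-1} C_{yx}\,\mathbf{w}_x = \rho^{2}\,(C_{xx} + \lambda_x I)\,\mathbf{w}_x,
\qquad
\mathbf{w}_y = \frac{1}{\rho}\,(C_{yy} + \lambda_y I)^{-1} C_{yx}\,\mathbf{w}_x
```

The eigenvector with the largest $\rho^{2}$ gives the first canonical pair. Regularization matters for a cohort like this one: a sample covariance estimated from $n = 19$ patients has rank at most $n - 1 = 18$, so whenever a modality has more features than that, $C_{xx}$ or $C_{yy}$ is singular and unregularized CCA is ill-posed, while any $\lambda > 0$ makes both matrices positive definite.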

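The Results describe fusing the two modalities with SRCCA and classifying in the fused space with a random forest. Below is a minimal, hypothetical sketch of that kind of pipeline, not the authors' implementation. Its assumptions are: the supervised step is a per-feature two-sample t-test ranking (`top_k_by_ttest`); RCCA is solved as a ridge-regularized generalized eigenproblem (`rcca`); the fused representation is the concatenation of both modalities' canonical variates; and accuracy is estimated by leave-one-out cross-validation (`fused_loocv_accuracy`). All function names, the values of `lam_x`, `lam_y` and `k`, and the synthetic data are illustrative.

```python
import numpy as np
from scipy.linalg import eigh
from scipy.stats import ttest_ind
from sklearn.ensemble import RandomForestClassifier


def top_k_by_ttest(X, y, k):
    # Assumed supervised criterion: rank features by |t| between the two classes.
    t, _ = ttest_ind(X[y == 1], X[y == 0], axis=0)
    return np.argsort(-np.abs(np.nan_to_num(t)))[:k]


def rcca(X, Y, lam_x=0.1, lam_y=0.1, n_components=2):
    """Ridge-regularized CCA. Returns Wx, Wy (columns are canonical
    directions) and the regularized canonical correlations, largest first."""
    n = X.shape[0]
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    Cxx = Xc.T @ Xc / (n - 1) + lam_x * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + lam_y * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)
    # Solve Cxy Cyy^{-1} Cyx wx = rho^2 Cxx wx as a symmetric-definite problem.
    M = Cxy @ np.linalg.solve(Cyy, Cxy.T)
    vals, vecs = eigh((M + M.T) / 2, Cxx)  # eigenvalues come back ascending
    idx = np.argsort(vals)[::-1][:n_components]
    rho = np.sqrt(np.clip(vals[idx], 0.0, None))
    Wx = vecs[:, idx]  # eigh normalizes these so that Wx.T @ Cxx @ Wx = I
    # wy = Cyy^{-1} Cyx wx / rho, which gives wy.T @ Cyy @ wy = 1
    Wy = np.linalg.solve(Cyy, Cxy.T @ Wx) / np.where(rho > 0, rho, 1.0)
    return Wx, Wy, rho


def fused_loocv_accuracy(X, Y, y, k=5, n_components=2, seed=0):
    """Leave-one-out accuracy of a random forest in the fused space.
    Feature selection and RCCA are refit in every fold, so the held-out
    patient never influences the projection."""
    correct = 0
    for i in range(len(y)):
        tr = np.arange(len(y)) != i
        sx = top_k_by_ttest(X[tr], y[tr], k)
        sy = top_k_by_ttest(Y[tr], y[tr], k)
        Wx, Wy, _ = rcca(X[tr][:, sx], Y[tr][:, sy], n_components=n_components)
        # Assumed fusion rule: concatenate both modalities' canonical variates.
        # Projections are uncentered; the constant offset does not change tree splits.
        Z = np.hstack([X[:, sx] @ Wx, Y[:, sy] @ Wy])
        clf = RandomForestClassifier(n_estimators=100, random_state=seed)
        clf.fit(Z[tr], y[tr])
        correct += int(clf.predict(Z[i:i + 1])[0] == y[i])
    return correct / len(y)


# Synthetic stand-ins (not the study data): 50 "histologic" and 30 "proteomic"
# features, of which the first 5 in each modality are shifted by class.
rng = np.random.default_rng(0)
n = 40
y = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, 50))
X[:, :5] += 3.0 * y[:, None]
Y = rng.normal(size=(n, 30))
Y[:, :5] += 3.0 * y[:, None]
acc = fused_loocv_accuracy(X, Y, y)
print(f"LOOCV accuracy in fused space: {acc:.2f}")
```

Refitting the feature selection inside each fold is the main design choice here. With cohorts as small as 19 patients, selecting features on all samples before cross-validation would leak the held-out label into the projection and inflate the estimated accuracy.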