Multi-modality fusion using canonical correlation analysis methods: Application in breast cancer survival prediction from histology and genomics

Multi-modality datasets offer an opportunity to characterize the same object of interest more comprehensively from multiple viewpoints. In this work, we investigate canonical correlation analysis (CCA) and penalized variants of CCA (pCCA) for the fusion of two modalities. We study a simple graphical model for the generation of two-modality data. We show analytically that, with known model parameters, posterior mean estimators that jointly use both modalities outperform arbitrary linear mixing of single-modality posterior estimators in latent variable prediction. Penalized extensions of CCA that incorporate domain knowledge can discover correlations in high-dimensional, low-sample-size data, a setting in which traditional CCA is inapplicable. To facilitate the generation of multi-dimensional embeddings with pCCA, we propose two matrix deflation schemes that enforce desirable properties exhibited by CCA. Combining these components, we propose a two-stage pipeline that uses pCCA embeddings generated with deflation for latent variable prediction. On simulated data, our proposed model drastically reduces the mean-squared error in latent variable prediction. When applied to publicly available histopathology and RNA-sequencing data from The Cancer Genome Atlas (TCGA) breast cancer patients, our model can outperform principal components analysis (PCA) embeddings of the same dimension in survival prediction.
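
To make the setting concrete, the following is a minimal sketch of classical (unpenalized) CCA on simulated two-modality data that share one latent variable. It is not the pipeline proposed in this work: the function names `simulate_two_modalities` and `cca`, the dimensions, the noise level, and the small `ridge` term are all illustrative assumptions. The sketch relies on invertible sample covariances, which is why it only applies when the number of samples exceeds the number of features in each modality.

```python
import numpy as np


def simulate_two_modalities(n=500, p=5, q=4, noise=0.5, seed=0):
    """Hypothetical generator: one shared scalar latent z, two noisy linear views."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)
    wx = rng.standard_normal(p)  # loading of modality X on z (assumed)
    wy = rng.standard_normal(q)  # loading of modality Y on z (assumed)
    X = np.outer(z, wx) + noise * rng.standard_normal((n, p))
    Y = np.outer(z, wy) + noise * rng.standard_normal((n, q))
    return X, Y, z


def cca(X, Y, k=1, ridge=1e-8):
    """Classical CCA via Cholesky whitening and an SVD.

    Returns canonical weights A (p x k), B (q x k) and the top-k
    canonical correlations. Requires n > p and n > q.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Sxx = Xc.T @ Xc / (n - 1) + ridge * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + ridge * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)
    Lx = np.linalg.cholesky(Sxx)  # Sxx = Lx Lx^T
    Ly = np.linalg.cholesky(Syy)  # Syy = Ly Ly^T
    # Whitened cross-covariance: Lx^{-1} Sxy Ly^{-T}
    M = np.linalg.solve(Lx, Sxy) @ np.linalg.inv(Ly).T
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    A = np.linalg.solve(Lx.T, U[:, :k])     # map back: a = Lx^{-T} u
    B = np.linalg.solve(Ly.T, Vt.T[:, :k])  # map back: b = Ly^{-T} v
    return A, B, s[:k]


if __name__ == "__main__":
    X, Y, z = simulate_two_modalities()
    A, B, rho = cca(X, Y, k=1)
    # One-dimensional embedding of each modality
    ex = (X - X.mean(axis=0)) @ A[:, 0]
    ey = (Y - Y.mean(axis=0)) @ B[:, 0]
    print("first canonical correlation:", rho[0])
    print("|corr(embedding_x, z)|:", abs(np.corrcoef(ex, z)[0, 1]))
    print("|corr(embedding_y, z)|:", abs(np.corrcoef(ey, z)[0, 1]))
```

When the feature dimension exceeds the sample size, the sample covariances in this sketch become singular and the whitening step breaks down, which is the regime where penalized variants are used instead.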
