CCA based multi-view feature selection for multi-omics data integration

Recent advances in high-throughput omics technologies and their applications in genomic medicine have opened up outstanding opportunities for individualized medicine. However, several challenges arise in the integrative analysis of such data, including the heterogeneity and high dimensionality of omics data. In this study, we present a novel multi-view feature selection algorithm based on the well-known canonical correlation analysis (CCA) statistical method for jointly selecting discriminative features from multi-omics data sources (multi-views). Our results demonstrate that models for predicting kidney renal clear cell carcinoma (KIRC) survival that use our proposed method to jointly select discriminative features from copy number alteration (CNA), gene expression RNA-Seq, and reverse-phase protein array (RPPA) views outperform models trained on single-view data, as well as three integrated models developed using data fusion approaches, including CCA-based feature fusion.
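
To make the idea of CCA-driven feature ranking concrete, the following is a minimal sketch and not the algorithm proposed in the paper. It assumes two views only, an unsupervised regularized CCA solved with NumPy, and a simple ranking rule (sum of absolute canonical weights per feature). The function names `cca_weights` and `select_features`, the regularization value `reg`, and the synthetic data are all illustrative assumptions.

```python
import numpy as np

def inv_sqrt(mat):
    """Inverse square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(mat)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

def cca_weights(X, Y, n_components=1, reg=1e-3):
    """Regularized two-view CCA (hypothetical helper, not the paper's method).

    Returns canonical weight matrices for X and Y and the leading
    canonical correlations.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]
    # Ridge-regularized within-view covariances and cross-covariance.
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)
    # Whiten each view, then take the SVD of the whitened cross-covariance.
    Kx, Ky = inv_sqrt(Cxx), inv_sqrt(Cyy)
    U, s, Vt = np.linalg.svd(Kx @ Cxy @ Ky)
    Wx = Kx @ U[:, :n_components]
    Wy = Ky @ Vt.T[:, :n_components]
    return Wx, Wy, s[:n_components]

def select_features(W, k):
    """Rank features by summed absolute canonical weight; keep the top k."""
    scores = np.abs(W).sum(axis=1)
    return np.sort(np.argsort(scores)[::-1][:k])

# Synthetic two-view data (assumed for illustration): a shared latent
# signal z drives features 0 and 1 of view X and feature 0 of view Y.
rng = np.random.default_rng(0)
n_samples = 200
z = rng.normal(size=n_samples)
X = rng.normal(size=(n_samples, 6))
Y = rng.normal(size=(n_samples, 5))
X[:, 0] = z + 0.3 * rng.normal(size=n_samples)
X[:, 1] = z + 0.3 * rng.normal(size=n_samples)
Y[:, 0] = z + 0.3 * rng.normal(size=n_samples)

Wx, Wy, corrs = cca_weights(X, Y, n_components=1)
sel_x = select_features(Wx, k=2)
sel_y = select_features(Wy, k=1)
print("selected X features:", sel_x.tolist())
print("selected Y features:", sel_y.tolist())
print("first canonical correlation:", float(corrs[0]))
```

This sketch ignores class labels, so it ranks features by cross-view correlation rather than by discriminative power; a supervised variant and an extension to three views (as in the CNA, RNA-Seq, and RPPA setting) would need additional design choices that the abstract does not specify.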
