Sparse Canonical Correlation Analysis via Concave Minimization

A new approach to sparse Canonical Correlation Analysis (sCCA) is proposed with the aim of discovering interpretable associations in very high-dimensional multi-view problems, i.e. problems with observations of multiple sets of variables on the same subjects. Inspired by the sparse PCA approach of Journée et al. (2010), we show that the sparse CCA formulation, while non-convex, is equivalent to the maximization of a convex objective over a compact set, for which we propose a first-order gradient method. This result lets us reduce the search space drastically to the boundary of the set. Consequently, we propose a two-step algorithm: we first infer the sparsity pattern of the canonical directions using our fast algorithm; we then shrink each view, i.e. the observations of a set of covariates, to the covariates selected in the first step, and compute the canonical directions via any CCA algorithm. We also introduce Directed Sparse CCA, which finds associations aligned with a specified experiment design, and Multi-View sCCA, which discovers associations between multiple sets of covariates. Our simulations establish the superior convergence properties and computational efficiency of our algorithm, as well as its accuracy in terms of the canonical correlation and its ability to recover the supports of the canonical directions. Using MuLe, an R package that implements our approach, we study the associations between metabolomics, transcriptomics and microbiomics in a multi-omic study in order to form hypotheses on the mechanisms of adaptation of Drosophila melanogaster to high doses of environmental toxicants, specifically atrazine, a commonly used chemical fertilizer.
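The two-step structure described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm and not the MuLe API: the abstract does not specify the first-order method, so step 1 below substitutes a generic alternating soft-thresholded power iteration on the sample cross-covariance, and step 2 uses classical CCA via Cholesky whitening and an SVD. All function names and the threshold parameters `lam_x` and `lam_y` are hypothetical and chosen for illustration only.

```python
import numpy as np


def soft_threshold(v, lam):
    """Elementwise soft-thresholding: shrink toward zero by lam."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)


def infer_supports(X, Y, lam_x, lam_y, n_iter=100):
    """Step 1 (stand-in, NOT the paper's method): estimate the sparsity
    pattern with alternating soft-thresholded power iterations on the
    sample cross-covariance C = X^T Y / n. Assumes centered X and Y."""
    C = X.T @ Y / X.shape[0]
    # Initialize v with the leading right singular vector of C.
    _, _, Vt = np.linalg.svd(C, full_matrices=False)
    v = Vt[0]
    u = np.zeros(X.shape[1])
    for _ in range(n_iter):
        u = soft_threshold(C @ v, lam_x)
        if not u.any():
            break
        u /= np.linalg.norm(u)
        v = soft_threshold(C.T @ u, lam_y)
        if not v.any():
            break
        v /= np.linalg.norm(v)
    return np.flatnonzero(u), np.flatnonzero(v)


def first_canonical_pair(X, Y, ridge=1e-8):
    """Step 2: classical CCA on the (reduced) views via whitening + SVD.
    A small ridge keeps the covariance matrices positive definite."""
    n = X.shape[0]
    Sxx = X.T @ X / n + ridge * np.eye(X.shape[1])
    Syy = Y.T @ Y / n + ridge * np.eye(Y.shape[1])
    Sxy = X.T @ Y / n
    Lx = np.linalg.cholesky(Sxx)  # Sxx = Lx Lx^T
    Ly = np.linalg.cholesky(Syy)  # Syy = Ly Ly^T
    A = np.linalg.solve(Lx, Sxy)  # Lx^{-1} Sxy
    M = np.linalg.solve(Ly, A.T).T  # Lx^{-1} Sxy Ly^{-T}
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    a = np.linalg.solve(Lx.T, U[:, 0])  # map back from whitened coordinates
    b = np.linalg.solve(Ly.T, Vt[0])
    return a, b, s[0]


def two_step_scca(X, Y, lam_x, lam_y):
    """Infer supports, shrink each view to its selected columns, run CCA,
    and embed the directions back into full-length sparse vectors."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    sx, sy = infer_supports(X, Y, lam_x, lam_y)
    if sx.size == 0 or sy.size == 0:
        raise ValueError("thresholds removed every covariate; lower lam_x/lam_y")
    a_red, b_red, rho = first_canonical_pair(X[:, sx], Y[:, sy])
    a = np.zeros(X.shape[1])
    b = np.zeros(Y.shape[1])
    a[sx] = a_red
    b[sy] = b_red
    return a, b, rho
```

Because the second step only ever sees the selected columns, the covariance matrices it factors have the size of the inferred supports rather than of the full views, which is what makes the reduction attractive in very high dimensions. The thresholds control how many covariates survive step 1 and would need to be tuned in practice.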

[1]  J. Barnett,et al.  Long-term Immunotoxic Effects of Oral Prenatal and Neonatal Atrazine Exposure , 2019, Toxicological sciences : an official journal of the Society of Toxicology.

[2]  John K Colbourne,et al.  How omics technologies can enhance chemical safety regulation: perspectives from academia, government, and industry , 2018, Environmental toxicology and chemistry.

[3]  James B. Brown,et al.  Early transcriptional response pathways in Daphnia magna are coordinated in networks of crustacean‐specific genes , 2018, Molecular ecology.

[4]  Young-Mo Kim,et al.  Influence of early life exposure, host genetics and diet on the mouse gut microbiome and metabolome , 2016, Nature Microbiology.

[5]  Honglak Lee,et al.  Deep Variational Canonical Correlation Analysis , 2016, ArXiv.

[6]  Vince D. Calhoun,et al.  Joint sparse canonical correlation analysis for detecting differential imaging genetics modules , 2016, Bioinform..

[7]  V. Forbes THE PERSPECTIVES COLUMN IS A REGULAR SERIES DESIGNED TO DISCUSS AND EVALUATE POTENTIALLY COMPETING VIEWPOINTS AND RESEARCH FINDINGS ON CURRENT ENVIRONMENTAL ISSUES. , 2016 .

[8]  Yu-Te Wang,et al.  A Comparison Study of Canonical Correlation Analysis Based Methods for Detecting Steady-State Visual Evoked Potentials , 2015, PloS one.

[9]  Lei Cao,et al.  Sequence detection analysis based on canonical correlation for steady-state visual evoked potential brain computer interfaces , 2015, Journal of Neuroscience Methods.

[10]  Chiranjib Chakraborty,et al.  DNA pattern recognition using canonical correlation algorithm , 2015, Journal of Biosciences.

[11]  Matti Pirinen,et al.  metaCCA: summary statistics-based multivariate meta-analysis of genome-wide association studies using canonical correlation analysis , 2015, bioRxiv.

[12]  W. S. Baldwin,et al.  The HR96 activator, atrazine, reduces sensitivity of D. magna to triclosan and DHA. , 2015, Chemosphere.

[13]  Serdar Bozdag,et al.  A Canonical Correlation Analysis-Based Dynamic Bayesian Network Prior to Infer Gene Regulatory Networks from Multiple Types of Biological Data , 2015, J. Comput. Biol..

[14]  James B. Brown,et al.  Diversity and dynamics of the Drosophila transcriptome , 2014, Nature.

[15]  Nicholas B Larson,et al.  Kernel canonical correlation analysis for assessing gene–gene interactions and application to ovarian cancer , 2013, European Journal of Human Genetics.

[16]  P. Bickel,et al.  Diversity and dynamics of the Drosophila , 2014 .

[17]  Quan-Sen Sun,et al.  Orthogonal canonical correlation analysis and its application in feature fusion , 2013, Proceedings of the 16th International Conference on Information Fusion.

[18]  Jeff A. Bilmes,et al.  Deep Canonical Correlation Analysis , 2013, ICML.

[19]  Yasunori Fujikoshi,et al.  A Variable Selection Criterion for Two Sets of Principal Component Scores in Principal Canonical Correlation Analysis , 2013 .

[20]  Juho Rousu,et al.  Biomarker Discovery by Sparse Canonical Correlation Analysis of Complex Clinical Phenotypes of Tuberculosis and Malaria , 2013, PLoS Comput. Biol..

[21]  Jia Cai,et al.  The distance between feature subspaces of kernel canonical correlation analysis , 2013, Math. Comput. Model..

[22]  Baoxue Zhang,et al.  Dimension reduction in functional regression using mixed data canonical correlation analysis , 2013 .

[23]  Jonathan A. Eisen,et al.  Bacterial Communities of Diverse Drosophila Species: Ecological Context of a Host–Microbe Model System , 2011, PLoS genetics.

[24]  Samuel Kaski,et al.  Bayesian exponential family projections for coupled data sources , 2010, UAI.

[25]  John Shawe-Taylor,et al.  Sparse canonical correlation analysis , 2009, Machine Learning.

[26]  Yurii Nesterov,et al.  Generalized Power Method for Sparse Principal Component Analysis , 2008, J. Mach. Learn. Res..

[27]  C. Thummel,et al.  The DHR96 nuclear receptor controls triacylglycerol homeostasis in Drosophila. , 2009, Cell metabolism.

[28]  R. Tibshirani,et al.  A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. , 2009, Biostatistics.

[29]  D. Tritchler,et al.  Sparse Canonical Correlation Analysis with Application to Genomic Data Integration , 2009, Statistical applications in genetics and molecular biology.

[30]  Daniela M Witten,et al.  Extensions of Sparse Canonical Correlation Analysis with Applications to Genomic Data , 2009, Statistical applications in genetics and molecular biology.

[31]  K. Fukumizu,et al.  Sensitivity analysis in robust and kernel canonical correlation analysis , 2008, 2008 11th International Conference on Computer and Information Technology.

[32]  Christoph H. Lampert,et al.  Semi-supervised Laplacian Regularization of Kernel Canonical Correlation Analysis , 2008, ECML/PKDD.

[33]  A. Zwinderman,et al.  Quantifying the Association between Gene Expressions and DNA-Markers by Penalized Canonical Correlation Analysis , 2011, Statistical Applications in Genetics and Molecular Biology.

[34]  Alfred O. Hero,et al.  A greedy approach to sparse canonical correlation analysis , 2008, 0801.2748.

[35]  Alexandre d'Aspremont,et al.  Optimal Solutions for Sparse Principal Component Analysis , 2007, J. Mach. Learn. Res..

[36]  John Shawe-Taylor,et al.  Convergence analysis of kernel Canonical Correlation Analysis: theory and practice , 2008, Machine Learning.

[37]  David Tritchler,et al.  Genome-wide sparse canonical correlation of gene expression with genotypes , 2007, BMC proceedings.

[38]  L. Soulère,et al.  Conjugated linoleic acid, unlike other unsaturated fatty acids, strongly induces glutathione synthesis without any lipoperoxidation , 2006, British Journal of Nutrition.

[39]  Shotaro Akaho,et al.  A kernel method for canonical correlation analysis , 2006, ArXiv.

[40]  Michael I. Jordan,et al.  A Probabilistic Interpretation of Canonical Correlation Analysis , 2005 .

[41]  John Shawe-Taylor,et al.  Canonical Correlation Analysis: An Overview with Application to Learning Methods , 2004, Neural Computation.

[42]  C. Molony,et al.  Genetic analysis of genome-wide variation in human gene expression , 2004, Nature.

[43]  M. Rui Alves,et al.  Interpolative biplots applied to principal component analysis and canonical correlation analysis , 2003 .

[44]  Yoshihiro Yamanishi,et al.  Extraction of correlated gene clusters from multiple genomic data by generalized kernel canonical correlation analysis , 2003, ISMB.

[45]  Michael I. Jordan,et al.  Kernel independent component analysis , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..

[46]  M. Ringnér,et al.  Impact of DNA amplification on gene expression patterns in breast cancer. , 2002, Cancer research.

[47]  Christian A. Rees,et al.  Microarray analysis reveals a major direct role of DNA copy number alteration in the transcriptional program of human breast tumors , 2002, Proceedings of the National Academy of Sciences of the United States of America.

[48]  T. Ouarda,et al.  Regional flood frequency estimation with canonical correlation analysis , 2001 .

[49]  Horst Bischof,et al.  Nonlinear Feature Extraction Using Generalized Canonical Correlation Analysis , 2001, ICANN.

[50]  Johan A. K. Suykens,et al.  Kernel Canonical Correlation Analysis and Least Squares Support Vector Machines , 2001, ICANN.

[51]  H. Knutsson,et al.  Detection of neural activity in functional MRI using canonical correlation analysis , 2001, Magnetic resonance in medicine.

[52]  Colin Fyfe,et al.  Kernel and Nonlinear Canonical Correlation Analysis , 2000, IJCNN.

[53]  Colin Fyfe,et al.  A neural implementation of canonical correlation analysis , 1999, Neural Networks.

[54]  Olvi L. Mangasarian,et al.  Machine Learning via Polyhedral Concave Minimization , 1996 .

[55]  Harold P. Benson,et al.  Concave Minimization: Theory, Applications and Algorithms , 1995 .

[56]  William A. Gardner,et al.  Programmable canonical correlation analysis: a flexible framework for blind adaptive spatial filtering , 1993, Proceedings of 27th Asilomar Conference on Signals, Systems and Computers.

[57]  C. Heij,et al.  A modified canonical correlation approach to approximate state space modelling , 1991, [1991] Proceedings of the 30th IEEE Conference on Decision and Control.

[58]  Linda B. McGown,et al.  Canonical correlation technique for rank estimation of excitation-emission matrixes , 1989 .

[59]  Franklin T. Luk,et al.  Canonical Correlations And Generalized SVD: Applications And New Algorithms , 1989, Optics & Photonics.

[60]  J. T. Webster,et al.  Canonical Correlation as a Discriminant Tool in a Periodontal Problem , 1985 .

[61]  J. D. Stowe,et al.  A Canonical Correlation Analysis of Commercial Bank Asset/Liability Structures , 1983, Journal of Financial and Quantitative Analysis.

[62]  K. W. Wong,et al.  Erratum: Study of the mathematical approximations made in the basis-correlation method and those made in the canonical-transformation method for an interacting Bose gas , 1980 .

[63]  H. Vinod Canonical ridge and econometrics of joint production , 1976 .

[64]  Randall B. Dunham,et al.  Canonical Correlation Analysis in a Predictive System. , 1975 .

[65]  Mark Monmonier,et al.  IMPROVING THE INTERPRETATION OF GEOGRAPHICAL CANONICAL CORRELATION MODELS , 1973 .

[66]  J. Kettenring,et al.  Canonical Analysis of Several Sets of Variables , 1971 .

[68]  C E Hopkins,et al.  Statistical analysis by canonical correlation: a computer application. , 1969, Health Services Research.

[69]  M. Healy A rotation method for computing canonical correlations , 1957 .

[70]  Frederick V. Waugh,et al.  Regressions between Sets of Variables , 1942 .

[71]  H. Hotelling The most predictable criterion. , 1935 .

[72]  HighWire Press Philosophical transactions of the Royal Society of London. Series A, Containing papers of a mathematical or physical character , 1896 .