Sparse canonical correlation analysis applied to ‐omics studies for integrative analysis and biomarker discovery

With the rapid development of new ‐omics measurement methods, there is increasing interest in studying the correlation structure between two or more data sets. Multivariate methods such as canonical correlation analysis (CCA) have been proposed to analyze the intrinsic correlation between two data sets by integrating them. However, because of the high dimensionality of the data and the relative scarcity of samples, ordinary CCA usually runs into variable selection problems and therefore fails to recover a satisfactory relationship. Here, we explore the potential of sparse CCA (SCCA) to find the correlated components in two sparse views. SCCA seeks sparse projection directions that extract the correlation between two data sets. We applied this method to one simulated data set and one real ‐omics data set to illustrate its performance. The results from the two studies show that SCCA can effectively find the correlated patterns between two data sets, which are of high importance for understanding the relationship between the two underlying chemical or biological processes. The variable subsets selected by the sparse weight vectors can assist in interpreting the chemical or biological process. The integrative analysis of two views by SCCA helps improve the discriminative ability of classification models for various ‐omics studies. Copyright © 2015 John Wiley & Sons, Ltd.
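The idea of sparse projection directions can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a common penalized formulation in which the within-set covariances are treated as identity and the weight vectors are updated alternately by soft-thresholding. The names `sparse_cca`, `frac_u`, and `frac_v` are hypothetical, and setting each threshold as a fraction of the largest entry is a simplification chosen so that at least one weight always survives. The toy data are synthetic sinusoids, not data from the study.

```python
import math

def center(M):
    """Subtract the column mean from every column of a list-of-rows matrix."""
    n = len(M)
    means = [sum(row[j] for row in M) / n for j in range(len(M[0]))]
    return [[row[j] - means[j] for j in range(len(row))] for row in M]

def soft_threshold(a, frac):
    """Shrink entries toward zero by frac * max|a|; small entries become exactly 0."""
    t = frac * max(abs(x) for x in a)
    return [math.copysign(max(abs(x) - t, 0.0), x) for x in a]

def normalize(a):
    """Scale a vector to unit Euclidean length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in a))
    return [x / norm for x in a] if norm > 0 else a

def sparse_cca(X, Y, frac_u=0.3, frac_v=0.3, n_iter=50):
    """Return sparse weight vectors (u, v) for the first pair of components.

    Assumption: within-set covariances are replaced by identity, so only the
    cross-product matrix K = X^T Y is needed.
    """
    X, Y = center(X), center(Y)
    n, p, q = len(X), len(X[0]), len(Y[0])
    K = [[sum(X[i][j] * Y[i][k] for i in range(n)) for k in range(q)]
         for j in range(p)]
    v = [1.0 / math.sqrt(q)] * q  # dense starting vector
    u = [0.0] * p
    for _ in range(n_iter):
        Kv = [sum(K[j][k] * v[k] for k in range(q)) for j in range(p)]
        u = normalize(soft_threshold(Kv, frac_u))
        Ktu = [sum(K[j][k] * u[j] for j in range(p)) for k in range(q)]
        v = normalize(soft_threshold(Ktu, frac_v))
    return u, v

def correlation(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

# Synthetic toy data: one shared signal s, plus nuisance cosines at other
# frequencies that are uncorrelated with s and with each other.
n = 40
s = [math.sin(2 * math.pi * i / n) for i in range(n)]
c = {k: [math.cos(2 * math.pi * k * i / n) for i in range(n)] for k in range(2, 10)}
X = [[s[i] + 0.2 * c[2][i], -s[i] + 0.2 * c[3][i], c[4][i], c[5][i], c[6][i]]
     for i in range(n)]
Y = [[s[i] + 0.2 * c[7][i], c[8][i], c[9][i]] for i in range(n)]

u, v = sparse_cca(X, Y)
selected_x = [j for j, w in enumerate(u) if abs(w) > 0]
selected_y = [k for k, w in enumerate(v) if abs(w) > 0]
score_x = [sum(X[i][j] * u[j] for j in range(len(u))) for i in range(n)]
score_y = [sum(Y[i][k] * v[k] for k in range(len(v))) for i in range(n)]
print(selected_x, selected_y, round(correlation(score_x, score_y), 3))
```

In this toy setting only the variables that carry the shared signal keep nonzero weights, which is the sense in which the sparse weight vectors select variable subsets. In practice the sparsity levels would have to be tuned, for example by cross-validation or permutation.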
