Paper information - Unraveling complex relationships between heterogeneous omics datasets using local principal components

Unraveling complex relationships between heterogeneous omics datasets using local principal components

There is growing interest in studying the dependencies between multiple data sources. A common way to analyze the relationship between a pair of data sources based on their correlation is canonical correlation analysis (CCA), which seeks linear combinations of all variables from each dataset that maximize the correlation between them. However, in high-dimensional datasets such as genomic data, where the number of variables exceeds the number of experimental units, CCA may not yield meaningful information. Moreover, when collinearity exists in one or both of the datasets, CCA may not be applicable. In this paper, we present a novel method to extract common features from a pair of data sources using local principal components and Kendall's ranking. The results show that the proposed method outperforms CCA in many scenarios and is more robust to noisy data. Moreover, the proposed method yields meaningful results when the number of variables exceeds the number of observed units.
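The limitation of classical CCA described in the abstract can be illustrated with a minimal sketch. This is not the paper's proposed method; it only computes the first canonical correlation of ordinary CCA with NumPy, using a QR-plus-SVD formulation. The function name `first_canonical_correlation`, the simulated data, and the chosen dimensions are all assumptions made for illustration.

```python
import numpy as np

def first_canonical_correlation(X, Y):
    """Largest canonical correlation between X (n x p) and Y (n x q).

    Illustrative sketch of classical CCA, not the paper's method.
    """
    # Center each variable (column) so correlations are well defined
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases from a reduced QR decomposition of each centered dataset
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    # Singular values of Qx^T Qy are the canonical correlations
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    # Clamp tiny floating-point overshoot above 1
    return float(min(s[0], 1.0))

rng = np.random.default_rng(0)

# Tall case: 200 units, few variables, independent noise in both datasets
rho_tall = first_canonical_correlation(
    rng.normal(size=(200, 3)), rng.normal(size=(200, 2))
)

# Wide case: 10 units but 30 variables in X, still independent noise
rho_wide = first_canonical_correlation(
    rng.normal(size=(10, 30)), rng.normal(size=(10, 2))
)

print(f"tall: {rho_tall:.3f}, wide: {rho_wide:.3f}")
```

In the wide case the variables of X span the whole space of observations, so some linear combination can match any combination of Y exactly. The first canonical correlation is then 1 even though the two datasets are unrelated, which is why CCA carries little information when variables outnumber experimental units.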

Farshad Fotouhi | Noor Alaydie
