Highlighting relationships between heterogeneous biological data through graphical displays based on regularized canonical correlation analysis

Biological data produced by high-throughput technologies are becoming increasingly abundant and raise many statistical questions. This paper addresses one of them: how to highlight significant relationships between gene expression and other variables when the two are jointly observed on the same samples. One relevant statistical method for exploring these relationships is Canonical Correlation Analysis (CCA). Unfortunately, in the context of postgenomic data, the number of variables (gene expressions) is usually greater than the number of units (samples), so CCA cannot be performed directly and a regularized version is required. We applied regularized CCA to data sets from two different studies and show that its interpretation reveals both previously validated relationships and new hypotheses. From the first study (nutrigenomic), we generated hypotheses on the transcription factor pathways potentially linking hepatic fatty acids and gene expression. From the second study (pharmacogenomic, on the NCI-60 cancer cell line panel), we identified new candidate substrates of ABC transporters, whose relevance is supported by the concomitant identification of several known substrates. In conclusion, regularized CCA is likely to be relevant to a wide variety of biological experiments that generate high-throughput data. We demonstrate here its ability to broaden the range of relevant conclusions that can be drawn from these relatively expensive experiments.
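
To illustrate why regularization makes CCA feasible when variables outnumber samples, the following is a minimal sketch of a ridge-type regularized CCA, not the authors' implementation. It assumes the common formulation in which a multiple of the identity is added to each within-set covariance matrix before the canonical problem is solved. The function name `regularized_cca`, the parameters `lambda1` and `lambda2`, and the simulated data are hypothetical choices made for illustration only.

```python
import numpy as np


def regularized_cca(X, Y, lambda1, lambda2, n_components=2):
    """Ridge-type regularized CCA (illustrative sketch, assumed formulation).

    X: (n, p) array, Y: (n, q) array, rows are the same n samples.
    lambda1, lambda2: positive ridge penalties (hypothetical names).
    Returns canonical correlations and weight matrices for X and Y.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]

    # Adding lambda * I makes the covariance matrices invertible
    # even when p > n or q > n.
    Sxx = X.T @ X / (n - 1) + lambda1 * np.eye(X.shape[1])
    Syy = Y.T @ Y / (n - 1) + lambda2 * np.eye(Y.shape[1])
    Sxy = X.T @ Y / (n - 1)

    # Cholesky factors: Sxx = Lx Lx^T, Syy = Ly Ly^T.
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)

    # M = Lx^{-1} Sxy Ly^{-T}; its singular values are the
    # regularized canonical correlations.
    M = np.linalg.solve(Lx, Sxy)
    M = np.linalg.solve(Ly, M.T).T
    U, s, Vt = np.linalg.svd(M, full_matrices=False)

    # Map singular vectors back to canonical weights.
    A = np.linalg.solve(Lx.T, U[:, :n_components])
    B = np.linalg.solve(Ly.T, Vt.T[:, :n_components])
    return s[:n_components], A, B


# Simulated example with more variables than samples (assumed sizes).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 50))   # e.g. 20 samples, 50 "genes"
Y = rng.normal(size=(20, 10))   # e.g. 10 other variables
rho, A, B = regularized_cca(X, Y, lambda1=0.1, lambda2=0.1)
print(rho)
```

Without the penalty terms, the Cholesky factorization of the 50 x 50 covariance matrix would fail here because that matrix has rank at most 19. The penalty values above are arbitrary; in practice they would need to be tuned, for example by cross-validation.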
