Bi-directional gene set enrichment and canonical correlation analysis identify key diet-sensitive pathways and biomarkers of metabolic syndrome

Background

Currently, a number of bioinformatics methods are available to generate appropriate lists of genes from a microarray experiment. While these lists represent an accurate primary analysis of the data, fewer options exist to contextualise those lists. The development and validation of such methods is crucial to the wider application of microarray technology in the clinical setting. Two key challenges in clinical bioinformatics involve appropriate statistical modelling of dynamic transcriptomic changes, and extraction of clinically relevant meaning from very large datasets.

Results

Here, we apply an approach to gene set enrichment analysis that allows for detection of bi-directional enrichment within a gene set. Furthermore, we apply canonical correlation analysis and Fisher's exact test, using plasma marker data with known clinical relevance to aid identification of the most important gene and pathway changes in our transcriptomic dataset. After a 28-day dietary intervention with high-CLA beef, a range of plasma markers indicated a marked improvement in the metabolic health of genetically obese mice. Tissue transcriptomic profiles indicated that the effects were most dramatic in liver (1270 genes significantly changed; p < 0.05), followed by muscle (601 genes) and adipose (16 genes). Results from modified GSEA showed that the high-CLA beef diet affected diverse biological processes across the three tissues, and that the majority of pathway changes reached significance only with the bi-directional test. Combining the liver tissue microarray results with plasma marker data revealed 110 CLA-sensitive genes showing strong canonical correlation with one or more plasma markers of metabolic health, and 9 significantly overrepresented pathways among this set; each of these pathways was also significantly changed by the high-CLA diet. Closer inspection of two of these pathways - selenoamino acid metabolism and steroid biosynthesis - illustrated clear diet-sensitive changes in constituent genes, as well as strong correlations between gene expression and plasma markers of metabolic syndrome independent of the dietary effect.

Conclusion

Bi-directional gene set enrichment analysis more accurately reflects dynamic regulatory behaviour in biochemical pathways, and as such highlighted biologically relevant changes that were not detected using a traditional approach. In such cases where transcriptomic response to treatment is exceptionally large, canonical correlation analysis in conjunction with Fisher's exact test highlights the subset of pathways showing strongest correlation with the clinical markers of interest. In this case, we have identified selenoamino acid metabolism and steroid biosynthesis as key pathways mediating the observed relationship between metabolic health and high-CLA beef. These results indicate that this type of analysis has the potential to generate novel transcriptome-based biomarkers of disease.
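
To illustrate why a pathway can reach significance only with a bi-directional test, the following is a minimal sketch and not the authors' implementation. It assumes a simplified enrichment score (the mean per-gene statistic over a gene set) and a gene-sampling permutation null; the function names `set_score` and `permutation_p`, the toy statistics, and the gene labels are all hypothetical. A pathway in which half of the genes go up and half go down cancels to zero under a signed score, but stands out once the sign is ignored.

```python
import random
from statistics import mean


def set_score(stats, gene_set, absolute):
    """Mean per-gene statistic over a gene set.

    With absolute=True the sign is dropped, so up- and down-regulated
    genes both contribute to the score (the bi-directional case).
    """
    return mean(abs(stats[g]) if absolute else stats[g] for g in gene_set)


def permutation_p(stats, gene_set, absolute, n_perm=2000, seed=0):
    """Empirical p-value against random gene sets of the same size."""
    rng = random.Random(seed)
    genes = list(stats)
    observed = abs(set_score(stats, gene_set, absolute))
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(genes, len(gene_set))
        if abs(set_score(stats, sample, absolute)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


# Toy data (hypothetical): 500 genes with null statistics ...
toy_rng = random.Random(1)
stats = {f"g{i}": toy_rng.gauss(0.0, 1.0) for i in range(500)}

# ... and one 20-gene pathway with half its genes up, half down.
pathway = [f"g{i}" for i in range(20)]
for i, gene in enumerate(pathway):
    stats[gene] = 3.0 if i % 2 == 0 else -3.0

p_directional = permutation_p(stats, pathway, absolute=False)
p_bidirectional = permutation_p(stats, pathway, absolute=True)

print(f"directional p-value:    {p_directional:.4f}")
print(f"bi-directional p-value: {p_bidirectional:.4f}")
```

In this toy example the signed score of the pathway is exactly zero, so the directional test cannot distinguish it from any random gene set, while the unsigned score is far larger than that of random sets. Real analyses would use moderated per-gene statistics and a more careful null model than the one assumed here.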
