An Integrated Statistical Approach to Compare Transcriptomics Data Across Experiments: A Case Study on the Identification of Candidate Target Genes of the Transcription Factor PPARα

An effective strategy to elucidate the signal transduction cascades activated by a transcription factor is to compare the transcriptional profiles of wild-type and transcription factor knockout models. Many statistical tests have been proposed for analyzing gene expression data, but most are based on pairwise comparisons. Because microarray analysis tests multiple hypotheses within one study, it is generally accepted that false positives should be controlled using the false discovery rate (FDR). However, it has been reported that the FDR may be an inappropriate metric for comparing data across different experiments. Here we propose an approach that addresses this problem by simultaneously testing and integrating three hypotheses (contrasts) using the cell means ANOVA model. These three contrasts test for the effect of a treatment in wild type, in the gene knockout, and globally over all experimental groups. We illustrate our approach on microarray experiments that focused on identifying candidate target genes and biological processes governed by the fatty acid-sensing transcription factor PPARα in liver. Compared with the commonly applied FDR-based comparison across experiments, our approach identified a conservative but less noisy set of candidate genes with the same sensitivity and specificity. In addition, our method properly adjusts for multiple testing while integrating data from two experiments, and is driven by biological inference. Taken together, we present a simple yet efficient strategy to compare differential expression of genes across experiments while controlling for multiple hypothesis testing.
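To make the idea of testing three contrasts in a cell means model concrete, the following is a minimal sketch for a single gene, not the authors' implementation. It assumes a 2x2 design (wild type and knockout, each with control and treated samples), uses made-up log2 expression values, and takes the "global" contrast to be the treatment effect averaged over both genotypes; that definition, the group names, and the helper `contrast_t` are illustrative assumptions. Only the estimate and t statistic are computed; p-values and the integration step described in the abstract are left out.

```python
from math import sqrt
from statistics import mean


def contrast_t(groups, weights):
    """Return (estimate, t statistic, residual df) for one contrast
    in a cell means model y_ij = mu_i + e_ij with a pooled variance."""
    names = list(groups)
    cell_means = {g: mean(groups[g]) for g in names}
    # Pooled residual variance (MSE) over all cells.
    sse = sum((y - cell_means[g]) ** 2 for g in names for y in groups[g])
    df = sum(len(groups[g]) for g in names) - len(names)
    mse = sse / df
    # Contrast estimate: weighted sum of the cell means.
    estimate = sum(weights[g] * cell_means[g] for g in names)
    se = sqrt(mse * sum(weights[g] ** 2 / len(groups[g]) for g in names))
    return estimate, estimate / se, df


# Hypothetical log2 expression values for one gene (illustration only).
groups = {
    "wt_ctl": [8.0, 8.2, 7.8],
    "wt_trt": [10.0, 10.2, 9.8],
    "ko_ctl": [8.1, 7.9, 8.0],
    "ko_trt": [8.0, 8.2, 7.8],
}

# Three contrasts: treatment effect in wild type, in knockout, and
# (assumed definition) averaged over both genotypes.
contrasts = {
    "wild_type": {"wt_ctl": -1.0, "wt_trt": 1.0, "ko_ctl": 0.0, "ko_trt": 0.0},
    "knockout": {"wt_ctl": 0.0, "wt_trt": 0.0, "ko_ctl": -1.0, "ko_trt": 1.0},
    "global": {"wt_ctl": -0.5, "wt_trt": 0.5, "ko_ctl": -0.5, "ko_trt": 0.5},
}

results = {name: contrast_t(groups, w) for name, w in contrasts.items()}

for name, (estimate, t_stat, df) in results.items():
    print(f"{name}: estimate={estimate:.3f}, t={t_stat:.2f}, df={df}")
```

In this toy example the gene responds to treatment in wild type but not in the knockout, which is the pattern one would expect for a candidate target gene of the deleted transcription factor. All three contrasts share the same pooled residual variance, which is what allows them to be tested within one model rather than as separate pairwise comparisons.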
