Separate enrichment analysis of pathways for up- and downregulated genes

Two strategies are commonly adopted for pathway enrichment analysis: analysing all differentially expressed (DE) genes together, or analysing up- and downregulated genes separately. However, few studies have examined the rationale behind these strategies. Using both microarray and RNA-seq data, we show that gene pairs with functional links in pathways tend to have positively correlated expression levels, which can result in an imbalance between the up- and downregulated genes in particular pathways. We then show that this imbalance can greatly reduce the statistical power for finding disease-associated pathways when all DE genes are analysed together. Further, using gene expression profiles from five types of tumours, we illustrate that analysing up- and downregulated genes separately can identify more pathways that are genuinely pertinent to phenotypic differences. In conclusion, analysing up- and downregulated genes separately is more powerful than analysing all of the DE genes together.
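The effect of an up/down imbalance on the all-DE analysis can be illustrated with a small numerical sketch. It assumes a one-sided hypergeometric over-representation test, which is a common choice for pathway enrichment but is not stated in the abstract as the test used here. The function name `hypergeom_sf` and all gene counts are hypothetical values chosen for illustration, not data from the study.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n genes are drawn without replacement from N genes,
    K of which belong to the pathway (one-sided over-representation test)."""
    upper = min(K, n)
    # Sum the exact integer counts first, then divide once.
    numerator = sum(comb(K, i) * comb(N - K, n - i)
                    for i in range(max(k, 0), upper + 1))
    return numerator / comb(N, n)


# Hypothetical numbers (assumptions, not taken from the study):
N_GENES = 10_000      # genes measured on the platform
PATHWAY_SIZE = 100    # genes annotated to one pathway
N_UP, N_DOWN = 500, 500          # up- and downregulated DE genes
UP_IN_PATHWAY, DOWN_IN_PATHWAY = 15, 2   # imbalanced pathway membership

# Strategy 1: analyse up- and downregulated genes separately.
p_up = hypergeom_sf(UP_IN_PATHWAY, N_GENES, PATHWAY_SIZE, N_UP)
p_down = hypergeom_sf(DOWN_IN_PATHWAY, N_GENES, PATHWAY_SIZE, N_DOWN)

# Strategy 2: analyse all DE genes together.
p_all = hypergeom_sf(UP_IN_PATHWAY + DOWN_IN_PATHWAY,
                     N_GENES, PATHWAY_SIZE, N_UP + N_DOWN)

print(f"up only:   p = {p_up:.3g}")
print(f"down only: p = {p_down:.3g}")
print(f"all DE:    p = {p_all:.3g}")
```

In this toy setting the pathway is enriched almost entirely among upregulated genes, so pooling the DE list adds 500 downregulated genes that contribute only two pathway members and dilutes the signal: the upregulated-only test gives a much smaller p-value than the pooled test. In practice the p-values from either strategy would still need multiple-testing correction across pathways.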
