Multiple testing for gene sets from microarray experiments

Background: A key objective in many microarray association studies is the identification of individual genes associated with clinical outcome. It is often of additional interest to identify sets of genes, known a priori to have similar biologic function, that are associated with the outcome.

Results: In this paper, we propose a general permutation-based framework for gene set testing that controls the false discovery rate (FDR) while accounting for the dependency among genes within and across gene sets. The application of the proposed method is demonstrated using three public microarray data sets. The performance of the proposed method is compared with that of two existing methods, Gene Set Enrichment Analysis (GSEA) and Gene Set Analysis (GSA).

Conclusions: Our simulations show that the proposed method controls the FDR at the desired level. Through simulations and case studies, we observe that our method performs better than GSEA and GSA, especially when the number of prognostic gene sets is large.
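The abstract does not spell out the test statistic or the FDR procedure, so the following is only a generic sketch of the idea of permutation-based gene set testing, not the authors' method. It assumes a two-group outcome, a hypothetical set-level statistic (the mean squared Welch t-statistic over the genes in a set), and the Benjamini-Hochberg step-up adjustment as a stand-in for the paper's FDR control. The function names `gene_set_permutation_pvalues` and `benjamini_hochberg` are illustrative. Permuting the sample labels, rather than the genes, keeps the correlation among genes within and across sets intact under the null.

```python
import numpy as np


def gene_set_permutation_pvalues(expr, labels, gene_sets, n_perm=500, seed=0):
    """Permutation p-value per gene set (illustrative sketch, two-group outcome).

    expr      : array of shape (genes, samples)
    labels    : 0/1 group label per sample
    gene_sets : dict mapping a set name to a list of gene row indices
    """
    rng = np.random.default_rng(seed)
    expr = np.asarray(expr, dtype=float)
    labels = np.asarray(labels)

    def set_stats(lab):
        # Per-gene Welch t-statistic (assumed statistic, not from the paper).
        a, b = expr[:, lab == 1], expr[:, lab == 0]
        diff = a.mean(axis=1) - b.mean(axis=1)
        se = np.sqrt(a.var(axis=1, ddof=1) / a.shape[1]
                     + b.var(axis=1, ddof=1) / b.shape[1])
        t = diff / se
        # Set-level statistic: mean squared t over the genes in each set.
        return np.array([np.mean(t[idx] ** 2) for idx in gene_sets.values()])

    observed = set_stats(labels)
    exceed = np.zeros_like(observed)
    for _ in range(n_perm):
        # Shuffle sample labels only; gene-gene dependence is preserved.
        exceed += set_stats(rng.permutation(labels)) >= observed
    # Add-one correction keeps every p-value strictly positive.
    pvals = (exceed + 1) / (n_perm + 1)
    return dict(zip(gene_sets.keys(), pvals))


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity from the largest p-value downwards.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted


if __name__ == "__main__":
    # Simulated data: 40 genes, 20 samples, genes 0-9 shifted in group 1.
    rng = np.random.default_rng(1)
    labels = np.array([0] * 10 + [1] * 10)
    expr = rng.normal(size=(40, 20))
    expr[:10, labels == 1] += 3.0
    sets = {"signal": list(range(0, 10)), "null": list(range(20, 30))}
    pvals = gene_set_permutation_pvalues(expr, labels, sets, n_perm=200)
    print(pvals)
    print(benjamini_hochberg(list(pvals.values())))
```

A gene set would be called significant when its adjusted p-value falls below the chosen FDR level. For a survival outcome, as in some of the case studies the abstract alludes to, the per-gene statistic would need to be replaced by something suited to censored data.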
