Multiple testing on the directed acyclic graph of gene ontology

MOTIVATION Current methods for multiplicity adjustment do not make use of the graph structure of Gene Ontology (GO) when testing for association of expression profiles of GO terms with a response variable. RESULTS We propose a multiple testing method, called the focus level procedure, that preserves the graph structure of Gene Ontology (GO). The procedure is constructed as a combination of a Closed Testing procedure with Holm's method. It requires a user to choose a 'focus level' in the GO graph, which reflects the level of specificity of terms in which the user is most interested. This choice also determines the level in the GO graph at which the procedure has most power. We prove that the procedure strongly controls the family-wise error rate without any additional assumptions on the joint distribution of the test statistics used. We also present an algorithm to calculate multiplicity-adjusted P-values. Because the focus level procedure preserves the structure of the GO graph, it does not generally preserve the ordering of the raw P-values in the adjusted P-values. AVAILABILITY The focus level procedure has been implemented in the globaltest and GlobalAncova packages, both of which are available on www.bioconductor.org.

[1]  K. Gabriel,et al.  On closed testing procedures with special reference to ordered analysis of variance , 1976 .

[2]  S. Holm A Simple Sequentially Rejective Multiple Test Procedure , 1979 .

[3]  J. Shaffer Modified Sequentially Rejective Multiple Test Procedures , 1986 .

[4]  Yogendra P. Chaubey Resampling-Based Multiple Testing: Examples and Methods for p-Value Adjustment , 1993 .

[5]  Y. Benjamini,et al.  Controlling the false discovery rate: a practical and powerful approach to multiple testing , 1995 .

[6]  M. Ashburner,et al.  Gene Ontology: tool for the unification of biology , 2000, Nature Genetics.

[7]  Susumu Goto,et al.  KEGG: Kyoto Encyclopedia of Genes and Genomes , 2000, Nucleic Acids Res..

[8]  中尾 光輝,et al.  KEGG(Kyoto Encyclopedia of Genes and Genomes)〔和文〕 (特集 ゲノム医学の現在と未来--基礎と臨床) -- (データベース) , 2000 .

[9]  Y. Benjamini,et al.  THE CONTROL OF THE FALSE DISCOVERY RATE IN MULTIPLE TESTING UNDER DEPENDENCY , 2001 .

[10]  Yudong D. He,et al.  Gene expression profiling predicts clinical outcome of breast cancer , 2002, Nature.

[11]  Yudong D. He,et al.  A Gene-Expression Signature as a Predictor of Survival in Breast Cancer , 2002 .

[12]  Van,et al.  A gene-expression signature as a predictor of survival in breast cancer. , 2002, The New England journal of medicine.

[13]  M. Daly,et al.  PGC-1α-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes , 2003, Nature Genetics.

[14]  Robert Gentleman,et al.  An extensible application for assembling annotation for genomic data , 2003, Bioinform..

[15]  S. Dudoit,et al.  Multiple Hypothesis Testing in Microarray Experiments , 2003 .

[16]  Rainer Breitling,et al.  Iterative Group Analysis (iGA): A simple tool to enhance sensitivity and facilitate interpretation of microarray experiments , 2004, BMC Bioinformatics.

[17]  V. Arango,et al.  Using the Gene Ontology for Microarray Data Mining: A Comparison of Methods and Application to Age Effects in Human Prefrontal Cortex , 2004, Neurochemical Research.

[18]  Jelle J. Goeman,et al.  A global test for groups of genes: testing association with a clinical outcome , 2004, Bioinform..

[19]  Purvesh Khatri,et al.  Ontological analysis of gene expression data: current tools, limitations, and open problems , 2005, Bioinform..

[20]  P. Park,et al.  Discovering statistically significant pathways in expression profiling studies. , 2005, Proceedings of the National Academy of Sciences of the United States of America.

[21]  U. Mansmann,et al.  Testing Differential Gene Expression in Functional Groups , 2005, Methods of Information in Medicine.

[22]  Jun Lu,et al.  Pathway level analysis of gene expression using singular value decomposition , 2005, BMC Bioinformatics.

[23]  W. Hiddemann,et al.  Nucleophosmin gene mutations are predictors of favorable prognosis in acute myelogenous leukemia with a normal karyotype. , 2005, Blood.

[24]  Jelle J. Goeman,et al.  Testing association of a pathway with survival using gene expression data , 2005, Bioinform..

[25]  Martin Vingron,et al.  An Improved Statistic for Detecting Over-Represented Gene Ontology Annotations in Gene Sets , 2006, RECOMB.

[26]  Thomas Lengauer,et al.  Improved scoring of functional groups from gene expression data by decorrelating GO graph structure , 2006, Bioinform..

[27]  D. Allison,et al.  Microarray data analysis: from disarray to consolidation and consensus , 2006, Nature Reviews Genetics.

[28]  Peter Bühlmann,et al.  Analyzing gene expression data in terms of gene sets: methodological issues , 2007, Bioinform..

[29]  Qi Liu,et al.  Improving gene set analysis of microarray data by SAM-GS , 2007, BMC Bioinformatics.

[30]  N. Meinshausen Hierarchical testing of variable importance , 2008 .

[31]  Ulrich Mansmann,et al.  GlobalANCOVA: exploration and assessment of gene group effects , 2008, Bioinform..

[32]  D. Yekutieli Hierarchical False Discovery Rate–Controlling Methodology , 2008 .