Bioinformatics, Original Paper: Improved Scoring of Functional Groups from Gene Expression Data by Decorrelating GO Graph Structure

Motivation: The result of a typical microarray experiment is a long list of genes with corresponding expression measurements. This list is only the starting point for a meaningful biological interpretation. Modern methods identify relevant biological processes or functions from gene expression data by scoring the statistical significance of predefined functional gene groups, e.g. groups based on the Gene Ontology (GO). We develop methods that increase the explanatory power of this approach by integrating knowledge about the relationships between GO terms into the calculation of statistical significance.

Results: We present two novel algorithms that improve GO group scoring by using the underlying GO graph topology. The algorithms are evaluated on real and simulated gene expression data. We show that both methods eliminate local dependencies between GO terms and point to relevant areas in the GO graph that remain undetected by state-of-the-art algorithms for scoring functional terms. A simulation study demonstrates that the new methods detect relevant biological terms at a higher rate than competing methods.
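To make the idea of decorrelating GO terms concrete, here is a minimal sketch, not the authors' algorithms, which the abstract does not specify. It assumes a one-sided hypergeometric test as the group score and a simple rule in which genes of a significant term are hidden from all of its ancestors before those ancestors are scored. All names (`enrichment_pvalue`, `score_bottom_up`, the toy terms and genes) are hypothetical and chosen for illustration only.

```python
from math import comb


def enrichment_pvalue(hits, drawn, annotated, total):
    """Upper-tail hypergeometric p-value, P(X >= hits)."""
    top = min(drawn, annotated)
    tail = sum(comb(annotated, i) * comb(total - annotated, drawn - i)
               for i in range(hits, top + 1))
    return tail / comb(total, drawn)


def ancestors(term, parents):
    """All terms reachable from `term` by following parent links."""
    seen, stack = set(), list(parents.get(term, ()))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, ()))
    return seen


def score_bottom_up(order, parents, genes_of, significant, universe, alpha=0.05):
    """Score terms children-first (assumed rule: genes of a significant
    term are hidden from its ancestors before they are scored)."""
    hidden = {t: set() for t in order}
    pvalues = {}
    for term in order:
        genes = (genes_of[term] - hidden[term]) & universe
        hits = len(genes & significant)
        pvalues[term] = enrichment_pvalue(
            hits, len(significant), len(genes), len(universe))
        if pvalues[term] < alpha:
            for anc in ancestors(term, parents):
                hidden[anc] |= genes
    return pvalues


# Toy data: 20 genes, 5 of them differentially expressed.
universe = {f"g{i}" for i in range(20)}
significant = {f"g{i}" for i in range(5)}
genes_of = {
    "child": {f"g{i}" for i in range(6)},    # specific term
    "parent": {f"g{i}" for i in range(10)},  # inherits the child's genes
}
parents = {"child": ["parent"]}

# Scoring each term independently flags the parent as well.
classic_parent = enrichment_pvalue(5, 5, 10, 20)
# Scoring bottom-up leaves only the more specific term significant.
pvalues = score_bottom_up(["child", "parent"], parents, genes_of,
                          significant, universe)
```

In this toy graph the parent term looks enriched only because it inherits the child's genes. Once those genes are hidden, the parent has no remaining significant genes, which illustrates the kind of local dependency between GO terms that the abstract describes.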
