Gaining confidence in biological interpretation of the microarray data: the functional consistence of the significant GO categories

MOTIVATION: In microarray studies, numerous tools are available for functional enrichment analysis based on GO categories. Most of these tools require a prior threshold for designating genes as differentially expressed genes (DEGs). They are therefore classed as threshold-dependent methods, and they are often criticized because their results change with the chosen threshold.

RESULTS: In the present article, we take into account the inherent correlation structure of the GO categories and propose a continuous measure, based on the semantic similarity of GO categories, to investigate the functional consistence (or stability) of threshold-dependent methods. Results from several datasets show that, when overlapping categories between two groups are simply counted, the groups of significant categories selected under different DEG thresholds appear very different. Under the semantic similarity measure proposed in this article, however, the results are functionally consistent across a wide range of DEG thresholds. Moreover, we find that the functional consistence of gene lists ranked by the SAM metric is relatively robust to changes in the DEG threshold.

AVAILABILITY: Source code in R is available on request from the authors.
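The contrast drawn in the results, between counting shared categories and scoring semantic similarity, can be illustrated with a small sketch. The abstract does not specify the exact measure, so everything here is an assumption made for illustration: the toy ontology and its term names, the annotation probabilities, the use of Lin's information-content similarity between terms, and the best-match average used to combine term scores into a group score. It is written in Python rather than the authors' R and is not their implementation.

```python
import math

# Hypothetical toy ontology (child -> parents). These are NOT real GO terms.
PARENTS = {
    "root": [],
    "metabolism": ["root"],
    "signaling": ["root"],
    "lipid_metabolism": ["metabolism"],
    "fatty_acid_metabolism": ["lipid_metabolism"],
    "sterol_metabolism": ["lipid_metabolism"],
    "kinase_signaling": ["signaling"],
}

# Hypothetical probabilities: fraction of genes annotated to a term
# or to any of its descendants (invented numbers, for illustration only).
PROB = {
    "root": 1.0,
    "metabolism": 0.5,
    "signaling": 0.4,
    "lipid_metabolism": 0.2,
    "fatty_acid_metabolism": 0.1,
    "sterol_metabolism": 0.05,
    "kinase_signaling": 0.2,
}


def ancestors(term):
    """Return the term itself together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(term):
    """Information content of a term: -log p(term)."""
    return -math.log(PROB[term])


def lin_similarity(a, b):
    """Lin similarity: 2 * IC(most informative common ancestor) / (IC(a) + IC(b))."""
    common = ancestors(a) & ancestors(b)
    ic_mica = max(information_content(t) for t in common)
    denom = information_content(a) + information_content(b)
    return 0.0 if denom == 0 else 2.0 * ic_mica / denom


def overlap_fraction(group_a, group_b):
    """Simple overlap counting: shared terms divided by all distinct terms."""
    a, b = set(group_a), set(group_b)
    return len(a & b) / len(a | b)


def group_similarity(group_a, group_b):
    """Best-match average: each term is scored against its closest term
    in the other group, and the scores are averaged over both groups."""
    best_a = [max(lin_similarity(a, b) for b in group_b) for a in group_a]
    best_b = [max(lin_similarity(b, a) for a in group_a) for b in group_b]
    return (sum(best_a) + sum(best_b)) / (len(best_a) + len(best_b))


# Two hypothetical groups of significant categories, as might be selected
# under a strict and a lenient DEG threshold.
strict = ["fatty_acid_metabolism", "kinase_signaling"]
lenient = ["lipid_metabolism", "sterol_metabolism", "kinase_signaling"]

print("overlap fraction:   ", overlap_fraction(strict, lenient))
print("semantic similarity:", round(group_similarity(strict, lenient), 3))
```

In this toy case the two groups share only one term out of four, so the overlap fraction is 0.25, while the semantic score is much higher because the non-shared terms are close relatives in the hierarchy. This mirrors, in miniature, the pattern the abstract reports across DEG thresholds.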
