Quantifying the biological significance of gene ontology biological processes - implications for the analysis of systems-wide data

MOTIVATION: The Gene Ontology (GO), the de facto standard for representing the functional aspects of proteins, is being used beyond the primary purpose for which it was designed: protein functional annotation. It is increasingly used to evaluate large sets of relationships between proteins, e.g. protein-protein interactions or mRNA co-expression, under the assumption that related proteins tend to share the same or similar GO terms. However, this assumption holds only for terms that represent functional groups with biological significance ('classes'), and not for those that represent human-imposed aggregations or conceptualizations lacking a biological rationale ('categories').

RESULTS: Using a data-driven approach based on a set of high-quality functional associations, we quantify the functional coherence of GO biological process (GO:BP) terms, as well as of their explicit and implicit relationships, in order to distinguish classes from categories. We show that this quantification agrees with the distinction one would intuitively make between the two concepts. Because not all GO:BP terms and relationships are equally supported by current functional associations, any detailed validation of new experimental data using GO:BP, beyond whole-system statistics, should take this imbalance into account.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
