Paper information - dcGOR: An R Package for Analysing Ontologies and Protein Domain Annotations


I introduce an open-source R package, ‘dcGOR’, that gives the bioinformatics community an easy way to analyse ontologies and protein domain annotations, particularly those in the dcGO database. dcGO is a comprehensive resource for protein domain annotations using a panel of ontologies, including Gene Ontology. Although increasing in popularity, this database needs statistical and graphical support to reach its full potential. Moreover, there are no bioinformatics tools specifically designed for domain ontology analysis. As an add-on package built in the R software environment, dcGOR offers a basic infrastructure with great flexibility and functionality. It implements new data structures to represent domains, ontologies, annotations, and all analytical outputs. For each ontology, it provides various mining facilities, including: (i) domain-based enrichment analysis and visualisation; (ii) construction of a domain (semantic similarity) network according to ontology annotations; and (iii) significance analysis for estimating a contact (statistical significance) network. To reduce runtime, most analyses support high-performance parallel computing. Taking as input a list of protein domains of interest, the package can readily carry out in-depth analyses in terms of functional, phenotypic and disease relevance, and network-level understanding. More importantly, dcGOR is designed to allow users to import and analyse their own ontologies and annotations on domains (taken from SCOP, Pfam and InterPro) as well as RNAs (from Rfam). The package is freely available from CRAN for easy installation, and also on GitHub for version control. The dedicated website with reproducible demos can be found at http://supfam.org/dcGOR.
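To make the idea of domain-based enrichment analysis concrete, the following is a minimal sketch of one common approach, a one-sided hypergeometric test, written in plain Python rather than R. It is not dcGOR's implementation or API: the function names, the term labels (`term_A`, `term_B`) and the domain identifiers (`d1` to `d10`) are hypothetical and chosen only for illustration, and the package's own tests and options may differ.

```python
from math import comb


def hypergeom_enrichment_pvalue(n_universe, n_annotated, n_selected, n_overlap):
    """Upper-tail probability P(X >= n_overlap) for a hypergeometric X.

    n_universe  : number of domains in the background set
    n_annotated : background domains annotated to the term
    n_selected  : domains in the user's list of interest
    n_overlap   : domains of interest that are annotated to the term
    """
    upper = min(n_annotated, n_selected)
    total = comb(n_universe, n_selected)
    tail = sum(
        comb(n_annotated, k) * comb(n_universe - n_annotated, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


def enrich(domains_of_interest, annotations, universe):
    """Return (term, overlap, p-value) tuples, most significant first."""
    selected = set(domains_of_interest) & set(universe)
    results = []
    for term, annotated in annotations.items():
        annotated = set(annotated) & set(universe)
        overlap = len(selected & annotated)
        pvalue = hypergeom_enrichment_pvalue(
            len(universe), len(annotated), len(selected), overlap
        )
        results.append((term, overlap, pvalue))
    return sorted(results, key=lambda row: row[2])


# Toy data (hypothetical): ten domains and two ontology terms.
universe = [f"d{i}" for i in range(1, 11)]
annotations = {
    "term_A": {"d1", "d2", "d3", "d4"},
    "term_B": {"d5", "d6", "d7", "d8", "d9"},
}
domains_of_interest = ["d1", "d2", "d3", "d5"]

results = enrich(domains_of_interest, annotations, universe)
for term, overlap, pvalue in results:
    print(f"{term}: overlap={overlap}, p={pvalue:.4f}")
```

In this toy example three of the four selected domains carry `term_A`, so it ranks ahead of `term_B`. A real analysis would also correct the p-values for multiple testing across terms, which the sketch omits.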

Hai Fang
