Mining Gene Ontology Annotations From Hyperlinks in the Gene Wiki

The Gene Wiki is an informal collection of more than 10,000 Wikipedia articles about human genes. Through the contributions of many volunteers, it continues to grow as a valuable repository of biomedical knowledge. While initial efforts were devoted to seeding Gene Wiki articles with data from public databases, we are now looking for ways to harvest the human-added knowledge accumulating in these articles. One source of such knowledge is the set of links between Gene Wiki articles and articles describing biological concepts. Here, we assess the potential of such interwiki links to signal novel gene annotations. We performed this analysis by mapping the targets of Gene Wiki links to Gene Ontology terms and then comparing the resulting gene-term connections to known Gene Ontology annotations. We found a total of 12,828 potential annotations, of which 5,005 (39%) correspond to existing annotations and 7,823 (61%) represent candidates for new gene annotations.
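The two-step analysis described above (map link targets to Gene Ontology terms, then compare against known annotations) can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: it assumes link targets are matched to GO terms by exact, case-insensitive label lookup and that a candidate counts as "existing" only when the identical gene-term pair is already annotated. The function names and the toy input data are hypothetical and are not taken from the study.

```python
from typing import Dict, Set, Tuple

Annotation = Tuple[str, str]  # (gene symbol, GO term ID)


def normalize(title: str) -> str:
    """Normalize an article title or GO label for exact-match lookup (assumed matching rule)."""
    return title.strip().lower().replace("_", " ")


def candidate_annotations(
    gene_links: Dict[str, Set[str]],
    go_labels: Dict[str, str],
) -> Set[Annotation]:
    """Map each gene article's link targets to GO term IDs.

    gene_links: gene symbol -> titles of articles linked from its Gene Wiki page
    go_labels:  normalized GO term label -> GO term ID
    """
    candidates: Set[Annotation] = set()
    for gene, titles in gene_links.items():
        for title in titles:
            go_id = go_labels.get(normalize(title))
            if go_id is not None:  # link targets with no GO match are ignored
                candidates.add((gene, go_id))
    return candidates


def split_by_known(
    candidates: Set[Annotation],
    known: Set[Annotation],
) -> Tuple[Set[Annotation], Set[Annotation]]:
    """Split candidates into those already annotated and those that are new."""
    return candidates & known, candidates - known


# Toy data for illustration only; not drawn from the paper's dataset.
gene_links = {
    "TP53": {"Apoptosis", "DNA_repair", "Cancer"},
    "BRCA1": {"DNA repair"},
}
go_labels = {
    "apoptosis": "GO:0006915",
    "dna repair": "GO:0006281",
}
known = {("TP53", "GO:0006915")}

candidates = candidate_annotations(gene_links, go_labels)
existing, novel = split_by_known(candidates, known)
print(len(candidates), len(existing), len(novel))
```

Exact pair matching is the simplest possible comparison. A fuller treatment would likely also account for synonyms and for the ontology hierarchy, since a link to a broader or narrower term than an existing annotation is not truly novel.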
