Using Ontology Fingerprints to disambiguate gene name entities in the biomedical literature

Ambiguous gene names in the biomedical literature are a barrier to accurate information extraction. To overcome this hurdle, we generated Ontology Fingerprints for selected genes relevant to personalized cancer therapy. We used these Ontology Fingerprints to score the association between genes and articles and thereby disambiguate gene names. On the test gene set we obtained 93.6% precision, and for gene-article association we obtained an area under the receiver operating characteristic curve of 80.4%. The core algorithm was implemented on a graphics processing unit-based MapReduce framework to handle big data and improve performance. We conclude that Ontology Fingerprints can help disambiguate gene names mentioned in text and analyse the association between genes and articles. Database URL: http://www.ontologyfingerprint.org
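
For readers unfamiliar with the two evaluation measures reported above, the following is a minimal sketch of how precision and the area under a receiver operating characteristic curve can be computed from gene-article association scores. The function names, the 0.5 decision threshold, and the toy scores are illustrative assumptions, not the authors' implementation or data.

```python
from typing import Sequence


def precision(predicted: Sequence[bool], actual: Sequence[bool]) -> float:
    """Fraction of predicted gene-article associations that are correct."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    if tp + fp == 0:
        raise ValueError("no positive predictions, precision is undefined")
    return tp / (tp + fp)


def roc_auc(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation:
    the probability that a true association outscores a false one,
    counting ties as one half."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical association scores for four gene-article pairs (toy data).
toy_scores = [0.9, 0.8, 0.4, 0.3]
toy_labels = [True, False, True, False]

# Assumed decision threshold of 0.5 for calling a pair "associated".
toy_predicted = [s >= 0.5 for s in toy_scores]

toy_precision = precision(toy_predicted, toy_labels)
toy_auc = roc_auc(toy_scores, toy_labels)
```

Precision depends on the chosen threshold, whereas the area under the curve summarizes ranking quality across all thresholds, which is why the two figures are reported separately.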
