Mining gene-centric relationships from literature: the roles of gene mutation and gene expression in supporting drug discovery

Identifying drug target candidates is an important early task in the drug discovery process. New high-throughput technologies support this task by enabling a better understanding of disease mechanisms, which makes effective analysis of the resulting large volumes of biological data critical. However, because much biological knowledge is recorded in the literature only as natural-language text, the analysis and interpretation of high-throughput data have not reached their full potential. In this paper, we describe our use of text mining to find scientific information for target and biomarker discovery in the biomedical literature. Our approach utilises natural language processing techniques to capture linguistic patterns for extracting biological knowledge from text. Additionally, we discuss how the extracted knowledge is used in the analysis of biological data such as next-generation sequencing and gene expression data.
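
As an illustration only, the idea of capturing linguistic patterns for gene-centric extraction can be sketched with a simple sentence-level co-occurrence rule. This is not the system described in the paper: the gene lexicon, the regular expressions, and the function names below are all hypothetical, and a practical pipeline would rely on curated dictionaries, a named entity recogniser, and normalisation against sequence databases.

```python
import re

# Hypothetical mini-lexicon of gene symbols (assumption for illustration);
# a real system would use a curated dictionary or an NER tagger.
GENE_LEXICON = {"BRAF", "KRAS", "EGFR", "TP53"}

# One-letter amino-acid codes, used to spot protein-level substitutions
# written in the common short form, e.g. "V600E".
AA = "ACDEFGHIKLMNPQRSTVWY"
MUTATION_RE = re.compile(rf"\b([{AA}])(\d+)([{AA}])\b")

# Candidate gene tokens: upper-case alphanumeric words of 2 to 10 characters.
TOKEN_RE = re.compile(r"\b[A-Z][A-Z0-9]{1,9}\b")


def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_gene_mutation_pairs(text):
    """Return (gene, mutation, sentence) for every gene symbol and
    mutation mention that co-occur in the same sentence."""
    pairs = []
    for sentence in split_sentences(text):
        mutations = [m.group(0) for m in MUTATION_RE.finditer(sentence)]
        genes = [t for t in TOKEN_RE.findall(sentence) if t in GENE_LEXICON]
        for gene in genes:
            for mutation in mutations:
                pairs.append((gene, mutation, sentence))
    return pairs


if __name__ == "__main__":
    example = (
        "The BRAF V600E mutation is frequent in melanoma. "
        "EGFR expression was elevated in tumour tissue."
    )
    for gene, mutation, sentence in extract_gene_mutation_pairs(example):
        print(gene, mutation, "|", sentence)
```

Sentence-level co-occurrence is a deliberately crude rule: it favours recall over precision and ignores negation and syntactic structure, which is where richer linguistic patterns would be expected to help.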
