Large-scale extraction of accurate drug-disease treatment pairs from biomedical literature for drug repurposing

Background: A large-scale, highly accurate, machine-understandable knowledge base of drug-disease treatment relationships is important for computational approaches to drug repurposing. The large body of published biomedical research articles and clinical case reports available on MEDLINE is a rich source of FDA-approved drug-disease indications as well as drug-repurposing knowledge that is crucial for applying FDA-approved drugs to new diseases. However, much of this information is buried in free text and is not captured in any existing database. The goal of this study is to extract a large number of accurate drug-disease treatment pairs from the published literature.

Results: We developed a simple but highly accurate pattern-learning approach to extract treatment-specific drug-disease pairs from 20 million biomedical abstracts available on MEDLINE. We extracted a total of 34,305 unique drug-disease treatment pairs, the majority of which are not included in existing structured databases. Our algorithm achieved a precision of 0.904 and a recall of 0.131 in extracting all pairs, and a precision of 0.904 and a recall of 0.842 in extracting frequent pairs. In addition, we showed that the extracted pairs correlate strongly with both drug target genes and therapeutic classes, and therefore may have high potential in drug discovery.

Conclusions: We demonstrated that our simple pattern-learning relationship extraction algorithm can accurately extract many drug-disease pairs from the free text of biomedical literature that are not captured in structured databases. The large-scale, accurate, machine-understandable drug-disease treatment knowledge base resulting from our study, in combination with pairs from structured databases, will have high potential in computational drug repurposing tasks.
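
To make the extraction and evaluation steps concrete, the following is a minimal sketch, not the authors' implementation. It assumes a single hand-written textual pattern ("DRUG in the treatment of DISEASE"), toy drug and disease lexicons, and a small gold-standard set; the abstract does not specify the learned patterns, the lexicons, or the evaluation data, so all names and example values here are hypothetical.

```python
import re

# Hypothetical toy lexicons; the study's actual lexicons are not described here.
DRUGS = {"metformin", "aspirin", "topiramate"}
DISEASES = {"type 2 diabetes", "migraine", "headache"}


def build_pattern(drugs, diseases):
    """Compile one illustrative treatment pattern over the given lexicons."""
    # Longer terms first so multi-word names win over shorter alternatives.
    drug_alt = "|".join(sorted(map(re.escape, drugs), key=len, reverse=True))
    dis_alt = "|".join(sorted(map(re.escape, diseases), key=len, reverse=True))
    return re.compile(
        rf"\b(?P<drug>{drug_alt}) in the treatment of (?P<disease>{dis_alt})\b",
        re.IGNORECASE,
    )


def extract_pairs(sentences, pattern):
    """Return the set of unique (drug, disease) pairs matched by the pattern."""
    pairs = set()
    for sentence in sentences:
        for match in pattern.finditer(sentence):
            pairs.add((match.group("drug").lower(), match.group("disease").lower()))
    return pairs


def precision_recall(extracted, gold):
    """Precision = TP / |extracted|; recall = TP / |gold|."""
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Invented example sentences and gold pairs, for illustration only.
sentences = [
    "Metformin in the treatment of type 2 diabetes was evaluated.",
    "Aspirin was compared with placebo for headache.",
    "We studied topiramate in the treatment of migraine.",
]
gold = {
    ("metformin", "type 2 diabetes"),
    ("topiramate", "migraine"),
    ("aspirin", "headache"),
}

extracted = extract_pairs(sentences, build_pattern(DRUGS, DISEASES))
precision, recall = precision_recall(extracted, gold)
```

In this toy setup the pattern misses the aspirin sentence because it uses different wording, which illustrates how a specific pattern can yield high precision with lower recall over all pairs, the trade-off reported in the Results.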
