eGARD: Extracting associations between genomic anomalies and drug responses from text

Tumor molecular profiling plays an integral role in identifying genomic anomalies that may help personalize cancer treatments, improve patient outcomes and minimize the risks associated with different therapies. However, critical evidence for the clinical utility of such anomalies is largely buried in the biomedical literature. It is becoming prohibitive for biocurators, clinical researchers and oncologists to keep up with the rapidly growing volume and breadth of this information, especially information that describes the therapeutic implications of biomarkers and is therefore relevant for treatment selection. To improve and speed up the manual review and extraction of relevant information from the literature, we have developed a natural language processing (NLP)-based text mining (TM) system called eGARD (extracting Genomic Anomalies association with Response to Drugs). The system relies on the syntactic structure of sentences, coupled with various textual features, to extract relations between genomic anomalies and drug response from MEDLINE abstracts. It achieved precision, recall and F-measure of up to 0.95, 0.86 and 0.90, respectively, on annotated evaluation datasets created in-house and obtained externally from PharmGKB. Additionally, the system extracts information that helps determine the confidence level of each extraction, which supports prioritization of curation. Such a system will enable clinical researchers to explore the use of published markers to stratify patients upfront for ‘best-fit’ therapies and to readily generate hypotheses for new clinical trials.
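
For readers less familiar with the evaluation metrics quoted above, the following is a minimal sketch of how an F-measure is commonly derived from precision and recall. It assumes the balanced F1 definition (the harmonic mean of the two); the abstract does not state which variant was used, and the function name `f_measure` is hypothetical rather than part of eGARD.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    Assumption: the F1 variant is used; the abstract does not specify this.
    """
    if precision + recall == 0:
        # Both metrics are zero, so the harmonic mean is defined as zero here.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative only: the reported figures are "up to" values and may come
# from different evaluation datasets, so pairing them is an assumption.
score = f_measure(0.95, 0.86)
print(f"{score:.2f}")
```

Under that assumption, the best reported precision and recall combine to a value that rounds to the reported F-measure, though the abstract does not confirm that the three figures come from the same dataset.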
