Drug Normalization for Cancer Therapeutic and Druggable Genome Target Discovery

Heterogeneous representation of drug data across druggable genome knowledge resources and datasets delays effective cancer therapeutic target discovery in the broader scientific community. This paper describes the challenges encountered and lessons learned in developing and evaluating a standards-based drug normalization framework for cancer druggable genome datasets. Our findings suggest two things. First, mechanisms are needed to handle spelling errors and irregularities when normalizing clinical drug data from The Cancer Genome Atlas (TCGA). Second, annotations from the NCI Thesaurus (NCIt) and PubChem provide two layers of normalization that could bridge clinical phenotypes and druggable genome knowledge for cancer therapeutic target discovery.
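The abstract notes that TCGA clinical drug entries contain spelling errors and irregularities that must be handled before they can be mapped to a standard terminology. The following is a minimal sketch of that clean-then-match step, not the authors' framework. It assumes a small hard-coded canonical list (hypothetical; a stand-in for lookups against NCIt or a similar terminology) and uses Python's standard-library `difflib` similarity ratio as a simple approximate matcher. The `cutoff` value of 0.8 is an illustrative assumption.

```python
import difflib
import re

# Hypothetical canonical vocabulary; in practice these names would come from
# a terminology such as NCIt, not from a hard-coded list.
CANONICAL_DRUGS = ["Cisplatin", "Carboplatin", "Paclitaxel", "Temozolomide", "Gemcitabine"]

# Lookup table: cleaned lowercase key -> canonical display name.
_LOOKUP = {name.lower(): name for name in CANONICAL_DRUGS}


def clean(raw: str) -> str:
    """Lowercase the string, turn punctuation runs into spaces, collapse whitespace."""
    text = raw.strip().lower()
    text = re.sub(r"[^a-z0-9]+", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def normalize_drug_name(raw: str, cutoff: float = 0.8):
    """Return the canonical name for a raw drug string, or None if nothing matches.

    An exact match on the cleaned string is tried first. Otherwise, difflib's
    similarity ratio acts as a simple stand-in for spelling correction.
    """
    key = clean(raw)
    if key in _LOOKUP:
        return _LOOKUP[key]
    matches = difflib.get_close_matches(key, list(_LOOKUP), n=1, cutoff=cutoff)
    return _LOOKUP[matches[0]] if matches else None


if __name__ == "__main__":
    for raw in ["  CISPLATIN ", "Temozolamide", "Paclitaxol", "aspirin"]:
        print(repr(raw), "->", normalize_drug_name(raw))
```

Misspellings such as "Temozolamide" resolve to the closest canonical entry. A name that is absent from the vocabulary returns `None`, so the entry can be routed to manual review instead of being silently forced onto a wrong drug. A production pipeline would more likely use a terminology-aware matcher, such as the approximate-matching services built around RxNorm.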
