Drug Name Recognition: Approaches and Resources

Drug name recognition (DNR), which seeks to recognize drug mentions in unstructured medical texts and classify them into pre-defined categories, is a fundamental task in medical information extraction and a key component of many medical relation extraction systems and applications. Considerable effort has been devoted to DNR, and substantial progress has been made over the last several decades. We present a comprehensive review of DNR research, covering the challenges of the task, the existing approaches and resources, and possible future directions.
