An Empirical Evaluation of Resources for the Identification of Diseases and Adverse Effects in Biomedical Literature

Mentions of human health perturbations, such as diseases and adverse effects, form a special entity class in the biomedical literature. They help in understanding the underlying risk factors and in developing a preventive rationale. Recognizing these named entities in text through dictionary-based approaches relies on the availability of appropriate terminological resources. Although a few resources are publicly available, not all of them are suitable for text mining. This work therefore provides an overview of well-known resources covering human diseases and adverse effects, such as MeSH, MedDRA, ICD-10, SNOMED CT, and UMLS. Individual dictionaries are generated from these resources, and their performance in recognizing the named entities is evaluated on a manually annotated corpus. In addition, the steps for curating the dictionaries, the rule-based acronym disambiguation, and their impact on dictionary performance are discussed. The results show that MedDRA and UMLS achieve the best recall, and that MedDRA additionally achieves a higher precision. Combining the search results of all the dictionaries achieves a considerably high recall. The corpus is available at http://www.scai.fraunhofer.de/disease-ae-corpus.html
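
The evaluation idea described above (match dictionary terms in text, score the matches against manual annotations, then pool the matches of several dictionaries) can be illustrated with a minimal sketch. It is not the system used in this work: the matching rule (case-insensitive whole-word lookup), the function names `find_mentions` and `precision_recall`, the toy sentence, the two toy dictionaries, and the gold spans are all assumptions made for illustration.

```python
import re


def find_mentions(text, dictionary):
    """Return the set of (start, end) character spans where a dictionary term occurs.

    Assumption: plain case-insensitive whole-word matching; a real system
    would add curation, normalization, and acronym disambiguation.
    """
    spans = set()
    for term in dictionary:
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            spans.add((match.start(), match.end()))
    return spans


def precision_recall(predicted, gold):
    """Exact-span precision and recall of predicted spans against gold spans."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Toy data (hypothetical, not taken from the corpus).
text = "Patients with hepatitis developed nausea and severe headache."
gold = find_mentions(text, {"hepatitis", "nausea", "headache"})

dict_a = {"hepatitis", "nausea"}    # precise but incomplete
dict_b = {"headache", "patients"}   # contains one noisy term

hits_a = find_mentions(text, dict_a)
hits_b = find_mentions(text, dict_b)
hits_combined = hits_a | hits_b     # pool the results of both dictionaries

scores_a = precision_recall(hits_a, gold)
scores_b = precision_recall(hits_b, gold)
scores_combined = precision_recall(hits_combined, gold)
```

In this toy setting, pooling the two dictionaries recovers every gold mention, but the noisy term in the second dictionary lowers the precision of the combined result. This is the kind of trade-off that combining dictionary results can be expected to show.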
