Mapping of biomedical text to concepts of lexicons, terminologies, and ontologies.

Concept mapping is a fundamental task in biomedical text mining in which textual mentions of concepts of interest are annotated with the specific entries of lexicons, terminologies, ontologies, or databases that represent those concepts. Despite a significant amount of research, only a limited number of practical, publicly available tools let a user perform concept mapping of user-specified biomedical text as an independent task. This chapter first presents several tools that can automatically map biomedical text to concepts from a wide range of terminological resources, then tools that map to more restricted sets of these resources. It is intended as a guide for researchers without a background in biomedical concept mapping, to help them select an appropriate tool based on usability, scalability, configurability, the balance between precision and recall, and the set of terminological resources with which they want to annotate the text. Only with effective automatic concept-mapping tools will systems be able to analyze the biomedical literature and other large document collections at scale, as a fundamental part of more complex text-mining tasks such as information extraction and hypothesis evaluation and generation.
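To make the task definition concrete, the following is a minimal sketch of dictionary-based concept mapping: it scans text for known surface forms and annotates each mention with a concept identifier. It does not reproduce any of the tools discussed in the chapter. The lexicon, its `TOY:` identifiers, and the names `Annotation` and `map_concepts` are assumptions made purely for illustration.

```python
import re
from typing import Dict, List, NamedTuple


class Annotation(NamedTuple):
    """A text span linked to a concept identifier."""
    start: int
    end: int
    text: str
    concept_id: str


# Hypothetical toy lexicon: lowercase surface form -> made-up concept ID.
# A real resource would supply its own identifiers and synonyms.
TOY_LEXICON: Dict[str, str] = {
    "apoptosis": "TOY:0001",
    "programmed cell death": "TOY:0002",
    "aspirin": "TOY:0003",
}


def map_concepts(text: str, lexicon: Dict[str, str]) -> List[Annotation]:
    """Annotate case-insensitive, whole-word matches of lexicon terms."""
    if not lexicon:
        return []
    # Longer terms come first so that multi-word terms are preferred
    # over shorter terms that start at the same position.
    terms = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    return [
        Annotation(m.start(), m.end(), m.group(0), lexicon[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]


if __name__ == "__main__":
    for ann in map_concepts("Aspirin induces apoptosis.", TOY_LEXICON):
        print(ann)
```

Exact string lookup of this kind tends to favor precision over recall, since it misses spelling variants, inflections, and synonyms absent from the lexicon. That is one reason the balance between precision and recall matters when selecting a tool.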
