Research Paper: A System for Automated Lexical Mapping

Objective: To automate the mapping of disparate databases to standardized medical vocabularies.

Background: Merging clinical systems and medical databases, or aggregating information from disparate databases, frequently requires comparing vocabularies and mapping similar concepts.

Design: The system uses a normalization phase followed by a novel alignment stage inspired by DNA sequence alignment methods to map terms from various databases to standard vocabularies such as the UMLS (Unified Medical Language System) and LOINC (Logical Observation Identifiers Names and Codes).

Measurements: The automated lexical mapping was evaluated on three real-world laboratory databases from different health care institutions. The authors report sensitivity, specificity, percentage correct (true positives plus true negatives divided by the total number of terms), and true positive and true negative rates as measures of system performance.

Results: The alignment algorithm mapped 57% to 78% (an average of 63% over all runs and databases) of equivalent concepts through lexical mapping alone. True positive rates ranged from 18% to 70%; true negative rates ranged from 5% to 52%.

Conclusion: Lexical mapping can facilitate the integration of data from diverse sources and decrease the time and cost of manually mapping and integrating clinical systems and medical databases.
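The abstract does not specify the normalization rules or the alignment scoring, so the following is only a minimal Python sketch of the general idea under stated assumptions, not the authors' algorithm. Normalization here lowercases a term, turns punctuation into spaces, and expands entries from a small hypothetical abbreviation table (`ABBREVIATIONS`). The alignment stage is a Smith-Waterman-style local alignment over word tokens, with match, mismatch, and gap scores chosen arbitrarily for illustration. The vocabulary terms are ranked by alignment score and the best one is proposed as the mapping.

```python
import re

# Hypothetical abbreviation table (illustrative assumption); a real system
# would rely on a much larger, curated resource.
ABBREVIATIONS = {"hgb": "hemoglobin", "bld": "blood", "ser": "serum", "gluc": "glucose"}


def normalize(term: str) -> list[str]:
    """Lowercase, replace punctuation with spaces, split into tokens,
    and expand any token found in the abbreviation table."""
    tokens = re.sub(r"[^a-z0-9]+", " ", term.lower()).split()
    return [ABBREVIATIONS.get(t, t) for t in tokens]


def local_alignment_score(a: list[str], b: list[str],
                          match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Smith-Waterman local alignment over token sequences with a linear
    gap penalty. Returns the best local score (0 if nothing aligns)."""
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 lets an alignment restart anywhere (local, not global).
            h[i][j] = max(0, diag, h[i - 1][j] + gap, h[i][j - 1] + gap)
            best = max(best, h[i][j])
    return best


def best_mapping(local_term: str, vocabulary: list[str]) -> tuple[str, int]:
    """Return the vocabulary term with the highest alignment score
    against the normalized local term, together with that score."""
    query = normalize(local_term)
    scored = [(v, local_alignment_score(query, normalize(v))) for v in vocabulary]
    return max(scored, key=lambda pair: pair[1])
```

For example, with the LOINC-style names `vocab = ["Glucose [Mass/volume] in Serum or Plasma", "Hemoglobin [Mass/volume] in Blood", "Sodium [Moles/volume] in Serum or Plasma"]` (used purely as illustration), `best_mapping("HGB, BLD", vocab)` returns `("Hemoglobin [Mass/volume] in Blood", 2)`. Aligning whole words tolerates extra qualifiers such as units or specimen wording in the standard term; catching misspellings would require scoring at the character level instead, which this sketch does not attempt.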
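The abstract defines percentage correct explicitly as true positives plus true negatives divided by the total number of terms. The sketch below computes it alongside sensitivity and specificity, using the standard definitions of the latter two; that choice, and the per-term meaning of each count given in the docstring, are assumptions rather than the authors' stated protocol. The counts in the usage note are hypothetical and are not results from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MappingCounts:
    """Confusion counts over a set of local terms (assumed definitions):
    tp: term has a standard equivalent and the system mapped it correctly
    tn: term has no equivalent and the system proposed no mapping
    fp: the system proposed a mapping that is wrong
    fn: term has an equivalent but the system proposed no mapping
    """
    tp: int
    tn: int
    fp: int
    fn: int

    @property
    def total(self) -> int:
        return self.tp + self.tn + self.fp + self.fn

    def sensitivity(self) -> float:
        # Standard definition: TP / (TP + FN); NaN when undefined.
        denom = self.tp + self.fn
        return self.tp / denom if denom else float("nan")

    def specificity(self) -> float:
        # Standard definition: TN / (TN + FP); NaN when undefined.
        denom = self.tn + self.fp
        return self.tn / denom if denom else float("nan")

    def percentage_correct(self) -> float:
        # (true positives + true negatives) / total number of terms, as a percentage.
        return 100.0 * (self.tp + self.tn) / self.total if self.total else float("nan")
```

With hypothetical counts `MappingCounts(tp=60, tn=10, fp=20, fn=10)`, the total is 100 terms, percentage correct is 70.0, sensitivity is 60/70 (about 0.857), and specificity is 10/30 (about 0.333).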
