The interaction of domain knowledge and linguistic structure in natural language processing: interpreting hypernymic propositions in biomedical text

Interpretation of semantic propositions in free-text documents such as MEDLINE citations would provide valuable support for biomedical applications, and several approaches to semantic interpretation are being pursued in the biomedical informatics community. In this paper, we describe a methodology for interpreting linguistic structures that encode hypernymic propositions, in which a more specific concept is in a taxonomic relationship with a more general concept. To process these constructions effectively, we exploit underspecified syntactic analysis and structured domain knowledge from the Unified Medical Language System (UMLS). After introducing the syntactic processing on which our system depends, we focus on the UMLS knowledge that supports interpretation of hypernymic propositions. We first use semantic groups from the Semantic Network to ensure that the two concepts involved are compatible; hierarchical information in the Metathesaurus then determines which concept is more general and which is more specific. A preliminary evaluation of a sample based on the semantic group Chemicals and Drugs yields 83% precision. We conducted an error analysis and present potential solutions to the problems encountered. The research discussed here serves as a paradigm for investigating the interaction between domain knowledge and linguistic structure in natural language processing, and could also contribute to research on automatic processing of discourse structure. Additional implications of the system include its integration into advanced semantic interpretation processors for biomedical text and its use for information extraction in specific domains. The approach has the potential to support a range of applications, including information retrieval and ontology engineering.
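
The two knowledge-based steps described above (a semantic group compatibility check, followed by a hierarchy lookup that fixes the direction of the relation) can be illustrated with a minimal sketch. It is not the system's implementation: the dictionaries `SEMANTIC_GROUP` and `PARENTS`, the function names, and the concept entries are hypothetical toy stand-ins for the Semantic Network groups and Metathesaurus hierarchy, and do not reflect actual UMLS content.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical toy data: concept -> semantic group (stand-in for UMLS semantic groups).
SEMANTIC_GROUP: Dict[str, str] = {
    "aspirin": "Chemicals and Drugs",
    "anti-inflammatory agents": "Chemicals and Drugs",
    "pharmacologic substance": "Chemicals and Drugs",
    "headache": "Disorders",
}

# Hypothetical toy data: concept -> direct parents (stand-in for the Metathesaurus hierarchy).
PARENTS: Dict[str, Set[str]] = {
    "aspirin": {"anti-inflammatory agents"},
    "anti-inflammatory agents": {"pharmacologic substance"},
}


def ancestors(concept: str) -> Set[str]:
    """Collect every concept reachable by following parent links upward."""
    seen: Set[str] = set()
    stack = [concept]
    while stack:
        for parent in PARENTS.get(stack.pop(), set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def interpret_hypernymy(a: str, b: str) -> Optional[Tuple[str, str]]:
    """Return (specific, general) if a and b stand in a taxonomic relation, else None."""
    # Step 1: both concepts must be known and belong to the same semantic group.
    group_a = SEMANTIC_GROUP.get(a)
    if group_a is None or group_a != SEMANTIC_GROUP.get(b):
        return None
    # Step 2: the hierarchy decides which concept is the more general one.
    if b in ancestors(a):
        return (a, b)
    if a in ancestors(b):
        return (b, a)
    return None


if __name__ == "__main__":
    print(interpret_hypernymy("pharmacologic substance", "aspirin"))
    print(interpret_hypernymy("aspirin", "headache"))
```

In this sketch the pair is rejected early when the semantic groups differ, so the hierarchy traversal runs only for candidates that pass the compatibility check, and the order in which the two concepts appear in the text does not affect which one is reported as more general.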
