Determining the Syntactic Structure of Medical Terms in Clinical Notes

This paper demonstrates a method for determining the syntactic structure of medical terms. We use a model-fitting method based on the log-likelihood ratio to classify three-word medical terms as right- or left-branching. We validate the method by measuring the agreement between its classifications and manually annotated ones. The results show agreement of 75% to 83%. The method could support a wide range of applications that depend on the semantic interpretation of medical terms, including automatic mapping of terms to standardized vocabularies and induction of terminologies from unstructured medical text.
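The abstract does not spell out the model-fitting step, so the following is only a minimal sketch of one plausible reading, not the paper's confirmed procedure. It assumes that a three-word term (w1, w2, w3) is compared against two independence models: one where (w1 w2) forms a unit independent of w3 (left-branching), and one where w1 is independent of the unit (w2 w3) (right-branching). The log-likelihood ratio statistic G² is computed for each model over a 2x2x2 contingency table of corpus trigrams, and the model with the smaller G² (the closer fit) is assumed to give the bracketing. All function names and the toy corpus are hypothetical.

```python
import math
from collections import Counter
from itertools import product


def contingency_table(trigrams, term):
    """Build a 2x2x2 table: each cell records whether positions 1-3 match the term."""
    w1, w2, w3 = term
    table = Counter()
    for a, b, c in trigrams:
        table[(a == w1, b == w2, c == w3)] += 1
    return table


def marginal(table, pattern):
    """Sum the cells that match `pattern`; None acts as a wildcard for a position."""
    return sum(
        count
        for cell, count in table.items()
        if all(p is None or p == v for p, v in zip(pattern, cell))
    )


def expected_left(table, cell, total):
    """Expected count if (w1 w2) is a unit independent of w3 (assumed model)."""
    i, j, k = cell
    return marginal(table, (i, j, None)) * marginal(table, (None, None, k)) / total


def expected_right(table, cell, total):
    """Expected count if w1 is independent of the unit (w2 w3) (assumed model)."""
    i, j, k = cell
    return marginal(table, (i, None, None)) * marginal(table, (None, j, k)) / total


def g_squared(table, expected):
    """G^2 = 2 * sum(observed * ln(observed / expected)) over non-empty cells."""
    total = sum(table.values())
    g2 = 0.0
    for cell in product((True, False), repeat=3):
        observed = table[cell]
        if observed > 0:
            g2 += observed * math.log(observed / expected(table, cell, total))
    return 2.0 * g2


def classify(trigrams, term):
    """Pick the bracketing whose model fits better (smaller G^2); ties go to 'right'."""
    table = contingency_table(trigrams, term)
    left = g_squared(table, expected_left)
    right = g_squared(table, expected_right)
    return "left" if left < right else "right"


# Toy corpus (invented for illustration): "blood pressure" recurs with varying third words.
toy_corpus = (
    [("blood", "pressure", "cuff")] * 2
    + [("blood", "pressure", "reading")] * 10
    + [("large", "plastic", "cuff")] * 5
    + [("the", "patient", "was")] * 20
)
print(classify(toy_corpus, ("blood", "pressure", "cuff")))
```

On this toy corpus the first two words co-occur regardless of the third, so the left-branching model fits more closely. Real clinical text would need smoothing or a fallback for terms whose counts are too sparse to separate the two models.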
