Mining a Lexicon of Technical Terms and Lay Equivalents

We present a corpus-driven method for building a lexicon of semantically equivalent pairs of technical and lay medical terms. Using a parallel corpus of clinical study abstracts and the corresponding news stories written for a lay audience, we identify lay terms that are good semantic equivalents of technical terms. Our method relies on measures of association between technical and lay terms. Results show that, despite the small size of our corpus, the method identifies a promising number of pairs.
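To make the idea of an association-based approach concrete, here is a minimal sketch in Python. It is not the authors' implementation: the abstract does not say which measure is used, so the sketch assumes a Pearson chi-square statistic over a 2x2 contingency table of document-level co-occurrence. The function names (`chi_square_2x2`, `rank_pairs`) and the toy data are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import product


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for a 2x2 table.

    a: document pairs containing both terms
    b: pairs containing only the technical term
    c: pairs containing only the lay term
    d: pairs containing neither term
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table: a term occurs in all or no pairs
    return n * (a * d - b * c) ** 2 / denom


def rank_pairs(doc_pairs):
    """Rank (technical, lay) candidates by chi-square association.

    doc_pairs is a list of (technical_terms, lay_terms) tuples of sets,
    one tuple per abstract / news-story pair (an assumed input format).
    """
    n = len(doc_pairs)
    tech_df, lay_df, joint = Counter(), Counter(), Counter()
    for tech, lay in doc_pairs:
        tech_df.update(tech)
        lay_df.update(lay)
        joint.update(product(tech, lay))

    scored = []
    for (t, l), a in joint.items():
        b = tech_df[t] - a
        c = lay_df[l] - a
        d = n - a - b - c
        # Keep only positively associated candidates (co-occur more than chance).
        if a * d > b * c:
            scored.append((chi_square_2x2(a, b, c, d), t, l))
    return sorted(scored, reverse=True)


# Toy data, invented for illustration only.
toy_pairs = [
    ({"myocardial infarction", "hypertension"}, {"heart attack", "high blood pressure"}),
    ({"myocardial infarction"}, {"heart attack"}),
    ({"hypertension"}, {"high blood pressure"}),
    ({"neoplasm"}, {"tumor"}),
]

if __name__ == "__main__":
    for score, tech, lay in rank_pairs(toy_pairs):
        print(f"{score:.2f}\t{tech}\t{lay}")
```

On a corpus as small as the one described, many cell counts will be low, so in practice a score threshold or a test better suited to sparse counts may be preferable to raw chi-square ranking.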
