Paper Information - Mining a Lexicon of Technical Terms and Lay Equivalents

Mining a Lexicon of Technical Terms and Lay Equivalents

We present a corpus-driven method for building a lexicon of semantically equivalent pairs of technical and lay medical terms. Using a parallel corpus of abstracts of clinical studies and corresponding news stories written for a lay audience, we identify terms which are good semantic equivalents of technical terms for a lay audience. Our method relies on measures of association. Results show that, despite the small size of our corpus, a promising number of pairs are identified.
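The abstract does not say which measures of association are used, so the following is only a minimal sketch of the general idea, assuming a document-level 2x2 contingency table and the squared phi coefficient (chi-square divided by the number of pairs). The function name `association_scores` and the toy corpus are hypothetical and are not taken from the paper.

```python
def association_scores(pairs, technical_terms):
    """Score (technical, lay) term pairs by co-occurrence across aligned documents.

    pairs: list of (abstract_tokens, story_tokens), each a set of tokens.
    technical_terms: iterable of terms to look up in the abstracts.
    Returns a dict mapping (technical, lay) to the squared phi coefficient,
    keeping only positively associated pairs. The choice of measure is an
    assumption made for illustration.
    """
    n = len(pairs)
    lay_vocab = set()
    for _, story in pairs:
        lay_vocab |= story

    scores = {}
    for tech in technical_terms:
        for lay in lay_vocab:
            # Cells of the 2x2 table over aligned (abstract, story) pairs.
            a = sum(1 for ab, st in pairs if tech in ab and lay in st)
            b = sum(1 for ab, st in pairs if tech in ab and lay not in st)
            c = sum(1 for ab, st in pairs if tech not in ab and lay in st)
            d = n - a - b - c
            denom = (a + b) * (c + d) * (a + c) * (b + d)
            if denom == 0:
                continue  # a term occurs in every pair or in none
            if a * d > b * c:  # keep positive association only
                scores[(tech, lay)] = (a * d - b * c) ** 2 / denom
    return scores


# Hypothetical toy corpus: each pair is (abstract tokens, news-story tokens).
toy_pairs = [
    ({"myocardial", "infarction", "patients"}, {"heart", "attack", "people"}),
    ({"myocardial", "infarction", "trial"}, {"heart", "attack", "study"}),
    ({"hypertension", "patients"}, {"high", "blood", "pressure", "people"}),
    ({"hypertension", "trial"}, {"blood", "pressure", "study"}),
]
scores = association_scores(toy_pairs, ["myocardial", "hypertension"])
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
```

On this toy data, lay terms that appear only alongside a given technical term score highest, while terms spread evenly across both topics (such as "people") are filtered out. A real corpus would also need tokenization, multi-word term handling, and a significance threshold, none of which are shown here.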

Noémie Elhadad | Komal Sutaria
