Paraphrase Acquisition from Comparable Medical Corpora of Specialized and Lay Texts

A large amount of health information is now available to the public, but medical language is often difficult for lay people to understand. There is therefore a real need for ways to make medical information more comprehensible. A useful resource in this regard would be a corpus of specialized and lay paraphrases. To this end, we built comparable corpora of specialized and lay texts, to which we applied paraphrasing patterns anchored on pairs of deverbal nouns and verbs. The results show that the paraphrases were of good quality (71.4% to 94.2% precision) and that this type of paraphrase is relevant for studying the differences between specialized and lay language. This study also demonstrates that simple paraphrase acquisition methods can work on texts with a rather small degree of similarity, once similar text segments are detected.
