Paraphrase Detection in Monolingual Specialized/Lay Comparable Corpora

Paraphrases play a key role in many natural language processing applications, and extracting and generating them are important tasks. Given two comparable corpora in the same language and the same domain, but written in two different discourse types (lay and specialized), certain paraphrases can be identified that provide a dimension along which these discourse types can be contrasted. Detecting such paraphrases in comparable corpora is the goal of the present work. Paraphrases are generally identified by means of lexical and/or structural patterns. In this chapter, we present two methods for extracting paraphrases across lay and specialized French monolingual comparable corpora. The first method uses lexical patterns designed on the basis of intuition and linguistic studies, while the second is empirical and based on n-gram matching. The two methods appear to be complementary: the n-gram method confirms the initial lexical patterns and identifies additional ones. In addition, differences in the direction in which paraphrase patterns apply highlight differences between specialized and lay discourse.
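
To make the idea of n-gram matching across two corpora concrete, the following is a minimal sketch and not the chapter's actual procedure, which the abstract does not detail. It assumes whitespace tokenization, a tiny illustrative stopword list, and prefix truncation as a crude stand-in for stemming or lemmatization; the function names (`ngrams`, `key`, `index`, `candidate_paraphrases`) and the example phrases are hypothetical.

```python
from collections import defaultdict

# Tiny illustrative French stopword list (an assumption, not from the source).
STOPWORDS = {"de", "du", "la", "le", "les", "des", "un", "une", "l", "d"}


def ngrams(tokens, n_min=2, n_max=4):
    """Yield every word n-gram of length n_min..n_max as a tuple."""
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def key(ngram, prefix_len=5):
    """Order-independent key: content words truncated to a short prefix.

    Prefix truncation is a crude substitute for real stemming.
    """
    return frozenset(tok[:prefix_len] for tok in ngram if tok not in STOPWORDS)


def index(sentences):
    """Map each key to the set of surface n-grams that produce it."""
    table = defaultdict(set)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for gram in ngrams(tokens):
            k = key(gram)
            if len(k) >= 2:  # require at least two distinct content-word prefixes
                table[k].add(gram)
    return table


def candidate_paraphrases(specialized, lay):
    """Pair n-grams from the two corpora that share a key but differ in form."""
    spec_index, lay_index = index(specialized), index(lay)
    pairs = set()
    for k in spec_index.keys() & lay_index.keys():
        for s in spec_index[k]:
            for l in lay_index[k]:
                if s != l:
                    pairs.add((" ".join(s), " ".join(l)))
    return pairs


if __name__ == "__main__":
    # Invented example phrases, one per corpus.
    for pair in candidate_paraphrases(["traitement de la douleur"],
                                      ["traiter la douleur"]):
        print(pair)
```

Each pair is ordered (specialized, lay), so the direction of a candidate paraphrase stays visible. A real system would replace the prefix key with proper morphological analysis and would filter the candidates, since a shared key alone does not guarantee that two n-grams are paraphrases.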
