Domain-Specific Paraphrase Extraction

The validity of a paraphrase rule depends on the domain of the text to which it is applied. We develop a novel method for extracting domain-specific paraphrases. We adapt the bilingual pivoting paraphrase method by biasing its training data toward our target domain of biology. Our best model achieves higher precision while retaining complete recall, giving a 10% relative improvement in AUC.
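As a rough illustration of the bilingual pivoting step, the sketch below scores candidate paraphrases of an English phrase by marginalizing over shared foreign translations, p(e2 | e1) = sum over f of p(e2 | f) * p(f | e1). The function name `pivot_paraphrases`, the toy translation tables, and their probabilities are assumptions made up for this example; they are not taken from the paper's data or code. Under this reading, biasing the training data toward biology would change the translation probabilities in the tables, which in turn would shift the paraphrase scores.

```python
from collections import defaultdict


def pivot_paraphrases(e_to_f, f_to_e, phrase):
    """Score paraphrases of `phrase` by summing over foreign pivot phrases.

    e_to_f[e][f] holds p(f | e) and f_to_e[f][e] holds p(e | f).
    Both tables are hypothetical inputs for this sketch.
    """
    scores = defaultdict(float)
    for f, p_f_given_e in e_to_f.get(phrase, {}).items():
        for e2, p_e2_given_f in f_to_e.get(f, {}).items():
            if e2 != phrase:  # a phrase is not counted as its own paraphrase
                scores[e2] += p_e2_given_f * p_f_given_e
    return dict(scores)


# Toy tables with invented probabilities (illustration only).
e_to_f = {"culture": {"Kultur": 0.5, "Zellkultur": 0.5}}
f_to_e = {
    "Kultur": {"culture": 0.5, "civilization": 0.5},
    "Zellkultur": {"culture": 0.5, "cell culture": 0.5},
}

paraphrases = pivot_paraphrases(e_to_f, f_to_e, "culture")
```

In this toy setting both candidates receive the same score; a model trained on data that resembles biology text would be expected to put more mass on the pivot that leads to "cell culture".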
