Using Dependency-Based Features to Take the ‘Para-farce’ out of Paraphrase

As research in text-to-text paraphrase generation progresses, it has the potential to improve the quality of generated text. However, the use of paraphrase generation methods creates a secondary problem: we must ensure that generated novel sentences are not inconsistent with the text from which they were generated. We propose that a machine learning approach be used to filter out inconsistent novel sentences, or False Paraphrases. To train such a filter, we use the Microsoft Research Paraphrase corpus and investigate whether features based on syntactic dependencies can aid us in this task. Like Finch et al. (2005), we obtain a classification accuracy of 75.6%, the best known performance for this corpus. We also examine the strengths and weaknesses of dependency-based features and conclude that they may be useful in more accurately classifying cases of False Paraphrase.
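As an illustration of what a feature based on syntactic dependencies might look like, the sketch below computes precision, recall, and F1 over the dependency relations shared by two sentences. This is only an assumed example: the abstract does not list the actual features, the function name `dependency_overlap_features` is hypothetical, and the dependency pairs are supplied by hand rather than produced by a parser.

```python
from typing import Dict, Iterable, Tuple

# A dependency relation is represented here as a (head, dependent) word pair.
# This representation is an assumption made for the sake of the example.
Dependency = Tuple[str, str]


def dependency_overlap_features(
    deps_a: Iterable[Dependency],
    deps_b: Iterable[Dependency],
) -> Dict[str, float]:
    """Return overlap features between two sets of dependency relations.

    Precision is measured against sentence A and recall against sentence B.
    """
    set_a = set(deps_a)
    set_b = set(deps_b)
    shared = set_a & set_b

    precision = len(shared) / len(set_a) if set_a else 0.0
    recall = len(shared) / len(set_b) if set_b else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0

    return {"precision": precision, "recall": recall, "f1": f1}


# Hand-written dependency pairs for two short hypothetical sentences.
sentence_a = [("ate", "cat"), ("ate", "fish"), ("cat", "the")]
sentence_b = [("ate", "cat"), ("ate", "mouse"), ("cat", "the"), ("mouse", "a")]

features = dependency_overlap_features(sentence_a, sentence_b)
print(features)
```

Feature values of this kind could be passed, alongside lexical overlap measures, to any standard classifier trained on labelled sentence pairs; which classifier and which feature combinations work best is an empirical question the sketch does not address.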

[1]  Kaizhong Zhang, et al.  Simple Fast Algorithms for the Editing Distance Between Trees and Related Problems, 1989, SIAM J. Comput.

[2]  Michael Collins, et al.  A New Statistical Parser Based on Bigram Lexical Dependencies, 1996, ACL.

[3]  Mark Dras, et al.  Tree adjoining grammar and the reluctant paraphrasing of text, 1999.

[4]  Ian H. Witten, et al.  Data mining: practical machine learning tools and techniques with Java implementations, 2002, SGMD.

[5]  Salim Roukos, et al.  Bleu: a Method for Automatic Evaluation of Machine Translation, 2002, ACL.

[6]  Diego Mollá Aliod.  Towards semantic-based overlap measures for question-answering, 2003, ALTA.

[7]  Regina Barzilay, et al.  Learning to Paraphrase: An Unsupervised Approach Using Multiple-Sequence Alignment, 2003, NAACL.

[8]  Dan Roth, et al.  Mapping Dependencies Trees: An Application to Question Answering, 2003.

[9]  Chris Quirk, et al.  Unsupervised Construction of Large Paraphrase Corpora: Exploiting Massively Parallel News Sources, 2004, COLING.

[10]  Eduard Hovy, et al.  Evaluating DUC 2005 using Basic Elements, 2005.

[11]  Eiichiro Sumita, et al.  Using Machine Translation Evaluation Techniques to Determine Sentence-level Semantic Equivalence, 2005, IJCNLP.

[12]  Yves Lepage, et al.  Automatic generation of paraphrases to be used as translation references in objective evaluation measures of machine translation, 2005, IJCNLP.

[13]  Chris Callison-Burch, et al.  Paraphrasing with Bilingual Parallel Corpora, 2005, ACL.

[14]  B. Magnini, et al.  Recognizing Textual Entailment with Tree Edit Distance Algorithms, 2005.

[15]  Jon Patrick, et al.  Paraphrase Identification by Text Canonicalization, 2005, ALTA.

[16]  Rada Mihalcea, et al.  Measuring the Semantic Similarity of Texts, 2005, EMSEE@ACL.

[17]  Emiel Krahmer, et al.  Explorations in Sentence Fusion, 2005, ENLG.

[18]  Bernardo Magnini, et al.  Combining Lexical Resources with Tree Edit Distance for Recognizing Textual Entailment, 2005, MLCW.

[19]  Ido Dagan, et al.  The Third PASCAL Recognizing Textual Entailment Challenge, 2007, ACL-PASCAL@ACL.

[20]  Stephen Wan, et al.  Towards Statistical Paraphrase Generation: Preliminary Evaluations of Grammaticality, 2005, IWP@IJCNLP.

[21]  Tat-Seng Chua, et al.  Paraphrase Recognition via Dissimilarity Significance Classification, 2006, EMNLP.