Syntax-based Alignment of Multiple Translations: Extracting Paraphrases and Generating New Sentences

We describe a syntax-based algorithm that automatically builds finite-state automata (FSAs, also known as word lattices) from sets of semantically equivalent translations. These FSAs are good representations of paraphrases: they can be used to extract lexical and syntactic paraphrase pairs and to generate new, unseen sentences that express the same meaning as the sentences in the input sets. Our FSAs can also predict the correctness of alternative semantic renderings, which may be used to evaluate the quality of translations.
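To make the word-lattice idea concrete, the following is a minimal sketch of how such an FSA can encode paraphrases and yield new sentences. It is not the syntax-based construction algorithm itself: the lattice is written by hand, and the two toy input sentences ("twelve people died in the battle", "a dozen people were killed in the battle"), the state numbering, and the names `LATTICE` and `paths` are all assumptions made for illustration.

```python
from typing import Dict, Iterator, List, Tuple

# A word lattice as an adjacency map: state -> list of (word, next_state).
# Hand-built for illustration; a real lattice would be produced by the
# alignment algorithm from a set of equivalent translations.
Lattice = Dict[int, List[Tuple[str, int]]]

LATTICE: Lattice = {
    0: [("twelve", 2), ("a", 1)],
    1: [("dozen", 2)],
    2: [("people", 3)],
    3: [("died", 5), ("were", 4)],
    4: [("killed", 5)],
    5: [("in", 6)],
    6: [("the", 7)],
    7: [("battle", 8)],
}
START, FINAL = 0, 8


def paths(lattice: Lattice, state: int, final: int,
          prefix: Tuple[str, ...] = ()) -> Iterator[str]:
    """Yield every sentence spelled out by a path from `state` to `final`."""
    if state == final:
        yield " ".join(prefix)
        return
    for word, next_state in lattice.get(state, []):
        yield from paths(lattice, next_state, final, prefix + (word,))


# The two (hypothetical) input translations that the lattice merges.
inputs = {
    "twelve people died in the battle",
    "a dozen people were killed in the battle",
}

sentences = sorted(paths(LATTICE, START, FINAL))
# Sentences licensed by the lattice but absent from the input set.
new_sentences = [s for s in sentences if s not in inputs]

for s in sentences:
    print(s)
```

In this toy lattice, alternative sub-paths between the same pair of states correspond to paraphrase pairs ("twelve" and "a dozen" between states 0 and 2; "died" and "were killed" between states 3 and 5), and any start-to-final path that was not an input sentence is a newly generated sentence.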
