Biology Based Alignments of Paraphrases for Sentence Compression

In this paper, we present a study on extracting and aligning paraphrases in the context of Sentence Compression. First, we justify the application of a new measure for the automatic extraction of paraphrase corpora. Second, we discuss the work of Barzilay & Lee (2003), who use clustering of paraphrases to induce rewriting rules. We show, through classical visualization methodologies (Kruskal & Wish, 1977) and exhaustive experiments, that clustering may not be the best approach for automatic pattern identification. Finally, we provide results for several biology-based methodologies for pairwise paraphrase alignment.
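To make the idea of biology-based pairwise alignment concrete, here is a minimal sketch of global sequence alignment applied to the words of two paraphrases. The abstract does not say which alignment algorithms the paper evaluates, so the choice of Needleman-Wunsch is an assumption, as are the function name `align_words` and the scoring values (`match`, `mismatch`, `gap`), which are purely illustrative.

```python
from typing import List, Optional, Tuple

# An aligned pair; None marks a gap on that side.
Pair = Tuple[Optional[str], Optional[str]]


def align_words(a: List[str], b: List[str], match: int = 1,
                mismatch: int = -1, gap: int = -1) -> Tuple[int, List[Pair]]:
    """Globally align two token lists (Needleman-Wunsch, assumed scoring)."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,  # align two words
                              score[i - 1][j] + gap,      # gap in b
                              score[i][j - 1] + gap)      # gap in a
    # Trace back from the bottom-right cell to recover one optimal alignment.
    pairs: List[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                pairs.append((a[i - 1], b[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            pairs.append((a[i - 1], None))
            i -= 1
        else:
            pairs.append((None, b[j - 1]))
            j -= 1
    pairs.reverse()
    return score[n][m], pairs


if __name__ == "__main__":
    long_s = "the cat sat on the mat".split()
    short_s = "the cat sat on mat".split()
    total, alignment = align_words(long_s, short_s)
    print(total)
    for left, right in alignment:
        print(left or "-", right or "-")
```

In a sentence compression setting, words of the longer sentence that are aligned to a gap in the shorter one are candidates for deletion. How such alignments are turned into rewriting rules is not covered by this sketch.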

[1] Satoshi Sekine, et al. Automatic paraphrase acquisition from news articles, 2002.

[2] Regina Barzilay, et al. Bootstrapping Lexical Choice via Multiple-Sequence Alignment, 2002, EMNLP.

[3] Akira Shimazu, et al. Example-based sentence reduction using the hidden Markov model, 2004, TALIP.

[4] Regina Barzilay, et al. Learning to Paraphrase: An Unsupervised Approach Using Multiple-Sequence Alignment, 2003, NAACL.

[5] David G. Kendall, et al. Introduction to Mathematical Statistics, 1947, Nature.

[6] Laurie J. Heyer, et al. Exploring expression data: identification and analysis of coexpressed genes, 1999, Genome Research.

[7] João Cordeiro, et al. Learning Paraphrases from WNS Corpora, 2007, FLAIRS Conference.

[8] Anil K. Jain, et al. Data clustering: a review, 1999, CSUR.

[9] Daniel Marcu, et al. Summarization beyond sentence extraction: A probabilistic approach to sentence compression, 2002, Artif. Intell.

[10] Chris Quirk, et al. Unsupervised Construction of Large Paraphrase Corpora: Exploiting Massively Parallel News Sources, 2004, COLING.

[11] Emiel Krahmer, et al. Explorations in Sentence Fusion, 2005, ENLG.

[12] Walter Daelemans, et al. Automatic Sentence Simplification for Subtitling in Dutch and English, 2004, LREC.

[13] Salim Roukos, et al. Bleu: a Method for Automatic Evaluation of Machine Translation, 2002, ACL.

[14] Vladimir I. Levenshtein, et al. Binary codes capable of correcting deletions, insertions, and reversals, 1965.

[15] Jun'ichi Tsujii, et al. Trimming CFG Parse Trees for Sentence Compression Using Machine Learning Approaches, 2006, ACL.