Semantically Driven Sentence Fusion: Modeling and Evaluation

Sentence fusion is the task of joining related sentences into a single coherent text. Current training and evaluation schemes for this task are based on single-reference ground truths and do not account for valid fusion variants. We show that this hinders models from robustly capturing the semantic relationship between input sentences. To alleviate this, we present an approach in which ground-truth solutions are automatically expanded into multiple references via curated equivalence classes of connective phrases. We apply this method to a large-scale dataset and use the augmented dataset for both model training and evaluation. To improve the learning of semantic representations from multiple references, we enrich the model with auxiliary discourse classification tasks under a multi-tasking framework. Our experiments highlight the improvements of our approach over state-of-the-art models.
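The reference-expansion idea can be sketched mechanically: given one gold fusion, substitute its discourse connective with other members of the same equivalence class to obtain additional valid references. The following is a minimal illustrative sketch; the equivalence classes and the `expand_references` helper are hypothetical simplifications, not the paper's actual curated classes or implementation.

```python
import re

# Hypothetical equivalence classes of discourse connectives
# (illustrative only; the paper's curated classes may differ).
CONNECTIVE_CLASSES = [
    {"however", "but", "yet", "nevertheless"},
    {"because", "since", "as"},
    {"moreover", "furthermore", "in addition"},
]

def expand_references(fused: str) -> set[str]:
    """Expand a single gold fusion into multiple references by swapping
    each connective it contains for the other members of its class."""
    refs = {fused}
    for cls in CONNECTIVE_CLASSES:
        for conn in cls:
            pattern = re.compile(r"\b" + re.escape(conn) + r"\b", re.IGNORECASE)
            if pattern.search(fused):
                for alt in cls - {conn}:
                    refs.add(pattern.sub(alt, fused))
    return refs
```

A training or evaluation set augmented this way rewards any semantically equivalent connective choice, rather than penalizing every deviation from a single reference string.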
