Text Compression-Aided Transformer Encoding

Text encoding is one of the most important steps in Natural Language Processing (NLP). In the current state-of-the-art Transformer encoder, it is handled well by the self-attention mechanism, which has brought significant performance improvements to many NLP tasks. Although the Transformer encoder effectively captures general information in its resulting representations, the backbone information, i.e., the gist of the input text, is not specifically focused on. In this paper, we propose explicit and implicit text compression approaches to enhance Transformer encoding, and we evaluate models that use these approaches on several typical downstream tasks that rely heavily on the encoding. Our explicit text compression approaches use dedicated models to compress the text, whereas our implicit approach simply adds an additional module to the main model to handle text compression. We propose three integration schemes, namely backbone source-side fusion, target-side fusion, and both-side fusion, to integrate the backbone information into Transformer-based models for various downstream tasks. Evaluations on benchmark datasets show that the proposed explicit and implicit text compression approaches improve results over strong baselines. We therefore conclude that, compared with the baseline models, text compression helps the encoders learn better language representations.
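
To make the source-side fusion idea concrete, the sketch below shows one plausible way the backbone (compressed-text) representation could be fused into the ordinary Transformer encoder output. The module name, the attention-plus-gate fusion, and all tensor shapes are assumptions for illustration; the abstract does not specify the exact fusion mechanism.

```python
# Hypothetical sketch of backbone source-side fusion (module name, gate design,
# and shapes are illustrative assumptions, not the paper's exact formulation).
import torch
import torch.nn as nn


class SourceSideBackboneFusion(nn.Module):
    """Fuse a compressed-text ("backbone") representation into encoder states."""

    def __init__(self, d_model: int = 512, n_heads: int = 8):
        super().__init__()
        # Encoder states attend to the backbone states produced by the
        # (explicit or implicit) text compression module.
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # A learned gate decides how much backbone information to mix in.
        self.gate = nn.Linear(2 * d_model, d_model)

    def forward(self, enc_states: torch.Tensor, backbone_states: torch.Tensor) -> torch.Tensor:
        # enc_states:      (batch, src_len, d_model) from the Transformer encoder
        # backbone_states: (batch, cmp_len, d_model) from the compression model
        attended, _ = self.cross_attn(enc_states, backbone_states, backbone_states)
        g = torch.sigmoid(self.gate(torch.cat([enc_states, attended], dim=-1)))
        return g * enc_states + (1.0 - g) * attended


# Usage: the fused states replace the plain encoder output fed to the decoder or task head.
fusion = SourceSideBackboneFusion()
enc = torch.randn(2, 30, 512)       # ordinary encoder output
backbone = torch.randn(2, 8, 512)   # compressed "gist" representation
out = fusion(enc, backbone)         # (2, 30, 512)
```

Target-side fusion would apply the same kind of attention over the backbone states inside the decoder instead, and both-side fusion would combine the two.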
