Speeding up Context-based Sentence Representation Learning with Non-autoregressive Convolutional Decoding

We propose an asymmetric encoder-decoder structure that keeps an RNN as the encoder and uses a CNN as the decoder, and the model uses only the subsequent context as supervision. The asymmetry in both the model architecture and the training pair substantially reduces training time. Our contributions are summarized as follows. (1) We design experiments showing that neither an autoregressive decoder nor an RNN decoder is necessary for encoder-decoder models to learn good sentence representations, and we distill two findings from the results. (2) These two findings lead to our final model design, which has an RNN encoder and a CNN decoder and learns to encode the current sentence and decode all of the subsequent contiguous words at once. (3) With a suite of techniques, our model performs well on downstream tasks and can be trained efficiently on a large unlabelled corpus.
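A minimal PyTorch sketch of the asymmetric design described above; the layer sizes, the fixed context length, the particular convolution stack, and the tied output embedding are illustrative assumptions rather than the paper's exact configuration:

```python
import torch
import torch.nn as nn

class RNNEncoderCNNDecoder(nn.Module):
    """Asymmetric encoder-decoder: a GRU encodes the current sentence into a
    fixed-length vector; a small 1-D convolutional stack expands that vector
    and predicts all words of the subsequent context in a single,
    non-autoregressive forward pass. Hyperparameters are illustrative."""

    def __init__(self, vocab_size, emb_dim=300, hid_dim=1200, ctx_len=30):
        super().__init__()
        self.ctx_len = ctx_len
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        # Project the sentence vector to one slot per context position,
        # then refine across positions with convolutions.
        self.expand = nn.Linear(hid_dim, ctx_len * hid_dim)
        self.decoder = nn.Sequential(
            nn.Conv1d(hid_dim, hid_dim, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(hid_dim, emb_dim, kernel_size=3, padding=1),
        )
        self.out = nn.Linear(emb_dim, vocab_size)
        # Tying input and output embeddings is one plausible training trick
        # (an assumption in this sketch, not necessarily the paper's setup).
        self.out.weight = self.embed.weight

    def forward(self, sent):                    # sent: (batch, sent_len)
        _, h = self.encoder(self.embed(sent))   # h: (1, batch, hid_dim)
        z = h.squeeze(0)                        # sentence representation
        x = self.expand(z).view(z.size(0), self.ctx_len, -1)
        x = self.decoder(x.transpose(1, 2)).transpose(1, 2)
        return self.out(x)                      # (batch, ctx_len, vocab_size)

# Example: a batch of 8 sentences, each 20 tokens long
model = RNNEncoderCNNDecoder(vocab_size=20000)
logits = model(torch.randint(0, 20000, (8, 20)))  # -> (8, 30, 20000)
```

Because the decoder sees only the expanded sentence vector and never its own previous predictions, all target words of the subsequent context are scored in one forward pass and trained with an ordinary per-position cross-entropy loss, with no autoregressive unrolling at training time.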
