Baseline Needs More Love: On Simple Word-Embedding-Based Models and Associated Pooling Mechanisms

Many deep learning architectures have been proposed to model the compositionality in text sequences, requiring a substantial number of parameters and expensive computations. However, there has not been a rigorous evaluation regarding the added value of sophisticated compositional functions. In this paper, we conduct a point-by-point comparative study between Simple Word-Embedding-based Models (SWEMs), consisting of parameter-free pooling operations, and word-embedding-based RNN/CNN models. Surprisingly, SWEMs exhibit comparable or even superior performance in the majority of cases considered. Based upon this understanding, we propose two additional pooling strategies over learned word embeddings: (i) a max-pooling operation for improved interpretability; and (ii) a hierarchical pooling operation, which preserves spatial (n-gram) information within text sequences. We present experiments on 17 datasets encompassing three tasks: (i) (long) document classification; (ii) text sequence matching; and (iii) short text tasks, including classification and tagging. The source code and datasets are publicly available from the authors' repository.
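As a concrete illustration (not taken from the paper's released code), the following minimal NumPy sketch shows the parameter-free pooling operations described above, assuming the input is a `(sequence_length, embedding_dim)` matrix of pre-trained word embeddings (e.g., GloVe vectors). The function names and the default window size of 5 are illustrative assumptions, not values prescribed by the abstract.

```python
import numpy as np

def swem_aver(emb: np.ndarray) -> np.ndarray:
    """Average pooling: mean of word embeddings along the sequence axis."""
    # emb has shape (sequence_length, embedding_dim)
    return emb.mean(axis=0)

def swem_max(emb: np.ndarray) -> np.ndarray:
    """Max pooling: element-wise maximum over the sequence axis."""
    return emb.max(axis=0)

def swem_hier(emb: np.ndarray, window: int = 5) -> np.ndarray:
    """Hierarchical pooling sketch: average within local n-gram windows,
    then take the element-wise max across windows, preserving some
    local word-order (n-gram) information."""
    seq_len, _ = emb.shape
    if seq_len <= window:
        # Sequence shorter than the window: fall back to plain averaging.
        return emb.mean(axis=0)
    # Local average pooling over sliding windows of length `window`.
    local_avgs = np.stack([emb[i:i + window].mean(axis=0)
                           for i in range(seq_len - window + 1)])
    # Global max pooling over the window-level averages.
    return local_avgs.max(axis=0)
```

In each case the pooled vector can be passed to a simple downstream classifier; the point of the sketch is that composition itself introduces no trainable parameters beyond the word embeddings.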
