Convolution-deconvolution word embedding: An end-to-end multi-prototype fusion embedding method for natural language processing

Abstract: Existing unsupervised word embedding methods have proven effective at capturing latent semantic information across a variety of Natural Language Processing (NLP) tasks. However, these methods cannot address two problems common in NLP: polysemous-unawareness (a single vector per word ignores sense distinctions) and task-unawareness (pre-trained embeddings carry no task-specific information). In this work, we present a novel Convolution-Deconvolution Word Embedding (CDWE), an end-to-end multi-prototype fusion embedding that combines context-specific and task-specific information. To the best of our knowledge, we are the first to extend deconvolution (i.e., transposed convolution), which has been widely used in computer vision, to word embedding generation. We empirically demonstrate the effectiveness and generalization ability of CDWE by applying it to two representative NLP tasks: text classification and machine translation. CDWE-based models significantly outperform the baselines and achieve state-of-the-art results on both tasks. To further validate CDWE, we show how it resolves the polysemous-unaware and task-unaware problems by analyzing Textual Deconvolution Saliency (TDS), an existing strategy for evaluating deconvolution outputs.
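To make the convolution-deconvolution idea concrete, the following is a minimal PyTorch sketch of such an embedding layer; it is an illustrative assumption, not the authors' exact architecture. The class name `ConvDeconvEmbedding`, the kernel size, and the additive fusion of static and contextual vectors are all hypothetical choices: a convolution pools local context windows, a transposed convolution restores the original sequence length so each token receives a context-aware vector, and the result is fused with the static embedding so the whole layer can be trained end to end with the downstream task.

```python
import torch
import torch.nn as nn

class ConvDeconvEmbedding(nn.Module):
    """Hypothetical sketch of a convolution-deconvolution embedding layer.

    A static base embedding is convolved over the sentence to capture
    local context, then deconvolved (transposed convolution) back to the
    original sequence length, yielding one context-specific vector per
    token. Static and contextual views are fused additively.
    """

    def __init__(self, vocab_size, embed_dim=300, kernel_size=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Convolution compresses local context windows into latent features.
        self.conv = nn.Conv1d(embed_dim, embed_dim, kernel_size)
        # Transposed convolution restores the original sequence length.
        self.deconv = nn.ConvTranspose1d(embed_dim, embed_dim, kernel_size)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len)
        static = self.embed(token_ids)           # (batch, seq_len, dim)
        x = static.transpose(1, 2)               # (batch, dim, seq_len)
        context = self.deconv(torch.relu(self.conv(x)))
        context = context.transpose(1, 2)        # (batch, seq_len, dim)
        # Fuse the task-agnostic static view with the context-aware view;
        # training end to end lets the fusion absorb task-specific signal.
        return static + context


# Usage: the fused embeddings feed a downstream classifier or NMT encoder.
emb = ConvDeconvEmbedding(vocab_size=10000)
out = emb(torch.randint(0, 10000, (2, 16)))
print(out.shape)  # torch.Size([2, 16, 300])
```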
