Unsupervised Learning of Word-Sequence Representations from Scratch via Convolutional Tensor Decomposition

Unsupervised extraction of text embeddings is crucial for text understanding in machine learning. Word2Vec and its variants have achieved substantial success in mapping words with similar syntactic or semantic meaning to nearby vectors. However, extracting context-aware word-sequence embeddings remains challenging: training over a large corpus is hard because labels are difficult to obtain, and, more importantly, it is difficult for pre-trained models to produce word-sequence embeddings that are universally good for all downstream tasks or for new datasets. We propose a two-phase ConvDic+DeconvDec framework that combines a word-sequence dictionary learning model with a word-sequence embedding decoding model. In the learning phase, we propose a convolutional tensor decomposition mechanism to learn a good word-sequence (phrase) dictionary; it is proved to be more accurate and much more efficient than the popular alternating minimization method. In the decoding phase, we introduce a deconvolution framework that is immune to the problem of varying sentence lengths. The word-sequence embeddings extracted with ConvDic+DeconvDec perform well across all the downstream tasks we evaluate, and the framework requires neither pre-training nor prior/outside information.
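To make the decoding phase concrete, the following is a minimal, illustrative sketch of deconvolutional decoding against a fixed filter dictionary: it recovers sparse activation maps for a sentence-length signal with a generic ISTA solver and max-pools them over time into a fixed-length embedding, which is what makes the representation independent of sentence length. This is not the paper's implementation: the dictionary here is random rather than learned by convolutional tensor decomposition, the solver is a standard ISTA routine, and the names (`ista_decode`, `pool_embedding`) are hypothetical.

```python
# Illustrative sketch only; assumes the filter dictionary has already been
# learned (in the paper, via convolutional tensor decomposition).
import numpy as np
from scipy.signal import convolve, correlate

def ista_decode(x, filters, lam=0.1, step=None, n_iter=200):
    """Sparse-code a 1-D signal x against a bank of 1-D filters via ISTA.

    x       : (T,)   word-level signal (e.g. one word-embedding dimension)
    filters : (K, f) K convolutional dictionary atoms of length f
    returns : (K, T) activation maps, same length T ("same" convolution)
    """
    K, f = filters.shape
    T = x.shape[0]
    if step is None:
        # crude step size from a rough Lipschitz bound on the gradient
        step = 1.0 / (K * (np.abs(filters).sum(axis=1) ** 2).max() + 1e-8)
    z = np.zeros((K, T))
    for _ in range(n_iter):
        # reconstruction = sum_k filters[k] * z[k]
        recon = sum(convolve(z[k], filters[k], mode="same") for k in range(K))
        resid = recon - x
        # gradient step: correlate the residual with each filter (adjoint op)
        grad = np.stack([correlate(resid, filters[k], mode="same")
                         for k in range(K)])
        z = z - step * grad
        # soft-thresholding: proximal step for the l1 sparsity penalty
        z = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return z

def pool_embedding(z):
    """Max-pool each activation map over time -> length-K embedding,
    independent of the sentence length T."""
    return z.max(axis=1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    filters = rng.standard_normal((8, 5))       # 8 atoms of width 5 (toy dictionary)
    sentence_signal = rng.standard_normal(23)   # a length-23 "sentence"
    z = ista_decode(sentence_signal, filters)
    print(pool_embedding(z).shape)              # (8,) regardless of sentence length
```

The fixed-length output comes entirely from the pooling step: sentences of any length yield activation maps of matching length, but the embedding dimension is set by the number of dictionary atoms.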
