Improving multi-stream classification by mapping sequence embeddings into a high-dimensional space

Most Natural Language Processing (NLP) and Spoken Language Processing tasks now employ neural networks, which allow them to achieve impressive performance. Embedding features allow NLP systems to represent inputs in a latent space and improve the observed performance. In this context, Recurrent Neural Network (RNN) architectures such as Long Short-Term Memory (LSTM) are well known for their ability to encode sequential data into a non-sequential hidden vector representation, called a sequence embedding. In this paper, we propose an LSTM-based multi-stream sequence embedding that encodes parallel sequences into a single non-sequential latent representation vector. We then map this embedding into a high-dimensional space using a Support Vector Machine (SVM) in order to classify the multi-stream sequences by finding an optimal separating hyperplane. The multi-stream sequence embedding allows the SVM classifier to exploit more efficiently the information carried by both parallel streams and by longer sequences. The resulting system achieved the best performance on a multi-stream sequence classification task, with a gain of 9 points in error rate compared to an SVM trained on the original input sequences.
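
The following is a minimal sketch of the pipeline described above: two parallel input streams are each encoded by an LSTM, the final hidden states are concatenated into a single sequence embedding, and an SVM then maps those embeddings into a high-dimensional space via a kernel to find a separating hyperplane. The framework (PyTorch plus scikit-learn), the RBF kernel, and all dimensions and hyperparameters are illustrative assumptions, not the paper's actual settings.

```python
# Hypothetical sketch: multi-stream LSTM sequence embedding + kernel SVM.
# All names, dimensions, and settings here are assumptions for illustration.
import torch
import torch.nn as nn
from sklearn.svm import SVC

class MultiStreamEncoder(nn.Module):
    """Encodes two parallel sequences into one fixed-size embedding."""
    def __init__(self, in_dim_a, in_dim_b, hidden_dim):
        super().__init__()
        self.lstm_a = nn.LSTM(in_dim_a, hidden_dim, batch_first=True)
        self.lstm_b = nn.LSTM(in_dim_b, hidden_dim, batch_first=True)

    def forward(self, stream_a, stream_b):
        # Use the last hidden state of each LSTM as that stream's embedding.
        _, (h_a, _) = self.lstm_a(stream_a)  # h_a: (1, batch, hidden_dim)
        _, (h_b, _) = self.lstm_b(stream_b)
        # Concatenate the per-stream embeddings into a single non-sequential
        # latent representation of the parallel sequences.
        return torch.cat([h_a[-1], h_b[-1]], dim=1)  # (batch, 2 * hidden_dim)

# Toy data: 32 pairs of parallel sequences, 20 time steps, 16 features each.
encoder = MultiStreamEncoder(in_dim_a=16, in_dim_b=16, hidden_dim=64)
xa, xb = torch.randn(32, 20, 16), torch.randn(32, 20, 16)
labels = torch.randint(0, 2, (32,)).numpy()

with torch.no_grad():
    embeddings = encoder(xa, xb).numpy()

# Map the embeddings into a high-dimensional space with an RBF kernel and
# find a separating hyperplane there.
svm = SVC(kernel="rbf")
svm.fit(embeddings, labels)
print(svm.score(embeddings, labels))
```

In practice the LSTM encoder would first be trained, for example with a supervised classification objective, before its embeddings are extracted and fed to the SVM; it is left untrained here purely for brevity.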
