Investigating Siamese LSTM networks for text categorization

Recently, deep learning and deep neural networks have attracted considerable attention and emerged as one predominant field of research in the artificial intelligence community. The developed techniques have also gained widespread use in various domains with good success, such as automatic speech recognition, information retrieval and text classification, etc. Among them, long short-term memory (LSTM) networks are well suited to such tasks, which can capture long-range dependencies among words efficiently, meanwhile alleviating the gradient vanishing or exploding problem during training effectively. Following this line of research, in this paper we explore a novel use of a Siamese LSTM based method to learn more accurate document representation for text categorization. Such a network architecture takes a pair of documents with variable lengths as the input and utilizes pairwise learning to generate distributed representations of documents that can more precisely render the semantic distance between any pair of documents. In doing so, documents associated with the same semantic or topic label could be mapped to similar representations having a relatively higher semantic similarity. Experiments conducted on two benchmark text categorization tasks, viz. IMDB and 20Newsgroups, show that using a three-layer deep neural network based classifier that takes a document representation learned from the Siamese LSTM sub-networks as the input can achieve competitive performance in relation to several state-of-the-art methods.

[1]  Yuan-Fang Wang,et al.  The use of bigrams to enhance text categorization , 2002, Inf. Process. Manag..

[2]  Thorsten Joachims,et al.  Text Categorization with Support Vector Machines: Learning with Many Relevant Features , 1998, ECML.

[3]  Jonas Mueller,et al.  Siamese Recurrent Architectures for Learning Sentence Similarity , 2016, AAAI.

[4]  Tong Zhang,et al.  Supervised and Semi-Supervised Text Categorization using LSTM for Region Embeddings , 2016, ICML.

[5]  John C. Platt,et al.  Learning Discriminative Projections for Text Similarity Measures , 2011, CoNLL.

[6]  Nicholas J. Belkin,et al.  Information filtering and information retrieval: two sides of the same coin? , 1992, CACM.

[7]  Lillian Lee,et al.  Opinion Mining and Sentiment Analysis , 2008, Found. Trends Inf. Retr..

[8]  Jürgen Schmidhuber,et al.  Framewise phoneme classification with bidirectional LSTM and other neural network architectures , 2005, Neural Networks.

[9]  Christopher Potts,et al.  Learning Word Vectors for Sentiment Analysis , 2011, ACL.

[10]  Jason Weston,et al.  A semantic matching energy function for learning with multi-relational data , 2013, Machine Learning.

[11]  Maarten Versteegh,et al.  Learning Text Similarity with Siamese Recurrent Networks , 2016, Rep4NLP@ACL.

[12]  Zhiyuan Liu,et al.  A C-LSTM Neural Network for Text Classification , 2015, ArXiv.

[13]  Ken Lang,et al.  NewsWeeder: Learning to Filter Netnews , 1995, ICML.

[14]  Christopher D. Manning,et al.  Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks , 2015, ACL.

[15]  Andy Liaw,et al.  Classification and Regression by randomForest , 2007 .

[16]  Yann Dauphin,et al.  Language Modeling with Gated Convolutional Networks , 2016, ICML.

[17]  Yoshua Bengio,et al.  Convolutional networks for images, speech, and time series , 1998 .

[18]  Yoshua Bengio,et al.  Stochastic Ratio Matching of RBMs for Sparse High-Dimensional Inputs , 2013, NIPS.

[19]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[20]  Ting Liu,et al.  Document Modeling with Gated Recurrent Neural Network for Sentiment Classification , 2015, EMNLP.

[21]  Christopher D. Manning,et al.  Baselines and Bigrams: Simple, Good Sentiment and Topic Classification , 2012, ACL.

[22]  Jeffrey Pennington,et al.  GloVe: Global Vectors for Word Representation , 2014, EMNLP.

[23]  Yoshua Bengio,et al.  A Neural Probabilistic Language Model , 2003, J. Mach. Learn. Res..

[24]  Matthew Richardson,et al.  Blending LSTMs into CNNs , 2015, ICLR 2016.

[25]  Fabrizio Sebastiani,et al.  Machine learning in automated text categorization , 2001, CSUR.

[26]  Yann LeCun,et al.  Learning a similarity metric discriminatively, with application to face verification , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[27]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[28]  Jeffrey Dean,et al.  Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[29]  Bing Liu,et al.  Review spam detection , 2007, WWW '07.

[30]  Tianshun Yao,et al.  Active Learning with Sampling by Uncertainty and Density for Word Sense Disambiguation and Text Classification , 2008, COLING.