Sentiment analysis via Doc2Vec and Convolutional Neural Network hybrids

In this study, we proposed two ensembled Convolutional Neural Network (CNN) architectures, CNN-cuPSONN and CNN-PNN, where cuPSONN is a neural network optimized by CUDA-enabled particle swarm optimization and PNN is a probabilistic neural network. We compared their performance with that of the standalone PNN. These techniques were applied after distributed Doc2Vec had been run on a 992 MB movie review dataset to obtain its paragraph embeddings; the Apache Spark framework was used to compute the embeddings. Here, the CNN served as a feature extractor on the embeddings, while cuPSONN and PNN performed the classification. The third proposed model is a PNN preceded by t-statistic-based feature selection (t-statistic-PNN). Among the three classifiers employed, CNN-PNN yielded an area under the ROC curve (AUC) of 96.63% under 10-fold cross-validation. The difference between CNN-PNN and each of CNN-cuPSONN and t-statistic-PNN was statistically significant, while CNN-PNN was statistically indistinguishable from the distributed multi-layer perceptron (DMLP), which served as the baseline classifier against which our proposed algorithms were compared.
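To make the t-statistic-PNN idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a binary task, a Welch two-sample t-statistic for ranking features, and a Gaussian-kernel PNN; the function names, the smoothing parameter `sigma`, the number of kept features `k`, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np


def t_statistic_scores(X, y):
    """Absolute Welch t-statistic of each feature between class 0 and class 1."""
    a, b = X[y == 0], X[y == 1]
    diff = a.mean(axis=0) - b.mean(axis=0)
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return np.abs(diff) / (se + 1e-12)


def select_top_k(X, y, k):
    """Indices of the k features with the largest absolute t-statistic."""
    return np.argsort(t_statistic_scores(X, y))[::-1][:k]


def pnn_predict_proba(X_train, y_train, X_test, sigma=1.0):
    """PNN: Gaussian pattern layer, per-class summation layer, normalized output."""
    classes = np.unique(y_train)
    # Squared Euclidean distances between test and training points: (n_test, n_train)
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    kernel = np.exp(-d2 / (2.0 * sigma ** 2))
    # Average kernel activation per class (Parzen-window density estimate)
    scores = np.stack([kernel[:, y_train == c].mean(axis=1) for c in classes], axis=1)
    return classes, scores / (scores.sum(axis=1, keepdims=True) + 1e-12)


def demo(seed=0):
    """Synthetic stand-in for document embeddings: only features 3 and 7 are informative."""
    rng = np.random.default_rng(seed)
    X0 = rng.normal(size=(200, 20))
    X1 = rng.normal(size=(200, 20))
    X1[:, [3, 7]] += 3.0
    X = np.vstack([X0, X1])
    y = np.concatenate([np.zeros(200, dtype=int), np.ones(200, dtype=int)])
    order = rng.permutation(len(y))
    X, y = X[order], y[order]
    X_tr, y_tr, X_te, y_te = X[:300], y[:300], X[300:], y[300:]

    selected = select_top_k(X_tr, y_tr, k=2)      # feature selection on training data only
    classes, proba = pnn_predict_proba(X_tr[:, selected], y_tr, X_te[:, selected])
    predictions = classes[proba.argmax(axis=1)]
    return selected, float((predictions == y_te).mean())


selected, accuracy = demo()
print("selected features:", sorted(selected.tolist()), "accuracy:", accuracy)
```

In the CNN-PNN variant described above, the selected columns would presumably be replaced by features extracted by the CNN from the Doc2Vec embeddings, with the same PNN stage performing the classification.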
