Distributed Framework for Automating Opinion Discretization From Text Corpora on Facebook

Nowadays, the continuous growth in the volume of text corpora and the many research directions in general classification have created both a great opportunity and an unprecedented demand for a comprehensive evaluation of current achievements in natural language processing research. Unfortunately, few studies have applied the combination of convolutional neural networks (CNNs) and Apache Spark to the task of automating opinion discretization. In this paper, the authors propose a new distributed framework for solving an opinion classification problem in text mining by using CNN models and big data technologies on Vietnamese text sources. The framework provides the implementation components a researcher needs to perform experiments on text discretization problems. It covers all the steps and components that are usually part of a complete, practical text mining pipeline: acquiring input data, preprocessing it, tokenizing it and converting it into a vector representation, applying machine learning algorithms, applying the trained models to unseen data, and evaluating their accuracy. Development of the framework began with a specific focus on binary text discretization but soon expanded toward many other text-categorization problems, distributed language modeling, and quantification. Extensive experiments were conducted to assess the robustness and efficiency of the proposed framework. The experiments yield an accuracy of 72.99% ± 3.64%, which indicates that the proposed distributed framework can feasibly be applied to the task of opinion discretization on Facebook.
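The pipeline steps listed above (preprocessing, tokenizing, vectorizing, applying a CNN, predicting on unseen data, and measuring accuracy) can be illustrated with a minimal, dependency-free sketch of a generic single-layer text CNN (embedding lookup, 1-D convolution with ReLU, max-over-time pooling, logistic output) for binary opinion classification. This is not the authors' architecture: it runs on one machine, omits training and the Spark distribution layer, and all names, hyperparameters, and the random untrained weights are hypothetical.

```python
import math
import random
import unicodedata

# Hypothetical hyperparameters, chosen only to keep the example small.
EMBED_DIM = 8       # size of each word vector
FILTER_WIDTH = 3    # tokens covered by one convolution window
NUM_FILTERS = 4     # number of convolution filters (feature maps)
MAX_LEN = 10        # every text is truncated or padded to this length
PAD, UNK = 0, 1     # reserved indices for padding and unknown tokens


def tokenize(text):
    """NFC-normalize, lowercase and split on whitespace.

    Simplification: Vietnamese pipelines usually add word segmentation,
    since many Vietnamese words span several space-separated syllables.
    """
    return unicodedata.normalize("NFC", text).lower().split()


def build_vocab(corpus):
    """Assign an integer index to every token in the training corpus."""
    vocab = {"<pad>": PAD, "<unk>": UNK}
    for text in corpus:
        for tok in tokenize(text):
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode(text, vocab):
    """Turn a text into a fixed-length list of token indices."""
    ids = [vocab.get(tok, UNK) for tok in tokenize(text)][:MAX_LEN]
    return ids + [PAD] * (MAX_LEN - len(ids))


def init_params(vocab_size, seed=0):
    """Random, untrained weights; a real system would learn these."""
    rng = random.Random(seed)

    def rand():
        return rng.uniform(-0.5, 0.5)

    return {
        "emb": [[rand() for _ in range(EMBED_DIM)] for _ in range(vocab_size)],
        "filters": [[[rand() for _ in range(EMBED_DIM)]
                     for _ in range(FILTER_WIDTH)]
                    for _ in range(NUM_FILTERS)],
        "bias": [rand() for _ in range(NUM_FILTERS)],
        "w_out": [rand() for _ in range(NUM_FILTERS)],
        "b_out": rand(),
    }


def forward(ids, p):
    """Embedding lookup -> convolution + ReLU -> max-over-time pooling
    -> logistic output, read as P(positive opinion)."""
    x = [p["emb"][i] for i in ids]
    pooled = []
    for f, b in zip(p["filters"], p["bias"]):
        # One ReLU-activated feature per window position.
        feats = [
            max(0.0, b + sum(f[k][d] * x[t + k][d]
                             for k in range(FILTER_WIDTH)
                             for d in range(EMBED_DIM)))
            for t in range(len(x) - FILTER_WIDTH + 1)
        ]
        pooled.append(max(feats))  # keep the strongest response per filter
    z = p["b_out"] + sum(w * h for w, h in zip(p["w_out"], pooled))
    return 1.0 / (1.0 + math.exp(-z))


def predict(texts, vocab, p, threshold=0.5):
    """Apply the model to unseen texts; 1 = positive, 0 = negative."""
    return [int(forward(encode(t, vocab), p) >= threshold) for t in texts]


def accuracy(preds, labels):
    """Fraction of predictions that match the gold labels."""
    return sum(int(y_hat == y) for y_hat, y in zip(preds, labels)) / len(labels)
```

For example, `vocab = build_vocab(["sản phẩm rất tốt", "dịch vụ quá tệ"])` followed by `predict(["rất tốt"], vocab, init_params(len(vocab)))` returns a list with one 0/1 label; because the weights are untrained, the label itself carries no meaning. In a Spark-based setup, a per-record step such as `predict` is the natural candidate for data-parallel execution (for instance via PySpark's `RDD.mapPartitions`), but how the authors actually divide work between Spark and the CNN is not specified here.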
