On Multi-tier Sentiment Analysis Using Supervised Machine Learning

Document management and information retrieval tasks have grown rapidly as digital documents have become available anytime and anywhere. The need for automatic extraction of document information has become prominent in information organization and knowledge discovery. Text classification is one such solution, wherein natural language text is assigned to one or more predefined categories based on its content. This work focuses on sentiment analysis, also known as opinion mining: the automatic extraction and analysis of the emotions and opinions, rather than the facts, expressed in messages and posts. A multi-tier classification architecture is proposed, consisting of major modules for data cleaning and pre-processing, feature selection, and classifier training, the last of which includes a multi-tier prediction model. The architecture and its components are described in detail. Four classifiers (Naïve Bayes, SVM, Random Forest, and SGD) are used in experiments that evaluate the proposed multi-tier architecture on the sentiments and opinions expressed in 150,000 movie reviews. Results show that the multi-tier model significantly improves prediction accuracy over the single-tier model, by more than 10%; the improvement is significant when a customized dictionary is used. We believe that the proposed multi-tier classification architecture, together with the feature selection techniques described and used, is significant and readily applicable to many other areas of sentiment analysis.
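
As a rough illustration of the multi-tier idea, the sketch below trains one classifier to predict a coarse polarity group and a second classifier per group to refine it into a fine-grained label. The abstract does not specify how the tiers are divided, so the two-tier split, the five label names, the `COARSE_OF` mapping, the toy reviews, and the class names `NaiveBayes` and `TwoTierClassifier` are all assumptions made for this example, not the authors' design. Only a small Naïve Bayes model is shown, with none of the cleaning, feature selection, or dictionary steps.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Tiny multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, docs, labels):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.label_counts = Counter(labels)      # label -> number of documents
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = doc.lower().split()
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, doc):
        tokens = doc.lower().split()
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)  # smoothed log likelihood
            if score > best_score:
                best_label, best_score = label, score
        return best_label


class TwoTierClassifier:
    """Tier 1 predicts a coarse group; tier 2 refines it within that group."""

    def __init__(self, coarse_of):
        self.coarse_of = coarse_of  # assumed mapping: fine label -> coarse label

    def fit(self, docs, fine_labels):
        coarse = [self.coarse_of[y] for y in fine_labels]
        self.tier1 = NaiveBayes().fit(docs, coarse)
        self.tier2 = {}
        for group in set(coarse):
            # Train one tier-2 model on just the documents in this coarse group.
            sub = [(d, y) for d, y in zip(docs, fine_labels)
                   if self.coarse_of[y] == group]
            sub_docs, sub_labels = zip(*sub)
            self.tier2[group] = NaiveBayes().fit(sub_docs, sub_labels)
        return self

    def predict(self, doc):
        group = self.tier1.predict(doc)
        return self.tier2[group].predict(doc)


# Hypothetical label hierarchy and toy training data (not from the paper).
COARSE_OF = {
    "very negative": "negative",
    "somewhat negative": "negative",
    "neutral": "neutral",
    "somewhat positive": "positive",
    "very positive": "positive",
}
TRAIN = [
    ("an absolutely terrible awful film", "very negative"),
    ("a bit dull and slow", "somewhat negative"),
    ("the film runs two hours", "neutral"),
    ("a fairly pleasant and nice watch", "somewhat positive"),
    ("an absolutely brilliant wonderful film", "very positive"),
]

docs, labels = zip(*TRAIN)
model = TwoTierClassifier(COARSE_OF).fit(docs, labels)
print(model.predict("absolutely brilliant"))
```

One possible motivation for such a split is that each tier-2 model only has to separate labels within a single polarity group, which may be an easier problem than a flat five-way decision. Whether that matches the mechanism behind the reported gain is not stated in the abstract.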
