An approach to improve the accuracy of probabilistic classifiers for decision support systems in sentiment analysis

Abstract

Social networks link people and machines, producing a huge amount of information that grows too fast to be handled manually. Opinion mining is the process of using natural language processing, text analytics and computational linguistics to identify and extract subjective information from sources such as social networks. Classification methods are used for this task. However, topics are limitless, and natural language is broad and ambiguous, with its own peculiarities in social networks. As a result, the accuracy of these methods leaves considerable room for improvement. In this work, we present DSociaL, a platform that automates the processing of information obtained from social networks. It focuses on improving the accuracy of decision support systems for sentiment analysis. We concentrate on simple probabilistic classifiers based on machine learning and evaluate a naive Bayes classifier, the basis of one of the most widely used soft computing techniques. Finally, we present a use case in which the proposal, enriched with definitions and refinements made by experts, helps to improve the prediction of users' feelings towards a movie compared with a conventional approach.
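For readers unfamiliar with the classifier named above, the following is a minimal sketch of a generic multinomial naive Bayes sentiment classifier with add-one (Laplace) smoothing. It is not DSociaL's implementation, and it does not model the expert refinements the abstract describes. The class name `MultinomialNaiveBayes`, the whitespace `tokenize` helper and the four toy training sentences are hypothetical and were chosen for illustration only. For each class c, the classifier scores a text as log P(c) plus the sum of log P(w | c) over its known words, and it predicts the class with the highest score.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical helper: lowercase + whitespace split. Real social-network
    # text would also need handling of hashtags, emoticons, slang, etc.
    return text.lower().split()


class MultinomialNaiveBayes:
    """Generic multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)          # documents per class
        self.word_counts = defaultdict(Counter)      # class -> word -> count
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def log_posterior(self, doc):
        # Unnormalized log P(c | doc) = log P(c) + sum of log P(w | c).
        n_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for c, n_c in self.class_counts.items():
            score = math.log(n_c / n_docs)  # log prior
            for tok in tokenize(doc):
                if tok not in self.vocab:
                    continue  # skip words never seen during training
                count = self.word_counts[c][tok]
                score += math.log((count + 1) / (self.total_words[c] + vocab_size))
            scores[c] = score
        return scores

    def predict(self, doc):
        scores = self.log_posterior(doc)
        return max(scores, key=scores.get)


# Toy usage with hypothetical movie-review snippets.
train_docs = [
    "great movie loved the acting",
    "wonderful plot and great cast",
    "boring movie terrible acting",
    "awful plot waste of time",
]
train_labels = ["pos", "pos", "neg", "neg"]
clf = MultinomialNaiveBayes().fit(train_docs, train_labels)
print(clf.predict("loved the great cast"))    # pos
print(clf.predict("terrible boring waste"))   # neg
```

Working in log space avoids floating-point underflow when texts are long. The smoothing term keeps a single unseen (word, class) pair from driving a class probability to zero.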
