Effective Use of Evaluation Measures for the Validation of Best Classifier in Urdu Sentiment Analysis

Sentiment analysis (SA) can help in making decisions, drawing conclusions, or recommending appropriate solutions for business, political, and other problems. At the same time, reliable ways are required to verify the results obtained from SA. In the context of biologically inspired approaches to machine learning, obtaining reliable results is challenging but important, and properly verified and validated results are preferred by the research community. This research pursues reliable results by using three standard evaluation measures. First, SA of Urdu is performed. After the data are collected and annotated, five classifiers, i.e., PART, Naive Bayes Multinomial Text, LibSVM (support vector machine), decision tree (J48), and k-nearest neighbor (KNN, IBK), are applied using Weka. Under 10-fold cross-validation, the top three classifiers, i.e., LibSVM, J48, and IBK, are selected on the basis of high accuracy, precision, recall, and F-measure, and IBK emerges as the best of the three. To verify this result, the labels of the sentences (positive, negative, or neutral) are predicted using separate training and test data, and three standard evaluation measures are then applied: McNemar's test, the kappa statistic, and root mean squared error. On these measures, IBK performs much better than the other two classifiers. The main contribution of this research is the set of steps taken to make this result more reliable, including the use of three evaluation measures to confirm and validate it. It is concluded with confidence that IBK is the best classifier in this case.
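The three verification measures named above can be illustrated with a minimal Python sketch. It is not the authors' Weka workflow: the function names are hypothetical, the McNemar statistic uses the common continuity-corrected form, and the RMSE is computed over one-hot encoded hard predictions as a simplification (Weka derives it from predicted class probabilities).

```python
import math
from collections import Counter


def mcnemar_statistic(y_true, pred_a, pred_b):
    """Continuity-corrected McNemar chi-squared statistic (1 degree of freedom).

    Only the discordant pairs matter: sentences that exactly one of the
    two classifiers labels correctly.
    """
    triples = list(zip(y_true, pred_a, pred_b))
    a_only = sum(1 for t, a, b in triples if a == t and b != t)  # A right, B wrong
    b_only = sum(1 for t, a, b in triples if a != t and b == t)  # A wrong, B right
    if a_only + b_only == 0:
        return 0.0  # the classifiers never disagree in correctness
    return (abs(a_only - b_only) - 1) ** 2 / (a_only + b_only)


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement with the gold labels, corrected for chance."""
    n = len(y_true)
    observed = sum(1 for t, p in zip(y_true, y_pred) if t == p) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    expected = sum(true_counts[k] * pred_counts[k] for k in true_counts) / (n * n)
    if expected == 1.0:
        return 1.0  # degenerate case: a single class everywhere
    return (observed - expected) / (1.0 - expected)


def rmse_one_hot(y_true, y_pred, labels):
    """RMSE between one-hot encodings of hard predictions and gold labels.

    Assumption: hard labels stand in for class probabilities here.
    """
    total = 0.0
    for t, p in zip(y_true, y_pred):
        for label in labels:
            predicted = 1.0 if p == label else 0.0
            actual = 1.0 if t == label else 0.0
            total += (predicted - actual) ** 2
    return math.sqrt(total / (len(y_true) * len(labels)))


if __name__ == "__main__":
    # Toy labels for illustration only; these are not data from the study.
    gold = ["pos", "neg", "neu", "pos", "neg", "neu", "pos", "neg"]
    clf_a = ["pos", "neg", "neu", "pos", "neg", "pos", "pos", "neg"]
    clf_b = ["pos", "pos", "neu", "neg", "neg", "pos", "neg", "neg"]
    print("McNemar statistic:", mcnemar_statistic(gold, clf_a, clf_b))
    print("kappa (A):", cohen_kappa(gold, clf_a))
    print("RMSE (A):", rmse_one_hot(gold, clf_a, ["pos", "neg", "neu"]))
```

A McNemar statistic above about 3.84 (the chi-squared critical value at the 0.05 level with one degree of freedom) would suggest that two classifiers differ significantly; a higher kappa and a lower RMSE indicate the better classifier.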
