The Naive Bayes Classifier in Opinion Mining: In Search of the Best Feature Set

This paper examines how naive Bayes classifiers perform in opinion mining applications. The first question is which feature sets to choose when training such a classifier in order to obtain the best results when classifying objects (in this case, texts). The second question is whether combining the results of naive Bayes classifiers trained on different feature sets improves the final results. Two databases of negative and positive movie reviews were used to train and test the classifiers.
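
As a rough illustration of the two questions, the sketch below trains one naive Bayes classifier per feature set and then combines their outputs. Everything in it is an assumption made for illustration: the abstract does not say which feature sets or which combination rule the paper uses, so word unigrams and bigrams stand in for the feature sets, summing per-class log scores stands in for the combination step, and the four training sentences are invented toy data rather than the movie review databases.

```python
import math
from collections import Counter, defaultdict


def unigrams(text):
    # Hypothetical feature set 1: lowercase word tokens.
    return text.lower().split()


def bigrams(text):
    # Hypothetical feature set 2: adjacent word pairs.
    tokens = text.lower().split()
    return [a + "_" + b for a, b in zip(tokens, tokens[1:])]


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self, extract):
        self.extract = extract                      # feature extraction function
        self.class_counts = Counter()               # documents per class
        self.feature_counts = defaultdict(Counter)  # feature counts per class
        self.vocab = set()                          # all features seen in training

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.class_counts[label] += 1
            for feature in self.extract(text):
                self.feature_counts[label][feature] += 1
                self.vocab.add(feature)
        return self

    def log_scores(self, text):
        # Returns log P(class) + sum of log P(feature | class) for each class.
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            for feature in self.extract(text):
                if feature in self.vocab:  # skip features never seen in training
                    score += math.log((counts[feature] + 1) / denom)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


def combined_predict(classifiers, text):
    # Assumed combination rule: add the per-class log scores of all classifiers.
    totals = defaultdict(float)
    for clf in classifiers:
        for label, score in clf.log_scores(text).items():
            totals[label] += score
    return max(totals, key=totals.get)


# Invented toy data, not the paper's review databases.
train_texts = [
    "a great and moving film",
    "great acting and a fine story",
    "a dull and boring film",
    "boring plot and poor acting",
]
train_labels = ["pos", "pos", "neg", "neg"]

models = [NaiveBayes(f).fit(train_texts, train_labels) for f in (unigrams, bigrams)]
print(combined_predict(models, "great story"))
print(combined_predict(models, "boring film"))
```

Summing log scores treats the two classifiers as independent sources of evidence; majority voting or weighting each classifier by its held-out accuracy would be equally plausible alternatives to compare.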
