Applying of Sentiment Analysis for Texts in Russian Based on Machine Learning Approach

This paper considers the problem of sentiment classification of Russian-language text messages using two Machine Learning methods: the Naive Bayes classifier and the Support Vector Machine. One feature of the Russian language is its wide variety of inflectional endings, which depend on declension, tense, and grammatical gender. Another problem, common to sentiment classification in many languages, is that different words can have the same meaning (synonyms) and therefore carry the same emotional value. Our task was therefore to evaluate how lemmatization affects sentiment classification accuracy (in other words, classification with word endings versus without them) and to compare the results for Russian and English. To evaluate the impact of synonymy, we grouped words with the same meaning into a single term. To solve these problems we used lemmatization and synonym libraries. The results showed that lemmatization improves the accuracy of sentiment classification for Russian texts. For English texts, by contrast, classification without lemmatization yields better results. The results also showed that accounting for synonymy in the model has a positive effect on accuracy. In the "Introduction", we describe the place of Sentiment Analysis in Data Mining. In "Approaches to the Sentiment Analysis", we discuss the main approaches to Sentiment Analysis: the linguistic approach, the approach based on Machine Learning, and their combination. In "Description of algorithms for Sentiment Analysis", we state the problem of sentiment classification and describe methods for solving it using a Naïve Bayesian classifier, Bagging, and a Support Vector Machine. In "Results of experiments", we describe the aims of the experiment and the features of the implementation of the algorithm, and report the results. In the "Conclusion", we present the conclusions drawn from the results.
Keywords: text analysis; analysis of tonality; sentiment analysis; machine learning.
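
The preprocessing idea summarized in the abstract (reduce each word to its lemma, merge synonyms into a single term, then classify) can be illustrated with a minimal sketch. This is not the authors' implementation: the `LEMMAS` and `SYNONYMS` dictionaries, the two-document training set, and all function names are hypothetical stand-ins for real lemmatization and synonym libraries, and the classifier is a plain multinomial Naive Bayes with add-one smoothing written with the Python standard library only.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy dictionaries; a real system would use a lemmatizer
# and a synonym library instead of hand-written mappings.
LEMMAS = {
    "хорошая": "хороший", "хорошее": "хороший",
    "плохая": "плохой", "плохое": "плохой",
    "отличная": "отличный", "ужасная": "ужасный",
    "фильмы": "фильм", "книги": "книга",
}
SYNONYMS = {"отличный": "хороший", "ужасный": "плохой"}


def normalize(text, use_lemmas=True, use_synonyms=True):
    """Lowercase, split on whitespace, then optionally lemmatize and merge synonyms."""
    tokens = text.lower().split()
    if use_lemmas:
        tokens = [LEMMAS.get(t, t) for t in tokens]
    if use_synonyms:
        tokens = [SYNONYMS.get(t, t) for t in tokens]
    return tokens


def train_naive_bayes(docs):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict(model, tokens):
    """Return the class with the highest log-posterior (add-one smoothing)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for t in tokens:
            if t in vocab:  # words never seen in training are skipped
                score += math.log((word_counts[label][t] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Illustrative training data, not taken from the paper.
train = [("хороший фильм", "pos"), ("плохой фильм", "neg")]
model = train_naive_bayes([(normalize(text), label) for text, label in train])

# "отличная" -> lemma "отличный" -> synonym group "хороший"
print(predict(model, normalize("отличная книга")))  # prints: pos
```

Without the two normalization steps, "отличная" would not match any training term and would contribute nothing to the score, which is the effect the experiments set out to measure on real data.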
