Automatic Construction of Movie Domain Korean Sentiment Dictionary Using Online Movie Reviews

We present a method for automatically constructing a domain-specific Korean sentiment dictionary that can be used to classify the sentiment of online movie reviews. More than 1.18 million online movie reviews with ratings of either 1 to 4 or 7 to 10 were collected across fourteen movie genres. For each genre, two joint probabilities were calculated for every word: (1) the joint probability of the word and positive reviews (ratings 7 to 10), and (2) the joint probability of the word and negative reviews (ratings 1 to 4). The difference between the two joint probabilities, (1) minus (2), was obtained for each word in each genre, and each word's differences were averaged over the fourteen genres. Finally, the averaged differences were normalized to the range -1 to 1. These normalized values serve as the sentiment values of the words in the final 135,082-word movie-domain Korean sentiment dictionary. The positive/negative binary classification performance of the constructed dictionary was evaluated on test data and reached a balanced accuracy of 80.7%, which supports the effectiveness of the proposed construction method.
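The construction steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several details are assumptions that the abstract does not specify: the joint probability is estimated as the fraction of a genre's reviews that both contain the word and carry the given label, a word absent from a genre contributes a difference of 0 to the average, normalization divides by the largest absolute averaged difference, and a review is classified by the sign of its summed word scores. The function names `build_sentiment_dictionary` and `classify` are hypothetical, and Korean tokenization is assumed to have been done beforehand.

```python
from collections import Counter, defaultdict


def build_sentiment_dictionary(reviews):
    """Build a word -> score map in [-1, 1] from labeled reviews.

    `reviews` is an iterable of (genre, rating, tokens) tuples, where
    `tokens` is an already-tokenized review (tokenizer is out of scope).
    """
    # Per genre: number of positive / negative reviews containing each word.
    counts = defaultdict(lambda: {"pos": Counter(), "neg": Counter()})
    totals = Counter()  # number of labeled reviews per genre

    for genre, rating, tokens in reviews:
        if 7 <= rating <= 10:
            label = "pos"
        elif 1 <= rating <= 4:
            label = "neg"
        else:
            continue  # ratings 5 and 6 are not part of the collected data
        totals[genre] += 1
        # Assumption: count each word once per review (document frequency).
        counts[genre][label].update(set(tokens))

    vocab = set()
    for by_label in counts.values():
        vocab.update(by_label["pos"], by_label["neg"])

    # Difference of the two joint probabilities, per word and per genre.
    diffs = defaultdict(list)
    for genre, by_label in counts.items():
        n = totals[genre]
        for word in vocab:
            p_pos = by_label["pos"][word] / n  # estimate of P(word, positive)
            p_neg = by_label["neg"][word] / n  # estimate of P(word, negative)
            diffs[word].append(p_pos - p_neg)

    # Average the differences over genres.
    averaged = {w: sum(d) / len(d) for w, d in diffs.items()}

    # Assumption: scale by the largest absolute value to land in [-1, 1].
    max_abs = max((abs(v) for v in averaged.values()), default=0.0) or 1.0
    return {w: v / max_abs for w, v in averaged.items()}


def classify(tokens, dictionary):
    """Label a review by the sign of its summed word scores (assumed rule)."""
    score = sum(dictionary.get(t, 0.0) for t in tokens)
    return "positive" if score > 0 else "negative"
```

With a toy corpus of two genres, a word that appears only in positive reviews of every genre receives the score 1, a word that appears equally often in positive and negative reviews receives 0, and a word that is positive in only one genre receives an intermediate value because of the averaging step.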