Classification of Sentimental Reviews Using Natural Language Processing Concepts and Machine Learning Techniques

Natural language processing (NLP) is a theoretically motivated set of computational techniques for representing and analyzing naturally occurring text at multiple levels of linguistic analysis, with the goal of building automatic language processing systems for a range of tasks and applications. From an industry perspective, one of the most important applications of NLP is sentiment analysis. Sentiment analysis is a prominent branch of NLP because it can classify a textual document as having either positive or negative polarity. With the proliferation of the World Wide Web, large volumes of unstructured text in the form of tweets, messages, articles, social networking discussions, and reviews of products and movies are now available, and the relevant information must be extracted from this large pool. There is therefore a need to analyze this data and bring out hidden facts based on the intention of the author of the text. The intention can be either criticism (negative) of a product or movie, or admiration (positive). It can also vary in strength, from positive to strongly positive and from negative to strongly negative. This thesis focuses on classifying movie reviews as either positive or negative using machine learning techniques, namely the Support Vector Machine (SVM), K-Nearest Neighbor (KNN), and Naive Bayes (NB) classifiers. Further, an n-gram model is proposed in which documents are classified based on the unigram, bigram, and trigram composition of words in a sentence. Two datasets are considered for this study: one is a labeled polarity dataset in which each movie review is labeled as either positive or negative, and the other is the IMDb movie reviews dataset. Finally, the prediction accuracy of the above-mentioned machine learning algorithms on different manipulations of the same dataset is studied, and a comparative analysis is presented.
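
As a rough illustration of the n-gram idea described above, the following minimal sketch extracts unigram, bigram, and trigram features from a review and feeds them to a small multinomial Naive Bayes classifier with add-one smoothing. It is not the thesis's implementation: the function and class names (`ngrams`, `features`, `TinyNaiveBayes`), the whitespace tokenization, the smoothing choice, and the four toy reviews are all assumptions made for this example, and only the Python standard library is used.

```python
from collections import Counter
import math


def ngrams(text, n):
    """Return the word n-grams (as tuples) of a lowercased, whitespace-split text."""
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def features(text, max_n=3):
    """Combine unigram, bigram and trigram features for one review."""
    feats = []
    for n in range(1, max_n + 1):
        feats.extend(ngrams(text, n))
    return feats


class TinyNaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, texts, labels):
        self.counts = {}                  # label -> Counter of n-gram counts
        self.doc_counts = Counter(labels)  # label -> number of training reviews
        self.vocab = set()                # all n-grams seen in training
        for text, label in zip(texts, labels):
            label_counts = self.counts.setdefault(label, Counter())
            for feat in features(text):
                label_counts[feat] += 1
                self.vocab.add(feat)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, label_counts in self.counts.items():
            # Log prior plus smoothed log likelihood of each known n-gram.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(label_counts.values()) + len(self.vocab)
            for feat in features(text):
                if feat in self.vocab:  # n-grams unseen in training are skipped
                    score += math.log((label_counts[feat] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, made-up reviews used only to exercise the code (not from either dataset).
train_texts = [
    "a wonderful and moving film",
    "great acting and a wonderful story",
    "a dull and boring film",
    "terrible acting and a boring story",
]
train_labels = ["pos", "pos", "neg", "neg"]

model = TinyNaiveBayes().fit(train_texts, train_labels)
print(ngrams("not a good movie", 2))
print(model.predict("a wonderful story"))
print(model.predict("a boring film"))
```

Bigrams and trigrams let the classifier see short phrases such as "not good" that a unigram model would split into separate words. A realistic comparison of SVM, KNN, and NB would use a proper tokenizer, a much larger labeled corpus, and held-out test data for measuring accuracy.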
