Mining Opinion in Online Messages

The volume of messages that can be mined from online sources grows with the number of users of online applications. In Malaysia, online messages are often written in a mix of languages known as ‘Bahasa Rojak’, which makes opinion mining with standard natural language processing techniques difficult. This study introduces a Malay Mixed Text Normalization Approach (MyTNA) and a feature selection technique based on the Immune Network System (FS-INS) for opinion mining with machine learning. MyTNA normalizes noisy text in online messages, while FS-INS automatically selects relevant features for the opinion mining process. Experiments were conducted on 1000 positive and 1000 negative items of movie feedback. The results show that the accuracy of opinion mining with Naive Bayes (NB), k-Nearest Neighbor (kNN) and Sequential Minimal Optimization (SMO) classifiers increases after MyTNA and FS-INS are introduced.
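To make the idea of normalizing noisy mixed-language text concrete, the following is a minimal sketch of a lookup-based normalizer. It is not the MyTNA algorithm, whose details are not given here: the function name `normalize`, the table `NOISY_TERMS`, and its three entries are hypothetical and chosen only for illustration. The sketch assumes that noise consists of common Malay shorthand and of letters repeated for emphasis.

```python
import re

# Hypothetical lookup table of shorthand terms (illustrative entries only,
# not taken from MyTNA).
NOISY_TERMS = {
    "x": "tidak",
    "yg": "yang",
    "sgt": "sangat",
}


def normalize(message, table=NOISY_TERMS):
    """Return a list of normalized tokens for one online message.

    Steps (assumed for this sketch): lowercase the text, split it into
    alphanumeric tokens, collapse any character repeated three or more
    times into a single character, then replace known shorthand terms.
    """
    tokens = re.findall(r"[a-z0-9']+", message.lower())
    normalized = []
    for token in tokens:
        # Collapse emphasis repeats, e.g. "bestttt" becomes "best".
        token = re.sub(r"(.)\1{2,}", r"\1", token)
        # Replace the token if it appears in the lookup table.
        normalized.append(table.get(token, token))
    return normalized
```

The normalized tokens could then be passed to a feature selection step and a classifier such as NB, kNN or SMO. English words that are part of the mixed-language message, such as "best", are left unchanged by this sketch.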
