Evaluation of data mining features, features taxonomies and their applications

The World Wide Web has greatly improved people's lives over the last couple of decades. E-commerce emerged during this period and has changed the traditional approaches to selling products and services. It uses various techniques to discover market trends and to analyze competitors' activities by exploiting the information in reviews. Potential customers, in turn, use online opinions to make their purchase decisions. Opinion mining and sentiment analysis are among the most critical and fundamental domains of data mining, and they are useful for a variety of sub-domains such as opinion summarization, recommendation systems and opinion spam detection. Opinion mining and all its sub-branches can be performed efficiently when there is a comprehensive understanding of the most effective features applied in those domains. To achieve the best results, the most appropriate set of features must be used for classification or clustering in each case study. To the best of our knowledge, there is no extensive study and taxonomy of the wide range of features and their applications in opinion mining. In this paper, we comprehensively investigate the various types of features exploited in the sub-branches of the opinion mining domain. We present the most frequent feature sets, including structural, linguistic and relation-based features, as a complete reference for further opinion mining research. The results show that using multiple types of features improves the accuracy of opinion mining applications.
