Spam analysis of big reviews dataset using Fuzzy Ranking Evaluation Algorithm and Hadoop

Online reviews are among the most readily available free sources of information, and both organizations and customers use them to make decisions. Some businesses exploit the influence of opinions for undue profit by hiring professionals, known as spammers, to post positive comments on their own products and negative opinions on their competitors' products. This activity is known as opinion spamming, and it must be identified so that the results reflect genuine sentiment towards a product. So far, opinion spam detection has been treated as a discrete classification problem, generally with two classes: spam and non-spam. However, the problem involves uncertainty, because a user's suspicious behavior might be due to coincidence. Because fuzzy logic handles real-world uncertainty well, we propose a novel solution to the problem based on fuzzy modeling. We propose four fuzzy input linguistic variables and take the suspicion level of a spammer group to be one of: Ultra, Mega, Immense, Highly, Moderate, Slightly and Feebly. We define a novel FSL Deduction Algorithm, which generates 81 fuzzy rules, and the Fuzzy Ranking Evaluation Algorithm (FREA), which determines the extent to which a group is suspicious. As review datasets satisfy the three V's of big data (Volume, Velocity and Variety), we treat this as a big data problem and use Hadoop for storage and analysis. We further demonstrate the proposed algorithm on a sample reviews dataset and on an Amazon reviews dataset, achieving an accuracy of 80.77%, which, unlike that of other approaches, remains steady for a large number of groups and deals well with the uncertainty involved in opinion spam detection.
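To make the structure of such a fuzzy rule base concrete, the following is a minimal sketch and not the FSL Deduction Algorithm or FREA themselves, whose details the abstract does not give. It assumes that each of the four inputs is normalized to [0, 1] and takes three linguistic terms (Low, Medium, High), which would yield 3^4 = 81 rules; it also assumes that the seven levels are listed from most to least suspicious. The term names, the triangular membership functions, the score-based consequent mapping and the function names are all hypothetical.

```python
from itertools import product

# Assumed: three linguistic terms per input (not stated in the abstract).
TERMS = ("Low", "Medium", "High")
# The seven output levels named in the abstract, assumed ordered weakest to strongest.
LEVELS = ("Feebly", "Slightly", "Moderate", "Highly", "Immense", "Mega", "Ultra")
N_INPUTS = 4  # four fuzzy input linguistic variables


def triangular(x, a, b, c):
    """Triangular membership function with peak at b and support (a, c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def fuzzify(x):
    """Map a crisp value in [0, 1] to a membership degree for each term."""
    return {
        "Low": triangular(x, -0.5, 0.0, 0.5),
        "Medium": triangular(x, 0.0, 0.5, 1.0),
        "High": triangular(x, 0.5, 1.0, 1.5),
    }


def build_rules():
    """Enumerate every combination of terms: 3 ** 4 = 81 rules.

    Hypothetical consequent: the summed term indices (0..8) are scaled
    onto the seven levels (0..6). The paper's own deduction may differ.
    """
    rules = {}
    for combo in product(range(len(TERMS)), repeat=N_INPUTS):
        score = sum(combo)
        level = (score * 6 + 4) // 8  # round to nearest of the seven levels
        rules[tuple(TERMS[i] for i in combo)] = LEVELS[level]
    return rules


RULES = build_rules()


def classify(values):
    """Fire every rule with min (AND), aggregate per level with max,
    and return the level with the greatest aggregated strength."""
    memberships = [fuzzify(v) for v in values]
    strength = {level: 0.0 for level in LEVELS}
    for antecedent, level in RULES.items():
        fire = min(m[t] for m, t in zip(memberships, antecedent))
        strength[level] = max(strength[level], fire)
    return max(LEVELS, key=lambda lv: strength[lv])
```

Under these assumptions, `classify([1.0, 1.0, 1.0, 1.0])` returns `"Ultra"` and `classify([0.0, 0.0, 0.0, 0.0])` returns `"Feebly"`; intermediate inputs fire several rules at once, which is how a fuzzy model avoids a hard spam/non-spam cut.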
