Multi-view Ensemble Learning Using Rough Set Based Feature Ranking for Opinion Spam Detection

Product reviews and blogs play a vital role in giving end users insight when they make purchasing decisions. Studies show a direct link between product reviews/ratings and product revenue. As a result, review hosting sites are often targeted by people who write fake reviews to promote or demote products. Such fictitious opinions, written to sound authentic, are known as deceptive opinion spam. Feature engineering plays an important role in building an automatic classifier for opinion spam detection, because deceptive cues need to be transformed into features. We extracted various psychological, linguistic, and other textual features from text reviews and used Multi-view Ensemble Learning (MEL) to build the classifier. We propose the Rough Set Based Optimal Feature Set Partitioning (RS-OFSP) algorithm to construct the views for MEL. The proposed algorithm shows promising results when compared to random feature set partitioning (Bryll Pattern Recognit 36(6):1291–1302, 2003) [1] and optimal feature set partitioning (Kumar and Minz Knowl Inf Syst, 2016) [11].
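The abstract does not detail how RS-OFSP works. As a rough illustration only, the sketch below ranks discretized features by the classical rough-set dependency degree γ_B(D) = |POS_B(D)| / |U| (the fraction of reviews whose equivalence class under the chosen features falls entirely within one label), deals the ranked features round-robin into k disjoint views, and combines one prediction per view by plurality vote. The round-robin rule, the function names, and the toy binary features are hypothetical stand-ins, not the authors' algorithm, and the base classifiers trained on each view are left out.

```python
from __future__ import annotations

from collections import Counter, defaultdict
from typing import Hashable, Sequence

Row = Sequence[Hashable]


def dependency_degree(rows: list[Row], labels: list[Hashable], attrs: Sequence[int]) -> float:
    """Rough-set dependency gamma_B(D) = |POS_B(D)| / |U| for the attribute subset B = attrs."""
    # Equivalence classes of the indiscernibility relation IND(B):
    # objects with identical values on every attribute in B share a block.
    blocks: dict[tuple, list[int]] = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    # A block lies in the positive region POS_B(D) if all its objects carry one label.
    positive = sum(len(idx) for idx in blocks.values() if len({labels[i] for i in idx}) == 1)
    return positive / len(rows)


def rank_features(rows: list[Row], labels: list[Hashable]) -> list[int]:
    """Rank single features by descending dependency degree (stable sort keeps column order on ties)."""
    n_features = len(rows[0])
    return sorted(range(n_features), key=lambda a: -dependency_degree(rows, labels, [a]))


def partition_views(ranking: list[int], k: int) -> list[list[int]]:
    """Hypothetical rule: deal ranked features round-robin into k disjoint views."""
    views: list[list[int]] = [[] for _ in range(k)]
    for pos, feature in enumerate(ranking):
        views[pos % k].append(feature)
    return views


def majority_vote(predictions: list[Hashable]) -> Hashable:
    """Combine one prediction per view-specific classifier by plurality vote."""
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical toy data: each review has four discretized (0/1) cue features.
rows = [(1, 0, 1, 0), (1, 0, 0, 0), (0, 0, 1, 1), (0, 1, 0, 1)]
labels = ["spam", "spam", "truthful", "truthful"]

ranking = rank_features(rows, labels)               # [0, 3, 1, 2]
views = partition_views(ranking, k=2)               # [[0, 1], [3, 2]]
vote = majority_vote(["spam", "truthful", "spam"])  # "spam"
```

Round-robin dealing is only one simple way to spread highly ranked features across views so that each view-specific classifier sees some informative cues; it makes no claim of optimality.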

[1] Francis K. H. Quek, et al. Attribute bagging: improving accuracy of classifier ensembles by using random feature subsets, 2003, Pattern Recognit.

[2] Arjun Mukherjee, et al. Spotting fake reviewer groups in consumer reviews, 2012, WWW.

[3] Ling Liu, et al. Fraud Detection in Online Consumer Reviews, 2008, Decis. Support Syst.

[4] Paolo Rosso, et al. Detecting positive and negative deceptive opinions using PU-learning, 2015, Inf. Process. Manag.

[5] Bing Liu, et al. Identifying Multiple Userids of the Same Author, 2013, EMNLP.

[6] Arno Scharl, et al. Enriching semantic knowledge bases for opinion mining in big data applications, 2014, Knowl. Based Syst.

[7] Claire Cardie, et al. Negative Deceptive Opinion Spam, 2013, NAACL.

[8] Ryan L. Boyd, et al. The Development and Psychometric Properties of LIWC2015, 2015.

[9] Victoria Johansson, et al. Lexical diversity and lexical density in speech and writing, 2009.

[10] Christos Faloutsos, et al. Opinion Fraud Detection in Online Reviews by Network Effects, 2013, ICWSM.

[11] Vipin Kumar, et al. Multi-view ensemble learning: an optimal feature set partitioning for high-dimensional data classification, 2015, Knowledge and Information Systems.

[12] Arjun Mukherjee, et al. Exploiting Burstiness in Reviews for Review Spammer Detection, 2013, ICWSM.

[13] Santhosh Kumar, et al. Temporal Opinion Spam Detection by Multivariate Indicative Signals, 2016, ICWSM.

[14] G. Harry McLaughlin, et al. SMOG Grading - A New Readability Formula, 1969.

[15] Taketoshi Yoshida, et al. CoSpa: A Co-training Approach for Spam Review Identification with Support Vector Machine, 2016, Inf.

[16] Claire Cardie, et al. TopicSpam: a Topic-Model based approach for spam detection, 2013, ACL.

[17] Bing Liu, et al. Opinion spam and analysis, 2008, WSDM '08.

[18] Ee-Peng Lim, et al. Detecting product review spammers using rating behaviors, 2010, CIKM.

[19] Philip S. Yu, et al. Review Graph Based Online Store Review Spammer Detection, 2011, 2011 IEEE 11th International Conference on Data Mining.

[20] Claire Cardie, et al. Finding Deceptive Opinion Spam by Any Stretch of the Imagination, 2011, ACL.