IFSpard: An Information Fusion-based Framework for Spam Review Detection

Online reviews, which convey product quality and user experience, strongly influence customers' purchasing decisions. Unfortunately, many spammers attempt to mislead consumers by writing fake reviews. Existing methods for detecting spam reviews mainly focus on constructing discriminative features, an approach that depends heavily on expert knowledge and may miss complex but effective features. Recently, some models have attempted to learn latent representations of reviews, users, and items. However, the learned embeddings usually lack interpretability. Moreover, most existing methods rely on a single classification model and ignore the complementarity of different classification models. To solve these problems, we propose IFSpard, a novel information fusion-based framework that explores and exploits useful information from various aspects for spam review detection. First, we design a graph-based feature extraction method and an interaction-mining-based feature crossing method to automatically extract basic and complex features from different sources of data. Then, we propose a mutual-information-based feature selection and representation learning method to remove the irrelevant and redundant information contained in the automatically constructed features. Finally, we devise an adaptive ensemble model that combines the information in the constructed features with the strengths of different classifiers for spam review detection. Experimental results on several public datasets show that the proposed model performs better than state-of-the-art methods.
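To make the mutual-information-based selection step concrete, the following minimal sketch ranks discrete features by their estimated mutual information with the spam label and keeps the top k. It is a generic filter under stated assumptions, not the IFSpard method itself: it scores relevance only (it does not remove redundant features or learn representations), and the function names, feature names, and toy data are all hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Estimate I(X; Y) in nats from two equal-length sequences of discrete values."""
    n = len(xs)
    px = Counter(xs)             # marginal counts of X
    py = Counter(ys)             # marginal counts of Y
    pxy = Counter(zip(xs, ys))   # joint counts of (X, Y)
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written in terms of counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def select_top_k(features, labels, k):
    """Return the names of the k feature columns with the highest MI with the labels."""
    scores = {name: mutual_information(col, labels) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy data: 1 = spam review, 0 = genuine review.
labels = [1, 1, 0, 0, 1, 0, 1, 0]
features = {
    "burst_posting":  [1, 1, 0, 0, 1, 0, 1, 0],  # identical to the labels
    "extreme_rating": [1, 1, 0, 0, 1, 0, 0, 1],  # agrees with the labels on 6 of 8 reviews
    "random_noise":   [0, 1, 0, 1, 1, 0, 0, 1],  # statistically independent of the labels
}

selected = select_top_k(features, labels, k=2)
print(selected)
```

On this toy data the independent feature scores zero and is dropped. A fuller treatment of redundancy would also compare features against each other, for example by penalizing a candidate's mutual information with features that are already selected.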
