Incorporating feature representation into BiLSTM for deceptive review detection

Consumers are increasingly influenced by product reviews when purchasing goods or services. At the same time, deceptive reviews frequently mislead users. Manually identifying deceptive reviews among massive volumes of reviews is inefficient and inaccurate, so automatic identification of deceptive reviews has become a research trend. Most existing methods are less effective because they lack a deep understanding of reviews. We propose a neural network method based on bidirectional long short-term memory (BiLSTM) and feature combination to learn the representation of deceptive reviews. We conduct extensive experiments and demonstrate the effectiveness of the proposed method. Specifically, in the mixed-domain detection experiment, comparisons with other neural network-based methods show that our model is effective: BiLSTM yields more than a 3% improvement in F1 score over the most advanced neural network method. Since feature selection plays an important role in this task, we combine features to further improve performance, reaching an F1 score of 87.6%, which outperforms the state-of-the-art method. Moreover, in the cross-domain detection experiment, our method achieves an F1 score of 82.4% on the restaurant domain, about 6% higher than the state-of-the-art method, and it also remains robust on the doctor domain.
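
Below is a minimal sketch of the kind of architecture the abstract describes: a BiLSTM encoder over review tokens whose pooled output is concatenated with additional hand-crafted features before a final classification layer. The use of PyTorch, the layer sizes, the mean-pooling step, and the feature dimension are all illustrative assumptions, not the paper's exact configuration.

```python
# Hypothetical sketch of a BiLSTM + feature-combination classifier for
# deceptive review detection. Layer sizes and pooling are assumptions.
import torch
import torch.nn as nn


class BiLSTMWithFeatures(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=100, hidden_dim=128,
                 extra_feat_dim=20, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        # Bidirectional LSTM reads the review in both directions.
        self.bilstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                              bidirectional=True)
        # The learned review representation is combined with extra features
        # (e.g. linguistic or statistical cues) by simple concatenation.
        self.classifier = nn.Linear(2 * hidden_dim + extra_feat_dim, num_classes)

    def forward(self, token_ids, extra_features):
        embedded = self.embedding(token_ids)        # (batch, time, embed_dim)
        outputs, _ = self.bilstm(embedded)          # (batch, time, 2*hidden_dim)
        review_repr = outputs.mean(dim=1)           # mean-pool over time steps
        combined = torch.cat([review_repr, extra_features], dim=-1)
        return self.classifier(combined)            # (batch, num_classes)


if __name__ == "__main__":
    model = BiLSTMWithFeatures()
    tokens = torch.randint(1, 10000, (4, 50))   # 4 reviews, 50 tokens each
    feats = torch.randn(4, 20)                  # hand-crafted feature vectors
    logits = model(tokens, feats)
    print(logits.shape)                         # torch.Size([4, 2])
```

In this sketch the feature combination is a plain concatenation before the classifier; other combination strategies (e.g. gating or separate feature encoders) would fit the same interface.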
