Is this hotel review truthful or deceptive? A platform for disinformation detection through computational stylometry

In this paper, we present a web service platform for detecting disinformation in hotel reviews written in English. The platform takes a hybrid approach that combines computational stylometry techniques, machine learning, and linguistic rules written using COGITO, Expert System Corp.’s semantic intelligence software, which analyzes texts and extracts their linguistic characteristics. We carried out an experiment on the Deceptive Opinion Spam corpus, a balanced corpus of 1,600 reviews of 20 Chicago hotels split into four datasets: positive truthful, negative truthful, positive deceptive, and negative deceptive reviews. We investigated four different classifiers and found that Simple Logistic was the best-performing algorithm for this type of classification.
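To give a rough sense of what stylometric feature extraction can look like, the following is a minimal sketch in plain Python. It is not the platform's implementation: COGITO is proprietary and its rules are not described here, so the function name `stylometric_features`, the feature set, and the `FIRST_PERSON` word list are all illustrative assumptions. In a pipeline like the one described, vectors of this kind would then be passed to a classifier.

```python
import re

# Hypothetical word list chosen for illustration; not taken from the paper.
FIRST_PERSON = {"i", "me", "my", "mine", "myself",
                "we", "us", "our", "ours", "ourselves"}


def stylometric_features(text: str) -> dict[str, float]:
    """Compute a few simple surface-level style features for one review.

    The features are assumptions made for this sketch, not the
    feature set used by the platform.
    """
    # Lowercased alphabetic tokens (apostrophes kept for contractions).
    tokens = re.findall(r"[A-Za-z']+", text.lower())
    # Naive sentence split on runs of terminal punctuation.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]

    # Guard against division by zero on empty input.
    n_tokens = len(tokens) or 1
    n_sentences = len(sentences) or 1

    return {
        "avg_word_length": sum(len(t) for t in tokens) / n_tokens,
        "avg_sentence_length": len(tokens) / n_sentences,
        "type_token_ratio": len(set(tokens)) / n_tokens,
        "first_person_ratio": sum(t in FIRST_PERSON for t in tokens) / n_tokens,
        "exclamation_ratio": text.count("!") / max(len(text), 1),
    }


if __name__ == "__main__":
    review = "I loved my stay. We will return!"
    for name, value in stylometric_features(review).items():
        print(f"{name}: {value:.3f}")
```

Each review becomes a fixed-length numeric vector, which is the form most classifiers expect. Simple surface counts like these are only a starting point; richer syntactic or semantic features would require a proper linguistic analysis tool.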
