Syntactic Stylometry for Deception Detection

Most previous studies in computerized deception detection have relied only on shallow lexico-syntactic patterns. This paper investigates syntactic stylometry for deception detection, adding a somewhat unconventional angle to the prior literature. Across four datasets spanning domains from product reviews to essays, we demonstrate that features derived from Context Free Grammar (CFG) parse trees consistently improve detection performance over several baselines that rely only on shallow lexico-syntactic features. Our results improve on the best published result on the hotel review data (Ott et al., 2011), reaching 91.2% accuracy, a 14% relative error reduction.
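The abstract does not spell out which parse-tree features are used. As a minimal sketch, assuming the features are counts of CFG production rules read off each sentence's parse tree (unlexicalized by default, with an optional lexicalized variant that also keeps preterminal-to-word rules), extraction could look like the following. The parse is hand-written in Penn Treebank bracket notation rather than produced by a parser such as the one in [14], and the function names `parse_bracketed`, `production_rules`, and `cfg_features` are hypothetical.

```python
import re
from collections import Counter


def parse_bracketed(s):
    """Parse a Penn Treebank-style bracketed string into (label, children) tuples.

    Leaves (words) are kept as plain strings. An unlabeled outer bracket,
    as in "( (S ...) )", is given the label "ROOT".
    """
    tokens = re.findall(r"\(|\)|[^\s()]+", s)
    pos = 0

    def parse_node():
        nonlocal pos
        assert tokens[pos] == "(", f"expected '(' at token {pos}"
        pos += 1
        if tokens[pos] == "(":
            label = "ROOT"  # unlabeled outer bracket
        else:
            label = tokens[pos]
            pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse_node())
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    return parse_node()


def production_rules(tree, lexicalized=False):
    """Return the production rules "LHS -> RHS" used in a parse tree.

    Preterminal rules (e.g. "NN -> hotel") are included only when
    lexicalized=True; otherwise only rules over nonterminals are kept.
    """
    rules = []

    def walk(node):
        label, children = node
        rhs = []
        for child in children:
            if isinstance(child, tuple):
                rhs.append(child[0])  # child's nonterminal label
                walk(child)
            else:
                rhs.append(child)  # a word
        is_preterminal = all(isinstance(c, str) for c in children)
        if is_preterminal and not lexicalized:
            return
        rules.append(f"{label} -> {' '.join(rhs)}")

    walk(tree)
    return rules


def cfg_features(bracketed_sentences, lexicalized=False):
    """Bag-of-production-rules counts over all sentences of one document."""
    counts = Counter()
    for s in bracketed_sentences:
        counts.update(production_rules(parse_bracketed(s), lexicalized))
    return counts


if __name__ == "__main__":
    sent = "(S (NP (PRP We)) (VP (VBD loved) (NP (DT the) (NN hotel))) (. .))"
    for rule, n in sorted(cfg_features([sent]).items()):
        print(n, rule)
```

For the example sentence, the unlexicalized variant yields four rules (`S -> NP VP .`, `NP -> PRP`, `VP -> VBD NP`, `NP -> DT NN`), while the lexicalized variant adds the five preterminal rules such as `NN -> hotel`. The resulting counts could then be vectorized and passed to a linear classifier such as LIBLINEAR [2]; that pairing is an assumption for illustration, not something the abstract states.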

[1] Ee-Peng Lim et al. Detecting product review spammers using rating behaviors. CIKM, 2010.

[2] Chih-Jen Lin et al. LIBLINEAR: A Library for Large Linear Classification. J. Mach. Learn. Res., 2008.

[3] Xuan Wang et al. Exploiting Rich Features for Detecting Hedges and their Scope. CoNLL Shared Task, 2010.

[4] Justin Zobel et al. Searching With Style: Authorship Attribution in Classic Literature. ACSC, 2007.

[5] Yejin Choi et al. Distributional Footprints of Deceptive Product Reviews. ICWSM, 2012.

[6] A. Vrij et al. Cues to Deception and Ability to Detect Lies as a Function of Police Interview Styles. Law and Human Behavior, 2007.

[7] Jeffrey T. Hancock et al. On Lying and Being Lied To: A Linguistic Analysis of Deception in Computer-Mediated Communication. 2007.

[8] Ee-Peng Lim et al. Finding unusual review patterns using unexpected rules. CIKM, 2010.

[9] Junhui Wang et al. Detecting group review spam. WWW, 2011.

[10] Galit Avneri et al. Style-based Text Categorization: What Newspaper Am I Reading? 1998.

[11] Bing Liu et al. Opinion spam and analysis. WSDM, 2008.

[12] Carlo Strapparava et al. The Lie Detector: Explorations in the Automatic Recognition of Deceptive Language. ACL, 2009.

[13] Philip Resnik et al. More than Words: Syntactic Packaging and Implicit Sentiment. NAACL, 2009.

[14] Dan Klein et al. Improved Inference for Unlexicalized Parsing. NAACL, 2007.

[15] Cindy K. Chung et al. The development and psychometric properties of LIWC2007. 2007.

[16] Claire Cardie et al. Finding Deceptive Opinion Spam by Any Stretch of the Imagination. ACL, 2011.