Experiments in Open Domain Deception Detection

The widespread use of deception in online sources has created a need for methods that automatically profile and identify deceivers. This work explores deception, gender, and age detection in short texts using a machine learning approach. First, we collect a new open-domain deception dataset that also contains demographic data such as gender and age. Second, we extract feature sets including n-grams, shallow and deep syntactic features, semantic features, and syntactic complexity and readability metrics. Third, we build classifiers that aim to predict deception, gender, and age. Our findings show that while deception detection can be performed in short texts even in the absence of a predetermined domain, gender and age prediction in deceptive texts is a challenging task. We further explore the linguistic differences in deceptive content that relate to deceivers' gender and age, and find evidence that both age and gender play an important role in people’s word choices when fabricating lies.
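
To make the feature-extraction step more concrete, the following is a minimal sketch of how word n-grams and simple surface metrics might be computed for a short text. It is an illustration under stated assumptions, not the feature pipeline used in this work: the function names (tokenize, ngram_counts, surface_metrics, extract_features), the crude regex tokenizer, and the two surface metrics (average sentence length and average word length, used here as rough stand-ins for complexity and readability measures) are all hypothetical choices. Syntactic and semantic features are omitted because they would require a parser and a lexicon.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def ngram_counts(tokens, n_max=2):
    """Count every word n-gram for n = 1..n_max (unigrams and bigrams by default)."""
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def surface_metrics(text):
    """Two crude proxies for complexity/readability (assumed for illustration only)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    tokens = tokenize(text)
    return {
        "avg_sentence_len": len(tokens) / max(len(sentences), 1),
        "avg_word_len": sum(len(t) for t in tokens) / max(len(tokens), 1),
    }


def extract_features(text):
    """Merge n-gram counts and surface metrics into one feature dictionary.

    Metric names contain underscores, which the tokenizer never emits,
    so they cannot collide with n-gram keys.
    """
    features = dict(ngram_counts(tokenize(text)))
    features.update(surface_metrics(text))
    return features


feats = extract_features("I never lie. I never cheat.")
```

A feature dictionary of this shape could then be vectorized and passed to any standard classifier to predict deception, gender, or age; which classifier and which feature combinations work best is a question for the experiments themselves.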
