Deception detection in Russian texts

Humans are known to detect deception in speech at roughly chance level, so tools that help them detect it are needed. Deception detection has been studied for a long time, but only in the last 10-15 years have methods of computational linguistics been applied to it. Texts are processed with NLP tools and then classified as deceptive or truthful using machine learning methods. Most research has been performed on English, and Slavic languages have never been a focus of deception detection studies. This paper deals with deception detection in Russian narratives. It uses a specially designed corpus of truthful and deceptive texts on the same topic from each respondent (N = 113). The texts were processed with Linguistic Inquiry and Word Count (LIWC) software, which is used in most studies of text-based deception detection. The list of parameters computed by the software was expanded with custom-designed user dictionaries. A variety of text classification methods were employed. The accuracy of the model was found to depend on the author’s gender and on the text type (deceptive or truthful).
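The dictionary-based processing step described above can be illustrated with a minimal sketch. It is not the LIWC software and does not reproduce the user dictionaries designed for the study; the category names, the word lists in `TOY_DICTIONARY`, and the function `category_proportions` are all hypothetical and chosen only for illustration. The sketch computes, for each category, the share of tokens in a text that belong to that category, which is the general kind of parameter a word-count tool produces before classification.

```python
import re
from collections import Counter

# Hypothetical toy dictionary (an assumption, not the study's dictionaries):
# each category maps to a small set of Russian word forms.
TOY_DICTIONARY = {
    "first_person": {"я", "меня", "мне", "мой"},
    "negation": {"не", "нет", "никогда"},
}


def category_proportions(text, dictionary=TOY_DICTIONARY):
    """Return the share of tokens in `text` that fall into each category."""
    # \w matches Unicode word characters, so Cyrillic tokens are handled.
    tokens = re.findall(r"\w+", text.lower())
    total = len(tokens)
    counts = Counter()
    for token in tokens:
        for category, words in dictionary.items():
            if token in words:
                counts[category] += 1
    # Avoid division by zero for empty texts.
    return {
        category: (counts[category] / total if total else 0.0)
        for category in dictionary
    }


features = category_proportions("Я не был там, я никогда не лгу")
```

The resulting dictionary of proportions could serve as one feature vector per text for whichever classifier is chosen; the choice of classifier is left open here because the abstract does not name one.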
