Russian text corpora for deception detection studies

Text-based deception detection is gaining significance, as studies in this area have both theoretical and practical value, with applications in policing, security, and customs, as well as in detecting predatory communications (e.g., Internet scams). Designing text corpora is essential for such studies. Text-based deception detection has mostly been studied in English and a few other European languages. Research on Slavic languages remains scarce, largely because no suitable corpora are available. In this article, we present an overview of existing text corpora used in text-based deception detection studies, together with a detailed description of the available Russian corpora designed specifically for this task.
