DecOp: A Multilingual and Multi-domain Corpus For Detecting Deception In Typed Text

Growing interest in automatic approaches for unmasking deception in online sources has led to promising results in recent years. Nonetheless, two major issues remain unsolved: the stability of classifier performance across different domains and across different languages. Tackling these issues is challenging because labelled corpora that cover multiple domains and are compiled in more than one language are scarce in the scientific literature. To fill this gap, we introduce DecOp (Deceptive Opinions), a new language resource developed for automatic deception detection in cross-domain and cross-language scenarios. DecOp is composed of 5000 examples of truthful and deceitful first-person opinions, balanced across five domains and two languages. To the best of our knowledge, it is the largest corpus allowing cross-domain and cross-language comparisons in deceit detection tasks. In this paper, we describe the collection procedure of the DecOp corpus and its main characteristics. We also report human performance on the DecOp test set and preliminary experiments with machine learning models based on the Transformer architecture.
