Automatic detection of deception in child-produced speech using syntactic complexity features

It is important that the testimony of children be admissible in court, especially in cases involving allegations of abuse. Unfortunately, children can be misled by interrogators or might offer false information, with dire consequences. In this work, we evaluate various parameterizations of five classifiers (including support vector machines, neural networks, and random forests) in distinguishing truth from lies given transcripts of interviews with 198 victims of abuse between the ages of 4 and 7. These evaluations are performed using a novel set of syntactic features, including measures of complexity. Our results show that sentence length, the mean number of clauses per utterance, and the Stajner-Mitkov measure of complexity are highly informative syntactic features, that classification accuracy varies greatly with the age of the speaker, and that accuracy of up to 91.7% can be achieved by support vector machines given a sufficient amount of data.
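As a rough illustration of the simplest feature named above, sentence (utterance) length, the following sketch computes the mean number of word tokens per utterance for one transcript. It is not the authors' feature extractor: the function name `mean_utterance_length`, the regular-expression tokenizer, and the sample utterances are all assumptions made for illustration. Clause counts and the Stajner-Mitkov measure would require a syntactic parser and are not attempted here.

```python
import re

# Assumption: a "word" is a run of letters or apostrophes. The tokenization
# actually used in the paper is not specified in the abstract.
WORD_PATTERN = re.compile(r"[A-Za-z']+")


def mean_utterance_length(utterances):
    """Return the mean number of word tokens per non-empty utterance.

    Utterances that contain no word tokens are ignored; an input with no
    usable utterances yields 0.0.
    """
    lengths = [len(WORD_PATTERN.findall(u)) for u in utterances]
    lengths = [n for n in lengths if n > 0]
    if not lengths:
        return 0.0
    return sum(lengths) / len(lengths)


# Hypothetical utterances, invented for this example (not from the corpus).
sample = ["I went home", "He said no and then he left"]
print(mean_utterance_length(sample))
```

A value such as this would be one entry in a per-transcript feature vector, which could then be passed to any of the classifiers mentioned in the abstract.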
