Improving a textual deception detection model

In intelligence, law enforcement, and, increasingly, organizational settings, there is interest in detecting deception, for example in intercepted phone calls, emails, and web sites. Humans are not naturally good at detecting deception, but recent work has shown that deception is in fact detectable, using markers that humans do not notice but that software can readily compute. Pennebaker's model suggests that deceptive communication is characterized by changes in the frequency of four kinds of words: first-person pronouns, exception words, negative-emotion words, and action words. We investigate what can be learned about the deception model by applying it to a large corpus of Enron emails. We show that each of the four kinds of words in the Pennebaker model acts as a separate latent factor for deception, rather than their effects being mixed together.
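The core of such a model is simple: count how often each of the four word categories occurs, relative to text length, and compare the resulting frequencies to a baseline. The sketch below illustrates this with tiny hand-picked word lists; these lexicons are placeholders for illustration only, not the actual LIWC category lists used in Pennebaker's work.

```python
import re

# Illustrative placeholder lexicons; the real model uses the much
# larger LIWC category word lists.
CATEGORIES = {
    "first_person": {"i", "me", "my", "mine", "myself"},
    "exception": {"but", "except", "without", "unless"},
    "negative_emotion": {"hate", "worthless", "enemy", "angry", "afraid"},
    "action": {"go", "carry", "run", "take", "move"},
}

def marker_frequencies(text):
    """Return the relative frequency of each marker category in `text`."""
    words = re.findall(r"[a-z']+", text.lower())
    n = len(words) or 1  # avoid division by zero for empty text
    return {name: sum(w in lexicon for w in words) / n
            for name, lexicon in CATEGORIES.items()}

print(marker_frequencies("I hate this, but I go anyway"))
```

A deception score can then be derived by weighting the per-category frequencies; in the model of [1], deception is associated with lower rates of first-person pronouns and exception words and higher rates of negative-emotion and action words.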

[1] J. Pennebaker et al., "Lying Words: Predicting Deception from Linguistic Styles," Personality and Social Psychology Bulletin, 2003.

[2] J. F. Nunamaker et al., "An Exploratory Study into Deception Detection in Text-Based Computer-Mediated Communication," Proceedings of the 36th Annual Hawaii International Conference on System Sciences, 2003.

[3] D. B. Skillicorn et al., "Structure in the Enron Email Dataset," Computational & Mathematical Organization Theory, 2005.

[4] J. F. Nunamaker et al., "A Quasi-experiment to Determine the Impact of a Computer Based Deception Detection Training System: The Use of Agent99 Trainer in the U.S. Military," Proceedings of the 38th Annual Hawaii International Conference on System Sciences, 2005.