In intelligence, law enforcement, and, increasingly, organizational settings, there is interest in detecting deception, for example in intercepted phone calls, emails, and web sites. Humans are not naturally good at detecting deception, but recent work has shown that deception is readily detectable using markers that humans do not notice but that software can readily compute. Pennebaker's model suggests that deceptive communication is characterized by changes in the frequency of four kinds of words: first-person pronouns, exception words, negative-emotion words, and action words. We investigate what can be learned about the deception model by applying it to a large corpus of Enron emails. We show that each of the four kinds of words in the Pennebaker model acts as a separate latent factor for deception, rather than having their effects mixed together.
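The per-category frequency counting at the heart of the model can be sketched in a few lines. The word lists below are small illustrative stand-ins chosen for this example; the actual model relies on the full LIWC category dictionaries, and the function name is our own:

```python
import re

# Illustrative (hypothetical) word lists for the four Pennebaker categories;
# the real model uses the much larger LIWC dictionaries.
CATEGORIES = {
    "first_person": {"i", "me", "my", "mine", "myself"},
    "exception": {"but", "except", "without", "unless"},
    "negative_emotion": {"hate", "worthless", "enemy", "angry"},
    "action": {"go", "carry", "run", "lead"},
}

def category_frequencies(text):
    """Return each category's frequency as a proportion of total words."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words) or 1  # avoid division by zero on empty input
    return {cat: sum(w in wordlist for w in words) / total
            for cat, wordlist in CATEGORIES.items()}

scores = category_frequencies("I go without my notes, but I carry the plan.")
print(scores)
```

A deception score would then combine shifts in these four frequencies relative to a baseline; treating each category as a separate factor, as the abstract argues, means the four proportions should be examined individually rather than summed.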
[1] J. Pennebaker et al., "Lying Words: Predicting Deception from Linguistic Styles," Personality & Social Psychology Bulletin, 2003.
[2] Jay F. Nunamaker et al., "An exploratory study into deception detection in text-based computer-mediated communication," Proceedings of the 36th Annual Hawaii International Conference on System Sciences, 2003.
[3] David B. Skillicorn et al., "Structure in the Enron Email Dataset," Computational and Mathematical Organization Theory, 2005.
[4] Jay F. Nunamaker et al., "A Quasi-experiment to Determine the Impact of a Computer Based Deception Detection Training System: The Use of Agent99 Trainer in the U.S. Military," Proceedings of the 38th Annual Hawaii International Conference on System Sciences, 2005.