An investigation of data and text mining methods for real world deception detection

Uncovering lies (or deception) is of critical importance to many, including law enforcement and security personnel. Though these professionals may use many different tactics to discover deception, previous research indicates that this cannot be done successfully without aid. This manuscript reports the promising results of a research study in which data and text mining methods, along with a sample of real-world data from a high-stakes situation, are used to detect deception. The information fusion based classification models produced better than 74% classification accuracy on the holdout sample using a 10-fold cross-validation methodology. Artificial neural networks and decision trees alone produced accuracy rates of 73.46% and 71.60%, respectively. Although the improvement over the individual models is modest, the high stakes associated with these types of decisions make the extra effort of combining the models to achieve higher accuracy well warranted.
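The two ideas named in the abstract, fusing the outputs of several classifiers and scoring them with 10-fold cross validation, can be illustrated with a minimal sketch. The abstract does not say how the models were combined, so the weighted averaging below is only one common fusion scheme, assumed here for illustration. The function names (`k_fold_indices`, `fuse`, `accuracy`), the weights, and the toy probabilities are all hypothetical and are not taken from the study.

```python
from typing import List, Sequence, Tuple


def k_fold_indices(n: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Split range(n) into k (train, test) index pairs with near-equal folds."""
    # Fold i holds every k-th index starting at i, so the folds are disjoint.
    folds = [list(range(i, n, k)) for i in range(k)]
    splits = []
    for i in range(k):
        test = folds[i]
        train = [j for f_idx, fold in enumerate(folds) if f_idx != i for j in fold]
        splits.append((train, test))
    return splits


def fuse(probabilities: Sequence[float], weights: Sequence[float]) -> int:
    """Weighted average of per-model deception probabilities, cut at 0.5.

    Assumption: each model emits a probability that a statement is deceptive.
    Returns 1 (deceptive) or 0 (truthful).
    """
    score = sum(p * w for p, w in zip(probabilities, weights)) / sum(weights)
    return 1 if score >= 0.5 else 0


def accuracy(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Toy example: three hypothetical models score four statements.
model_probs = [
    [0.9, 0.6, 0.3],
    [0.2, 0.4, 0.7],
    [0.6, 0.4, 0.8],
    [0.1, 0.3, 0.2],
]
labels = [1, 0, 1, 0]
weights = [0.5, 0.3, 0.2]  # hypothetical weights, e.g. from validation accuracy

fused = [fuse(row, weights) for row in model_probs]
print(fused, accuracy(fused, labels))
```

In a full evaluation, each `(train, test)` pair from `k_fold_indices` would be used to fit the individual models on the training indices and score the fused prediction on the held-out indices, with the ten accuracies then averaged.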
