Lexical criminal identification for chatting corpus

This paper aims to identify the lexical items that signal criminal elements in a chatting corpus of conversational utterances between suspects and victims. Lexical criminal identification involves three processes. The first is tokenization, which automatically assigns each lexical item a serial number within every suspect and victim utterance. The second is part-of-speech tagging of each lexical item to identify the verbs and nouns in the utterances. The third is identifying and analyzing the interrogative criminal constructs to obtain criminal evidence. The chatting corpus consists of 3,067 suspect and victim utterances (16,278 words) collected from 9 criminal chat cases. The results indicate that verbs and nouns are the most important part-of-speech elements representing the criminal constructs in chat utterances.
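The first two processes, serial-numbered tokenization and verb/noun tagging, can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation. The regular-expression tokenizer, the coarse tag names (`VERB`, `NOUN`, `PUNCT`, `OTHER`) and the tiny `TOY_LEXICON` are all hypothetical stand-ins for a trained statistical part-of-speech tagger.

```python
import re
from typing import List, Tuple

# Hypothetical toy lexicon mapping lowercase words to coarse tags. A real
# system would use a trained POS tagger; this only illustrates the step.
TOY_LEXICON = {
    "send": "VERB", "meet": "VERB", "pay": "VERB", "transfer": "VERB",
    "money": "NOUN", "account": "NOUN", "bank": "NOUN", "password": "NOUN",
}

# Words are runs of letters, digits or apostrophes; any other non-space
# character (e.g. "?" or ",") becomes a token of its own.
TOKEN_RE = re.compile(r"[A-Za-z0-9']+|[^\sA-Za-z0-9']")
WORD_RE = re.compile(r"[A-Za-z0-9']+")


def tokenize(utterance: str) -> List[Tuple[int, str]]:
    """Return (serial_number, token) pairs, numbering tokens from 1."""
    return list(enumerate(TOKEN_RE.findall(utterance), start=1))


def tag(numbered: List[Tuple[int, str]]) -> List[Tuple[int, str, str]]:
    """Attach a coarse tag to each numbered token: VERB/NOUN from the toy
    lexicon, OTHER for unknown words, PUNCT for non-word tokens."""
    tagged = []
    for serial, token in numbered:
        if WORD_RE.fullmatch(token):
            label = TOY_LEXICON.get(token.lower(), "OTHER")
        else:
            label = "PUNCT"
        tagged.append((serial, token, label))
    return tagged


if __name__ == "__main__":
    print(tag(tokenize("Send the money to my account?")))
    # [(1, 'Send', 'VERB'), (2, 'the', 'OTHER'), (3, 'money', 'NOUN'),
    #  (4, 'to', 'OTHER'), (5, 'my', 'OTHER'), (6, 'account', 'NOUN'),
    #  (7, '?', 'PUNCT')]
```

Keeping the serial number alongside each token means a tagged verb or noun can always be traced back to its exact position in the original suspect or victim utterance.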

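For the third process, one possible heuristic, assumed here rather than taken from the paper, is to flag an utterance as interrogative when it ends with a question mark or opens with a wh-word or auxiliary, and then count the verbs and nouns inside the flagged utterances. The `QUESTION_OPENERS` set, the function names and the pre-tagged input format (token/tag pairs using Universal Dependencies-style tag names) are all hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical cue words that often open English questions; the paper's own
# rules for interrogative criminal constructs are not given in the abstract.
QUESTION_OPENERS = {"what", "where", "when", "who", "why", "how", "which",
                    "can", "could", "will", "would", "do", "did", "are", "is"}

Tagged = List[Tuple[str, str]]  # (token, tag) pairs for one utterance


def is_interrogative(tagged: Tagged) -> bool:
    """Heuristic: the utterance ends with '?' or starts with a question opener."""
    if not tagged:
        return False
    return tagged[-1][0] == "?" or tagged[0][0].lower() in QUESTION_OPENERS


def verb_noun_profile(utterances: Iterable[Tagged]) -> Counter:
    """Count VERB and NOUN tokens across the interrogative utterances only."""
    counts = Counter()
    for tagged in utterances:
        if is_interrogative(tagged):
            counts.update(t for _, t in tagged if t in ("VERB", "NOUN"))
    return counts


if __name__ == "__main__":
    chat = [  # made-up, pre-tagged example utterances
        [("where", "ADV"), ("is", "AUX"), ("the", "DET"), ("money", "NOUN"), ("?", "PUNCT")],
        [("send", "VERB"), ("your", "PRON"), ("password", "NOUN")],
        [("did", "AUX"), ("you", "PRON"), ("pay", "VERB"), ("the", "DET"), ("bank", "NOUN")],
    ]
    print(verb_noun_profile(chat))  # Counter({'NOUN': 2, 'VERB': 1})
```

The second utterance is a command rather than a question, so its verb and noun are excluded from the profile; only the first and third utterances contribute.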