Mining criminal networks from unstructured text documents

Digital data collected for forensic analysis often contain valuable information about the suspects’ social networks. However, most collected records are unstructured textual data, such as e-mails, chat messages, and text documents. An investigator often has to extract the useful information from the text manually and then enter the important pieces into a structured database for further investigation with various criminal network analysis tools. This information extraction process is tedious and error-prone. Moreover, the quality of the analysis varies with the experience and expertise of the investigator. In this paper, we propose a systematic method to discover criminal networks from a collection of text documents obtained from a suspect’s machine, extract useful information for investigation, and then visualize the suspect’s criminal network. Furthermore, we present a hypothesis generation approach to identify potential indirect relationships among the members of the identified networks. We evaluated the effectiveness and performance of the method on a real-life cybercrime case and some other datasets. The proposed method, together with the implemented software tool, has received positive feedback from the digital forensics team of a law enforcement unit in Canada.
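
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes that a list of person names is already available (in practice a named-entity recognizer would supply them), that two people are directly linked when their names co-occur in the same document, and that an indirect relationship can be hypothesized when two people never co-occur but share at least one common contact. The function names `build_network` and `indirect_candidates` and the sample documents are hypothetical.

```python
import re
from collections import defaultdict
from itertools import combinations


def build_network(documents, known_names):
    """Assumption: two names are linked if they co-occur in one document.

    Returns a dict mapping a sorted name pair to the set of document
    indices in which both names appear.
    """
    edges = defaultdict(set)
    for doc_id, text in enumerate(documents):
        # Whole-word, case-insensitive match for each known name.
        present = sorted(
            name for name in known_names
            if re.search(r"\b" + re.escape(name) + r"\b", text, re.IGNORECASE)
        )
        for a, b in combinations(present, 2):
            edges[(a, b)].add(doc_id)
    return dict(edges)


def indirect_candidates(edges):
    """Assumption: an indirect link is hypothesized for any pair that is
    not directly linked but shares at least one common neighbor.

    Returns a dict mapping each such pair to its sorted shared neighbors.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    candidates = {}
    for a, b in combinations(sorted(neighbors), 2):
        if b in neighbors[a]:
            continue  # already directly linked
        shared = neighbors[a] & neighbors[b]
        if shared:
            candidates[(a, b)] = sorted(shared)
    return candidates


# Hypothetical sample data, for illustration only.
documents = [
    "Alice met Bob at the warehouse.",
    "Bob sent the package to Carol.",
    "Dave emailed Carol about the payment.",
]
names = {"Alice", "Bob", "Carol", "Dave"}

network = build_network(documents, names)
hypotheses = indirect_candidates(network)
```

With this sample data, Alice and Carol never appear in the same document but both co-occur with Bob, so the pair is flagged as a candidate indirect relationship for an investigator to review. A real system would need entity resolution (aliases, nicknames) and some ranking of the candidates, which this sketch omits.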
