Supporting Law Enforcement in Digital Communities through Natural Language Analysis

Recent years have seen an explosion in the number and scale of digital communities (e.g., peer-to-peer (P2P) file-sharing systems, chat applications and social networking sites). Unfortunately, digital communities host significant criminal activity, including copyright infringement, identity theft and child sexual abuse. Combating this growing level of crime is difficult because of the ever-increasing scale of today's digital communities. This paper presents an approach that provides automated support for detecting child sexual abuse related activities in digital communities. Specifically, we analyze the characteristics of child sexual abuse media distribution in P2P file-sharing networks and carry out an exploratory study showing that corpus-based natural language analysis may be used to automate the detection of this activity. We then outline how this approach can be extended to policing chat and social networking communities.
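The abstract does not specify which corpus-based technique the study uses. As an illustration only, the sketch below assumes one common corpus-comparison method, log-likelihood (G²) keyness, which ranks words that occur unusually often in a target corpus (for instance, tokens taken from shared file names or search queries) relative to a reference corpus. The function names `log_likelihood` and `keywords` are hypothetical, and tokenization is assumed to have already been done.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """G2 statistic for a word seen `a` times in a target corpus of `c` tokens
    and `b` times in a reference corpus of `d` tokens (c and d must be > 0)."""
    # Expected frequencies if the word were equally common in both corpora.
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    ll = 0.0
    # A zero observed count contributes nothing to the sum.
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def keywords(target_tokens, reference_tokens, top_n=10):
    """Hypothetical helper: rank words over-represented in the target corpus.

    Assumes both token lists are non-empty and already normalized
    (e.g., lowercased and split on non-alphanumeric characters).
    """
    t = Counter(target_tokens)
    r = Counter(reference_tokens)
    c = sum(t.values())
    d = sum(r.values())
    scored = []
    for word, a in t.items():
        b = r.get(word, 0)
        # Keep only words relatively more frequent in the target corpus.
        if a / c > b / d:
            scored.append((word, log_likelihood(a, b, c, d)))
    # Highest G2 first: the strongest candidate indicator terms.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]
```

In a detection setting of the kind the abstract describes, the highest-scoring terms would be candidates for human review rather than automatic verdicts; the choice of reference corpus strongly affects which terms surface.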
