Exploring high-level features for detecting cyberpedophilia

In this paper, we propose a list of high-level features and study their applicability to detecting cyberpedophiles. We used a corpus of chats downloaded from http://www.perverted-justice.com and two negative datasets of a different nature: cybersex logs available online and the NPS chat corpus. The classification results show that the NPS data and the pedophiles' conversations can be accurately discriminated from each other with character n-grams, whereas in the more difficult case of cybersex logs, high-level features are needed to reach good accuracy. In this latter setting, our results show that features modelling behaviour and emotion significantly outperform the low-level ones and achieve 97% accuracy.
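The character n-gram baseline mentioned above represents each chat as counts of overlapping character sequences. The following is a minimal sketch of that kind of feature extraction, assuming lowercased text, trigrams (n = 3), and a shared vocabulary built from the training documents. The helper names `char_ngrams` and `to_vector` are hypothetical. This is not the authors' exact pipeline, and the classifier that would consume these vectors is not shown.

```python
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams in a chat text (hypothetical helper).

    Assumption: text is lowercased; n = 3 is an illustrative choice,
    not a value taken from the paper.
    """
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def to_vector(counts: Counter, vocabulary: list[str]) -> list[int]:
    """Project n-gram counts onto a fixed vocabulary so all documents
    share one feature space; unseen n-grams get a count of 0."""
    return [counts.get(gram, 0) for gram in vocabulary]


if __name__ == "__main__":
    # Toy chat lines, invented for illustration only.
    docs = ["hi how r u", "how old r u"]
    profiles = [char_ngrams(d) for d in docs]
    # Vocabulary = union of all n-grams seen in the toy corpus.
    vocab = sorted(set().union(*profiles))
    vectors = [to_vector(p, vocab) for p in profiles]
    for doc, vec in zip(docs, vectors):
        print(doc, vec)
```

Such count vectors are typically fed to a linear classifier. The abstract's point is that this purely surface-level representation suffices for separating the NPS data but not the cybersex logs.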
