Detecting Predatory Behavior in Game Chats

While games are a popular social medium for children, there is a real risk that these children are exposed to potential sexual assault. A number of studies have already addressed this issue; however, the data used in previous research did not properly represent the chats actually found in multiplayer online games. To close this gap, we obtained real chat data from MovieStarPlanet, a massively multiplayer online game for children. The research described in this paper aimed to detect predatory behavior in these chats using machine learning methods. Achieving high accuracy on this task required extensive preprocessing. We describe three strategies for data selection and preprocessing, and extensively compare the performance of different learning algorithms across the resulting data sets and features.
