Detecting Predatory Behaviour in Game Chats

While games are a popular social medium for children, there is a real risk that these children are exposed to sexual predators. A number of studies have already addressed this issue; however, the data used in previous research did not properly represent the real chats found in multiplayer online games. To address this gap, we obtained real chat data from MovieStarPlanet, a massively multiplayer online game for children. The research described in this paper aimed to detect predatory behaviour in these chats using machine learning methods. Achieving high accuracy on this task required extensive preprocessing. We describe three different strategies for data selection and preprocessing, and extensively compare the performance of different learning algorithms across the resulting datasets and feature sets.
