A Simple Classifier for Detecting Online Child Grooming Conversation

The massive proliferation of social media has opened new opportunities for perpetrators to commit the crime of online child grooming. Because of the scale and pervasiveness of the problem, it can only be addressed effectively and efficiently with an automatic system for detecting grooming conversations. The current study addresses the issue using Support Vector Machine (SVM) and k-nearest neighbors (k-NN) classifiers. In addition, the study proposes a classification method with low computational cost, which classifies a conversation by the number of grooming characteristics it contains. All proposed methods are evaluated on 150 textual conversations, of which 105 are grooming and 45 are non-grooming. We identify 17 characteristic features of grooming conversations. The results suggest that the SVM and k-NN classifiers identify grooming conversations with accuracies of 98.6% and 97.8%, respectively, while the proposed simple method achieves 96.8% accuracy. The empirical study also suggests that two of the 17 characteristics are insignificant for the classification.
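The count-based idea behind the simple method can be illustrated with a minimal sketch. It assumes each conversation has already been annotated as a vector of 17 binary flags, one per grooming characteristic. The abstract does not state the decision rule or cutoff, so the threshold value and the function names below are hypothetical placeholders, not the authors' implementation.

```python
from typing import Sequence

NUM_CHARACTERISTICS = 17   # number of grooming characteristics, from the abstract
ASSUMED_THRESHOLD = 8      # hypothetical cutoff; the abstract does not give one


def count_characteristics(flags: Sequence[int]) -> int:
    """Count how many of the 17 characteristics are present in a conversation."""
    if len(flags) != NUM_CHARACTERISTICS:
        raise ValueError(f"expected {NUM_CHARACTERISTICS} flags, got {len(flags)}")
    return sum(1 for flag in flags if flag)


def is_grooming(flags: Sequence[int], threshold: int = ASSUMED_THRESHOLD) -> bool:
    """Label a conversation as grooming when enough characteristics are present."""
    return count_characteristics(flags) >= threshold


def accuracy(predictions: Sequence[bool], labels: Sequence[bool]) -> float:
    """Fraction of conversations whose predicted label matches the true label."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and equal in length")
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)
```

In practice the threshold would presumably be chosen on labeled data, for example by picking the count that maximizes accuracy on a training split. Because the rule only sums binary flags, its cost per conversation is far lower than fitting or querying an SVM or k-NN model.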
