Technical Mapping of the Grooming Anatomy Using Machine Learning Paradigms: An Information Security Approach

Information security includes several areas of study that are still under development. One of them is social engineering, which addresses the multidisciplinary challenges of cybersecurity. Social engineering attacks today are diverse and include the so-called Advanced Persistent Threats (APTs). APTs have been the subject of numerous investigations; however, cyberattacks of a similar nature, such as grooming, have been excluded from these studies. Over the last decade, several computer science studies have used machine learning algorithms to understand the structure and approach of grooming. Nevertheless, these studies are not aligned with information security. In this work, we formalize grooming as a social engineering attack and contrast its stages (phases) with the life cycles associated with APTs. To this end, we use a database of real cyber-pedophile chats: we refined the data and applied Latent Dirichlet Allocation (LDA) topic modeling to determine the stages of the attack. Once the number of stages was determined, we gave each stage a linguistic context and trained a linear machine learning model, which reached 97.6% training accuracy. These results indicate that the study of grooming can support research on social engineering and contribute to new fields of information security.
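The abstract describes a two-step pipeline: LDA topic modeling to find the attack stages, followed by a linear model trained on those stages. The paper's tooling is not specified here, so the following is only a minimal, hypothetical Python sketch. It makes four assumptions. First, chats are already tokenized into segments. Second, LDA is fitted with collapsed Gibbs sampling, which is one common inference method for LDA. Third, each segment's stage is taken as its dominant LDA topic. Fourth, the "linear model" is a multiclass perceptron. The function names (`lda_gibbs`, `bag_of_words`, `train_perceptron`, `accuracy`) are illustrative, and the toy corpus is invented, neutral text rather than the authors' data.

```python
import random


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling (illustrative sketch).

    docs: list of token lists (one list per chat segment).
    Returns (theta, vocab), where theta[d][k] estimates the proportion
    of topic k in document d.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    wid = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    ndk = [[0] * n_topics for _ in docs]      # document-topic counts
    nkw = [[0] * V for _ in range(n_topics)]  # topic-word counts
    nk = [0] * n_topics                       # total tokens per topic
    z = []                                    # topic assignment of every token
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)       # random initial assignment
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][wid[w]] += 1
            nk[k] += 1
    topics = range(n_topics)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = wid[w], z[d][i]
                # Remove the token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Full conditional p(z = t | all other assignments).
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta)
                           / (nk[t] + V * beta) for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    theta = [[(c + alpha) / (len(doc) + n_topics * alpha) for c in ndk[d]]
             for d, doc in enumerate(docs)]
    return theta, vocab


def bag_of_words(doc, vocab):
    """Word-count vector over vocab, plus a constant bias feature at the end."""
    index = {w: i for i, w in enumerate(vocab)}
    x = [0.0] * (len(vocab) + 1)
    x[-1] = 1.0
    for w in doc:
        if w in index:
            x[index[w]] += 1.0
    return x


def predict(W, x):
    """Return the class whose weight vector gives the highest score."""
    scores = [sum(wj * xj for wj, xj in zip(w, x)) for w in W]
    return max(range(len(W)), key=scores.__getitem__)


def train_perceptron(X, y, n_classes, epochs=20):
    """Multiclass perceptron, used here as a stand-in linear model."""
    W = [[0.0] * len(X[0]) for _ in range(n_classes)]
    for _ in range(epochs):
        for x, label in zip(X, y):
            pred = predict(W, x)
            if pred != label:
                # Move the true class toward x and the wrong class away from it.
                for j, xj in enumerate(x):
                    W[label][j] += xj
                    W[pred][j] -= xj
    return W


def accuracy(W, X, y):
    return sum(predict(W, x) == label for x, label in zip(X, y)) / len(y)


if __name__ == "__main__":
    # Hypothetical, neutral toy segments (NOT the paper's chat corpus).
    docs = [
        "hello hi name hello".split(),
        "hi name hello".split(),
        "music game song game".split(),
        "song music game".split(),
        "weather rain cold sun".split(),
        "sun rain weather".split(),
    ]
    theta, vocab = lda_gibbs(docs, n_topics=3)
    # Assumption: a segment's stage is its most probable topic.
    stages = [max(range(3), key=row.__getitem__) for row in theta]
    X = [bag_of_words(doc, vocab) for doc in docs]
    W = train_perceptron(X, stages, n_classes=3)
    print("stage per segment:", stages)
    print("training accuracy:", accuracy(W, X, stages))
```

In practice, a library implementation such as scikit-learn's `LatentDirichletAllocation` and `LogisticRegression` would likely replace these hand-written loops. Training accuracy only measures how well the model fits the data it was trained on, so a held-out split would be needed to estimate how well the stage classifier generalizes.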
