A Two-step Approach for Effective Detection of Misbehaving Users in Chats

This paper describes the system jointly developed by the Language Technologies Lab from INAOE and the Language and Reasoning Group from UAM for the Sexual Predators Identification task at PAN 2012. The system addresses the problem of identifying sexual predators in a set of suspicious chat conversations. It rests on two hypotheses: (i) the terms used in the process of child exploitation are categorically and psychologically different from the terms used in general chatting; and (ii) predators usually follow the same pattern of conduct when approaching a child. Based on these hypotheses, our participation at PAN 2012 aimed to demonstrate that it is possible to train a classifier to learn the particular terms that turn a chat conversation into a case of online child exploitation, and that it is also possible to learn the behavioral patterns of predators during a chat conversation, allowing us to accurately distinguish victims from predators.
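
To make the two-step idea concrete, the following is a minimal sketch and not the system described in this paper. The abstract does not name the classifiers or features used, so the sketch assumes a small bag-of-words Naive Bayes model for both steps. The class `TinyNaiveBayes`, the function `find_predators`, the label names, and the toy training sentences are all hypothetical and exist only for illustration. Step one decides whether a whole conversation looks suspicious; step two, applied only to suspicious conversations, labels each participant as predator or victim.

```python
import math
from collections import Counter, defaultdict


class TinyNaiveBayes:
    """Multinomial Naive Bayes over whitespace tokens with add-one smoothing.

    Assumption: a stand-in for whatever classifier the real system uses.
    """

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        # Tokens never seen in training carry no evidence, so skip them.
        tokens = [t for t in text.lower().split() if t in self.vocab]
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def find_predators(conversation, stage1, stage2):
    """conversation: list of (user, message) pairs.

    Returns the set of users labeled 'predator', or an empty set when
    the conversation as a whole is not classified as suspicious.
    """
    full_text = " ".join(msg for _, msg in conversation)
    if stage1.predict(full_text) != "suspicious":
        return set()
    by_user = defaultdict(list)
    for user, msg in conversation:
        by_user[user].append(msg)
    return {
        user
        for user, msgs in by_user.items()
        if stage2.predict(" ".join(msgs)) == "predator"
    }


# Toy training data, invented for this sketch only.
stage1 = TinyNaiveBayes().fit(
    [
        "keep this our secret do not tell your parents are you alone",
        "are you alone now send me a picture ok i guess",
        "did you finish the homework yes it was easy",
        "want to play the game tonight sure after dinner",
    ],
    ["suspicious", "suspicious", "normal", "normal"],
)
stage2 = TinyNaiveBayes().fit(
    [
        "keep this our secret do not tell your parents",
        "are you alone now send me a picture",
        "ok i guess",
        "i am not sure my mom is home",
    ],
    ["predator", "predator", "victim", "victim"],
)

chat = [
    ("u1", "are you alone do not tell your parents"),
    ("u2", "ok my mom is home"),
]
normal_chat = [
    ("a", "did you finish the homework"),
    ("b", "yes it was easy"),
]
print(find_predators(chat, stage1, stage2))
print(find_predators(normal_chat, stage1, stage2))
```

Filtering conversations first means the second classifier only has to separate the two roles inside conversations already judged suspicious, which is the division of labor the two hypotheses above suggest.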