Adopting MaxEnt to Identification of Bullying Incidents in Social Networks

Bullying is a widespread problem in cyberspace and on social networks, and in recent years many studies have been dedicated to cyberbullying. The lack of appropriate datasets, for a variety of reasons, is one of the major obstacles faced in most of these studies. In this work we suggest that, to overcome some of these barriers, a model should be employed that is minimally affected by prevalence and small sample size. To this end we adopted the Maximum Entropy method (MaxEnt) to identify bullying users on YouTube. The results were compared with those of commonly used methods. All models provided reasonable predictions of bullying incidents. The MaxEnt models had the highest capacity to discriminate bullying posts and the lowest sensitivity to prevalence. We demonstrate that MaxEnt can be successfully adopted in cyberbullying studies with imbalanced datasets.
