Cyberbullying Detection based on text-stream classification

Current studies on cyberbullying detection treat the task as text classification and mainly assume that the streaming text can be fully labelled. However, the exponential growth of unlabelled online content makes this assumption impractical. In this paper, we propose a session-based framework for automatically detecting cyberbullying in large volumes of unlabelled streaming text. Because streaming data from social networks arrives at the server in large volumes, the framework incorporates an ensemble of one-class classifiers. It addresses the real-world scenario in which only a small set of positive instances is available for initial training. Our main contribution is the automatic detection of cyberbullying in real-world situations where labelled data is not readily available. Our early results show that the proposed approach is reasonably effective for the automatic detection of cyberbullying on social networks. The experiments indicate that, when learning from positive and unlabelled data, the ensemble learner outperforms the single-window and fixed-window approaches.
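As a rough illustration of the idea described above, the following sketch shows one way a session-based ensemble of one-class classifiers could be arranged. It is not the implementation used in this paper: the centroid-based one-class model, the cosine threshold of 0.4, the majority vote, the self-training step that builds a new member from the messages flagged in each session, and all class and function names are assumptions chosen to keep the example small.

```python
from collections import Counter, deque
import math


def vectorize(text):
    """Bag-of-words term counts for one message (whitespace tokens)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class CentroidOneClass:
    """Hypothetical one-class model: summed term counts of the positive
    examples, plus a similarity threshold (an assumed value)."""

    def __init__(self, positives, threshold=0.4):
        self.centroid = Counter()
        for text in positives:
            self.centroid.update(vectorize(text))
        self.threshold = threshold

    def predict(self, text):
        return cosine(vectorize(text), self.centroid) >= self.threshold


class SessionEnsemble:
    """Keeps the most recent one-class models and classifies by majority vote."""

    def __init__(self, seed_positives, max_members=3):
        # Only a small positive set is needed to build the first member.
        first = CentroidOneClass(seed_positives)
        self.members = deque([first], maxlen=max_members)

    def predict(self, text):
        votes = sum(member.predict(text) for member in self.members)
        return votes * 2 > len(self.members)  # strict majority

    def process_session(self, messages):
        """Classify one session of unlabelled messages, then train a new
        member on the messages flagged as positive (assumed self-training)."""
        flagged = [m for m in messages if self.predict(m)]
        if flagged:
            # deque(maxlen=...) silently drops the oldest member when full.
            self.members.append(CentroidOneClass(flagged))
        return flagged


seed = [
    "you are stupid and ugly",
    "nobody likes you loser",
    "you are a worthless loser",
]
ensemble = SessionEnsemble(seed)
session = [
    "see everyone at lunch tomorrow",
    "you are such an ugly loser",
    "great game last night",
]
print(ensemble.process_session(session))
```

Bounding the ensemble to the most recent members is one simple way to let the model follow changes in the stream. Because a new member is trained on the ensemble's own predictions, early mistakes can propagate, so a practical system would need a stricter acceptance rule than the one assumed here.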
