A Streaming Machine Learning Framework for Online Aggression Detection on Twitter.

The rise of online aggression on social media is evolving into a major point of concern. Several machine and deep learning approaches have been proposed recently for detecting various types of aggressive behavior. However, social media are fast paced, generating an increasing amount of content, while aggressive behavior evolves over time. In this work, we introduce the first, practical, real-time framework for detecting aggression on Twitter via embracing the streaming machine learning paradigm. Our method adapts its ML classifiers in an incremental fashion as it receives new annotated examples and is able to achieve the same (or even higher) performance as batch-based ML models, with over 90% accuracy, precision, and recall. At the same time, our experimental analysis on real Twitter data reveals how our framework can easily scale to accommodate the entire Twitter Firehose (of 778 million tweets per day) with only 3 commodity machines. Finally, we show that our framework is general enough to detect other related behaviors such as sarcasm, racism, and sexism in real time.

[1]  Tomas Mikolov,et al.  Bag of Tricks for Efficient Text Classification , 2016, EACL.

[2]  Pascale Fung,et al.  One-step and Two-step Classification for Abusive Language Detection on Twitter , 2017, ALW@ACL.

[3]  Matthew Leighton Williams,et al.  Cyber Hate Speech on Twitter: An Application of Machine Classification and Statistical Modeling for Policy and Decision Making , 2015 .

[4]  Kimberley R. Allison Social Norms in Online Communities: Formation, Evolution and Relation to Cyber-Aggression , 2018, CHI Extended Abstracts.

[5]  Shivakant Mishra,et al.  Analyzing Labeled Cyberbullying Incidents on the Instagram Social Network , 2015, SocInfo.

[6]  Joel R. Tetreault,et al.  Abusive Language Detection in Online User Content , 2016, WWW.

[7]  Daniele Quercia,et al.  The Social World of Content Abusers in Community Question Answering , 2015, WWW.

[8]  Daphney-Stavroula Zois,et al.  Cyberbullying Ends Here: Towards Robust Detection of Cyberbullying in Social Media , 2019, WWW.

[9]  Peter K. Smith,et al.  Cyberbullying: its nature and impact in secondary school pupils. , 2008, Journal of child psychology and psychiatry, and allied disciplines.

[10]  Jing Zhou,et al.  Hate Speech Detection with Comment Embeddings , 2015, WWW.

[11]  Shivakant Mishra,et al.  Scalable and timely detection of cyberbullying in online social networks , 2018, SAC.

[12]  Ingmar Weber,et al.  Automated Hate Speech Detection and the Problem of Offensive Language , 2017, ICWSM.

[13]  Geoff Hulten,et al.  Mining high-speed data streams , 2000, KDD '00.

[14]  Vasudeva Varma,et al.  Deep Learning for Hate Speech Detection in Tweets , 2017, WWW.

[15]  Gianluca Stringhini,et al.  Kek, Cucks, and God Emperor Trump: A Measurement Study of 4chan's Politically Incorrect Forum and Its Effects on the Web , 2016, ICWSM.

[16]  Michael J. Franklin,et al.  Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing , 2012, NSDI.

[17]  João Gama,et al.  Machine learning for streaming data: state of the art, challenges, and opportunities , 2019, SKDD.

[18]  Geoff Holmes,et al.  MOA: Massive Online Analysis , 2010, J. Mach. Learn. Res..

[19]  Christos Faloutsos,et al.  Retweeting Activity on Twitter: Signs of Deception , 2015, PAKDD.

[20]  Gianluca Stringhini,et al.  Large Scale Crowdsourcing and Characterization of Twitter Abusive Behavior , 2018, ICWSM.

[21]  Reza Zafarani,et al.  Sarcasm Detection on Twitter: A Behavioral Modeling Approach , 2015, WSDM.

[22]  Dirk Hovy,et al.  Hateful Symbols or Hateful People? Predictive Features for Hate Speech Detection on Twitter , 2016, NAACL.

[23]  Ying Chen,et al.  Detecting Offensive Language in Social Media to Protect Adolescent Online Safety , 2012, 2012 International Conference on Privacy, Security, Risk and Trust and 2012 International Confernece on Social Computing.

[24]  Gianluca Stringhini,et al.  Detecting Cyberbullying and Cyberaggression in Social Media , 2019, ACM Trans. Web.

[25]  Julia Hirschberg,et al.  Detecting Hate Speech on the World Wide Web , 2012 .

[26]  Gianluca Stringhini,et al.  Mean Birds: Detecting Aggression and Bullying on Twitter , 2017, WebSci.

[27]  Henry Lieberman,et al.  Modeling the Detection of Textual Cyberbullying , 2011, The Social Mobile Web.

[28]  David Robinson,et al.  Detecting Hate Speech on Twitter Using a Convolution-GRU Based Deep Neural Network , 2018, ESWC.

[29]  Jintae Lee,et al.  A holistic model of computer abuse within organizations , 2002, Inf. Manag. Comput. Secur..

[30]  Talel Abdessalem,et al.  Adaptive random forests for evolving data stream classification , 2017, Machine Learning.