Mining Offensive Language on Social Media

This research addresses the automatic annotation and classification of vulgar and offensive speech on social media. The aim of the paper is to test the effectiveness of the computational treatment of taboo content shared on the web. The output is a corpus of 31,749 Facebook comments generated by users and automatically annotated through a lexicon-based method for the identification and classification of taboo expressions.
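
To make the idea of lexicon-based annotation concrete, the following is a minimal sketch in Python, not the method used in the paper. The lexicon entries, the category labels, and the function names `annotate` and `classify` are all hypothetical and chosen only for illustration. The paper's actual lexical resources and matching rules are not described in this abstract.

```python
import re

# Hypothetical toy lexicon: the entries and category labels are illustrative
# assumptions, not taken from the paper's resources.
TABOO_LEXICON = {
    "idiot": "insult",
    "moron": "insult",
    "damn": "profanity",
    "shut up": "insult",
}


def annotate(comment, lexicon=TABOO_LEXICON):
    """Return (matched text, category, start offset) for each lexicon hit."""
    hits = []
    for expression, category in lexicon.items():
        # Whole-word, case-insensitive match; multiword entries are matched literally.
        pattern = r"\b" + re.escape(expression) + r"\b"
        for match in re.finditer(pattern, comment, flags=re.IGNORECASE):
            hits.append((match.group(0), category, match.start()))
    # Order the annotations by their position in the comment.
    return sorted(hits, key=lambda hit: hit[2])


def classify(comment, lexicon=TABOO_LEXICON):
    """Label a comment with the set of taboo categories it contains."""
    return {category for _, category, _ in annotate(comment, lexicon)}


if __name__ == "__main__":
    print(annotate("Shut up, you IDIOT!"))
    print(classify("Have a nice day"))
```

A plain dictionary lookup of this kind ignores inflection, spelling variants, and context, all of which a realistic lexicon-based system would need to handle.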
