"Like Sheep Among Wolves": Characterizing Hateful Users on Twitter

Hateful speech in Online Social Networks (OSNs) is a key challenge for companies and governments, as it impacts users and advertisers, and as several countries have strict legislation against the practice. This has motivated work on detecting and characterizing the phenomenon in tweets, social media posts and comments. However, these approaches face several shortcomings due to the noisiness of OSN data, the sparsity of the phenomenon, and the subjectivity of the definition of hate speech. This work presents a user-centric view of hate speech, paving the way for better detection methods and understanding. We collect a Twitter dataset of $100,386$ users, along with up to $200$ tweets from each of their timelines, using a random-walk-based crawler on the retweet graph, and select a subsample of $4,972$ users to be manually annotated as hateful or not through crowdsourcing. We examine differences between hateful and normal users in their activity patterns, the content they disseminate, and their network centrality in the sampled graph. Our results show that hateful users have more recent account creation dates and accrue more statuses and followees per day. Additionally, they favorite more tweets, tweet in shorter intervals, and are more central in the retweet network, contradicting the "lone wolf" stereotype often associated with such behavior. Hateful users are more negative, more profane, and use fewer words associated with topics such as hate, terrorism, violence and anger. We also identify similarities between hateful/normal users and their 1-neighborhood, suggesting strong homophily.
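The random-walk sampling mentioned above can be illustrated with a minimal sketch. The graph representation, function name, and restart rule here are hypothetical simplifications for illustration; the actual crawler used in the paper operates on Twitter's API and a directed retweet graph with additional machinery:

```python
import random

def random_walk_sample(graph, start, num_steps, seed=None):
    """Sample nodes from a directed graph via a simple random walk.

    graph: dict mapping a node to its list of out-neighbors
    (an edge u -> v could mean "u retweeted v").
    Returns the set of visited nodes. This is an illustrative
    sketch, not the authors' exact crawling procedure.
    """
    rng = random.Random(seed)
    current = start
    visited = {start}
    for _ in range(num_steps):
        neighbors = graph.get(current, [])
        if not neighbors:
            # Dead end: restart from an already-visited node
            # (a simplifying assumption for this sketch).
            current = rng.choice(sorted(visited))
            continue
        current = rng.choice(neighbors)
        visited.add(current)
    return visited

# Toy retweet graph: "d" is unreachable from "a".
g = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": []}
sample = random_walk_sample(g, "a", 50, seed=42)
```

Because the walk only follows out-edges from the start node, unreachable nodes (here, `"d"`) never enter the sample, which is one reason random-walk crawls yield connected subgraphs well suited to the centrality analysis described above.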
