#MeTooMA: Multi-Aspect Annotations of Tweets Related to the MeToo Movement

In this paper, we present a dataset of 9,973 tweets related to the MeToo movement that were manually annotated for five linguistic aspects: relevance, stance, hate speech, sarcasm, and dialogue acts. We give a detailed account of the data collection and annotation processes. Inter-annotator agreement is high (Krippendorff's alpha between 0.79 and 0.93), which we attribute to the domain expertise of the annotators and to clear annotation instructions. We analyze the data in terms of geographical distribution, label correlations, and keywords. Lastly, we present some potential use cases of the dataset. We expect it to be of interest to psycholinguists, sociolinguists, and computational linguists studying the discursive space of digitally mobilized social movements on sensitive issues such as sexual harassment.
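For readers unfamiliar with the agreement measure reported above, the following is a minimal sketch of Krippendorff's alpha for nominal labels. It is an illustration of the standard coincidence-matrix formulation, not the code used to produce the reported figures; the function name `krippendorff_alpha_nominal` and the example stance labels are assumptions made here for illustration.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data.

    `units` is a list of items; each item is the list of labels that the
    annotators assigned to it, with None marking a missing annotation.
    """
    # Coincidence matrix: each ordered pair of labels within an item
    # contributes 1 / (m - 1), where m is the number of labels on that item.
    coincidences = Counter()
    for ratings in units:
        values = [r for r in ratings if r is not None]
        m = len(values)
        if m < 2:
            continue  # items with fewer than two labels carry no pairing information
        for a, b in permutations(values, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    # Marginal totals per label and the overall number of pairable labels.
    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    if n <= 1:
        raise ValueError("need at least two pairable labels")

    # Observed and expected disagreement; the common factor 1/n cancels.
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        raise ValueError("alpha is undefined when only one label occurs")
    return 1.0 - observed / expected


if __name__ == "__main__":
    # Hypothetical stance labels from two annotators on three tweets.
    example = [
        ["support", "support"],
        ["support", "oppose"],
        ["oppose", "oppose"],
    ]
    print(round(krippendorff_alpha_nominal(example), 4))
```

A value of 1 indicates perfect agreement and a value of 0 indicates agreement no better than chance, so the reported range of 0.79 to 0.93 sits well toward the reliable end of the scale.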
