Overview of the Fourth Social Media Mining for Health (SMM4H) Shared Tasks at ACL 2019

The number of users of social media continues to grow, with nearly half of adults worldwide and two-thirds of all American adults using social networking on a regular basis.¹ Advances in automated data processing and NLP present the possibility of utilizing this massive data source for biomedical and public health applications, if researchers address the methodological challenges unique to this medium. We present the Social Media Mining for Health Shared Tasks, co-located with ACL 2019 in Florence, which address these challenges for health monitoring and surveillance, utilizing state-of-the-art techniques for processing noisy, real-world, and substantially creative language expressions from social media users. For the fourth edition of this challenge, we proposed four tasks. Task 1 asked participants to distinguish tweets reporting an adverse drug reaction (ADR) from those that do not. Task 2, a follow-up to Task 1, asked participants to identify the span of text in tweets reporting ADRs. Task 3 was an end-to-end task in which the goal was first to detect tweets mentioning an ADR and then to map the extracted colloquial mentions of ADRs in the tweets to their corresponding standard concept IDs in the MedDRA vocabulary. Finally, Task 4 asked participants to classify whether a tweet contains a personal mention of one's health, a more general discussion of the health issue, or an unrelated mention. A total of 34 teams from around the world registered, and 19 teams from 12 countries submitted a system run. We summarize here the corpora for this challenge, which are freely available at https://competitions.codalab.org/competitions/22521, and present an overview of the methods and the results of the competing systems.

¹ Pew Research Center. Social Media Fact Sheet. 2017. [Online]. Available: http://www.pewinternet.org/factsheet/social-media/
