Community-level Research on Suicidality Prediction in a Secure Environment: Overview of the CLPsych 2021 Shared Task

Progress on NLP for mental health — indeed, for healthcare in general — is hampered by obstacles to shared, community-level access to relevant data. We report on what is, to our knowledge, the first attempt to address this problem in mental health by conducting a shared task using sensitive data in a secure data enclave. Participating teams received access to Twitter posts donated for research, including data from users with and without suicide attempts, and did all work with the dataset entirely within a secure computational environment. We discuss the task, team results, and lessons learned to set the stage for future tasks on sensitive or confidential data.
