Suicide Risk Prediction by Tracking Self-Harm Aspects in Tweets: NUS-IDS at the CLPsych 2021 Shared Task

We describe our system, developed for the CLPsych 2021 Shared Task, for identifying users at risk of suicide based on their tweets. Motivated by mental health research linking self-harm tendencies with suicide, our system attempts to characterize the self-harm aspects expressed in users' tweets over a period of time. To this end, we design SHTM, a Self-Harm Topic Model that combines Latent Dirichlet Allocation (LDA) with a self-harm dictionary to model users' daily tweets. Differences in moods and topics over time are then captured as features to train a deep learning model for suicide prediction.
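
As a rough illustration of the two stages described above, the sketch below has two parts. First, it runs an otherwise standard collapsed Gibbs sampler for LDA in which every token of a self-harm dictionary word is pinned to one designated topic. Second, it summarises the day-to-day changes in a user's topic proportions.

This is an assumption-laden sketch, not the SHTM implementation. The following choices are all hypothetical:

* the hard pinning of dictionary words;
* the choice of topic 0 as the self-harm topic;
* treating one user-day of tweets as one document;
* the function names `seeded_lda_gibbs` and `topic_shift_features`.

Mood features are not modeled here, although the same differencing could be applied to daily mood scores.

```python
import numpy as np


def seeded_lda_gibbs(docs, vocab_size, num_topics, seed_words,
                     alpha=0.1, beta=0.01, iters=200, rng_seed=0):
    """Collapsed Gibbs sampling for LDA where every token of a dictionary
    (seed) word is pinned to topic 0, ASSUMED here to be the self-harm topic.

    docs: list of documents (ASSUMPTION: one per user-day), each a list of word ids.
    seed_words: set of word ids from a hypothetical self-harm dictionary.
    Returns theta, a (num_docs, num_topics) array of topic proportions.
    """
    rng = np.random.default_rng(rng_seed)
    n_dk = np.zeros((len(docs), num_topics))    # document-topic counts
    n_kw = np.zeros((num_topics, vocab_size))   # topic-word counts
    n_k = np.zeros(num_topics)                  # total tokens per topic
    z = []                                      # topic assignment of each token

    # Initialise: dictionary words go to topic 0, other words get a random topic.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = 0 if w in seed_words else int(rng.integers(num_topics))
            z_d.append(k)
            n_dk[d, k] += 1
            n_kw[k, w] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                if w in seed_words:
                    continue  # pinned to the self-harm topic; never resampled
                k = z[d][i]
                # Remove this token's current assignment from the counts.
                n_dk[d, k] -= 1
                n_kw[k, w] -= 1
                n_k[k] -= 1
                # Standard collapsed Gibbs conditional p(z = k | rest) for LDA.
                p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + vocab_size * beta)
                k = int(rng.choice(num_topics, p=p / p.sum()))
                z[d][i] = k
                n_dk[d, k] += 1
                n_kw[k, w] += 1
                n_k[k] += 1

    # Posterior mean of each document's topic proportions.
    lengths = n_dk.sum(axis=1, keepdims=True)
    return (n_dk + alpha) / (lengths + num_topics * alpha)


def topic_shift_features(theta_by_day):
    """Mean absolute day-to-day change in each topic's proportion.

    theta_by_day: (num_days, num_topics) array with rows in chronological order.
    """
    theta_by_day = np.asarray(theta_by_day, dtype=float)
    if theta_by_day.shape[0] < 2:
        raise ValueError("need at least two days to measure change")
    diffs = np.diff(theta_by_day, axis=0)  # change between consecutive days
    return np.abs(diffs).mean(axis=0)
```

For example, suppose each list in `docs` holds the word ids of one day of a user's tweets. Then `topic_shift_features(seeded_lda_gibbs(docs, vocab_size=8, num_topics=3, seed_words={0, 1}))` yields one average-change value per topic, which could be concatenated with other features before training a classifier. Pinning dictionary words is the simplest way to anchor a topic to a lexicon. Softer alternatives, such as placing a larger prior on seed words for that topic, are also possible.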
