Analysis of Online Suicide Risk with Document Embeddings and Latent Dirichlet Allocation

We apply machine learning to infer suicide risk and urgency for a dataset of Reddit users whose risk and urgency labels were derived from crowdsourced consensus. We present the results of machine learning models based on transfer learning from document embeddings trained on large external corpora, and find that they achieve very high F1 scores (0.83 to 0.92) in identifying which users are labeled as being most at risk of committing suicide. We further show that the document embedding approach outperforms a method based on word importance, in which the important words were identified by domain experts. Finally, using a Latent Dirichlet Allocation (LDA) topic model, we find that users labeled as at risk for suicide post about different topics on the rest of Reddit than non-suicidal users do.
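
The abstract does not specify which classifier was trained on top of the document embeddings. The block below is a minimal sketch, assuming that a document embedding is the mean of pretrained word vectors and that a nearest-centroid rule stands in for the actual classifier. `TOY_VECTORS` is a hypothetical three-dimensional table replacing embeddings trained on a large external corpus, and `embed_document`, `fit_centroids`, `predict` and `f1_score` are illustrative names. The F1 function shows how a score for the at-risk class is computed: F1 = 2PR / (P + R), with precision P and recall R.

```python
"""Sketch: classify users from mean-pooled word embeddings and score with F1.

Assumptions (not from the paper): TOY_VECTORS stands in for embeddings
pretrained on a large external corpus, and a nearest-centroid rule stands
in for whatever classifier was trained on top of the document embeddings.
"""
import math

# Hypothetical 3-dimensional "pretrained" word vectors (toy values).
TOY_VECTORS = {
    "hopeless": [0.9, 0.1, 0.0],
    "alone": [0.8, 0.2, 0.1],
    "worthless": [0.9, 0.0, 0.1],
    "game": [0.0, 0.9, 0.2],
    "movie": [0.1, 0.8, 0.3],
    "fun": [0.0, 0.7, 0.6],
}


def embed_document(text, vectors=TOY_VECTORS):
    """Average the vectors of in-vocabulary tokens (zero vector if none)."""
    dim = len(next(iter(vectors.values())))
    hits = [vectors[t] for t in text.lower().split() if t in vectors]
    if not hits:
        return [0.0] * dim
    return [sum(col) / len(hits) for col in zip(*hits)]


def fit_centroids(docs, labels):
    """Compute one mean document embedding per class label."""
    by_label = {}
    for doc, label in zip(docs, labels):
        by_label.setdefault(label, []).append(embed_document(doc))
    return {
        label: [sum(col) / len(embs) for col in zip(*embs)]
        for label, embs in by_label.items()
    }


def predict(doc, centroids):
    """Assign the label whose centroid is closest in Euclidean distance."""
    emb = embed_document(doc)
    return min(centroids, key=lambda lab: math.dist(emb, centroids[lab]))


def f1_score(y_true, y_pred, positive=1):
    """F1 = 2PR / (P + R) for the positive (at-risk) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:  # also guards against division by zero
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    train_docs = ["i feel hopeless and alone", "worthless and alone",
                  "fun game tonight", "great movie"]
    train_labels = [1, 1, 0, 0]  # 1 = labeled at risk (toy labels)
    centroids = fit_centroids(train_docs, train_labels)
    preds = [predict(d, centroids) for d in ["so hopeless", "a fun movie"]]
    print(preds)  # [1, 0]
    print(round(f1_score([1, 0, 1, 0], [1, 0, 0, 0]), 3))  # 0.667
```

Mean pooling is the simplest way to turn word vectors into a fixed-length document vector; it discards word order, and the centroid rule was chosen here only because it needs no training loop.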

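For the topic-model comparison, one common way to fit LDA is collapsed Gibbs sampling, which resamples each token's topic from p(z = k | rest) ∝ (n_dk + α)(n_kw + β) / (n_k + Vβ). The sketch below assumes this sampler; the paper may have used a different inference method, and the hyperparameters, toy corpus, group labels and function names (`lda_gibbs`, `mean_topic_share`, `"at_risk"`, `"control"`) are hypothetical. Averaging the per-document topic proportions within each group gives a simple way to see whether two groups of users emphasize different topics.

```python
"""Sketch: LDA topics via collapsed Gibbs sampling, compared across groups.

Assumptions (not from the paper): the sampler, hyperparameters, toy
corpus and group labels are illustrative; the paper's LDA setup may differ.
"""
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Return per-document topic proportions (theta) for tokenized docs."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    ndk = [[0] * K for _ in docs]       # doc-topic counts
    nkw = [[0] * V for _ in range(K)]   # topic-word counts
    nk = [0] * K                        # total tokens per topic
    z = []                              # topic assignment of every token
    for d, doc in enumerate(docs):      # random initial assignments
        zd = []
        for w in doc:
            k = rng.randrange(K)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][word_id[w]] += 1
            nk[k] += 1
        z.append(zd)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove the current token from the counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Full conditional p(z = t | rest), up to a constant.
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                # Add the token back under its new topic.
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    # Smoothed estimate theta_dk = (n_dk + alpha) / (N_d + K * alpha).
    return [
        [(ndk[d][t] + alpha) / (len(doc) + K * alpha) for t in range(K)]
        for d, doc in enumerate(docs)
    ]


def mean_topic_share(theta, groups, group):
    """Average topic proportions over the documents of one group."""
    rows = [row for row, g in zip(theta, groups) if g == group]
    return [sum(col) / len(rows) for col in zip(*rows)]


if __name__ == "__main__":
    docs = [
        "sad alone hopeless".split(),
        "sad hopeless alone sad".split(),
        "game movie fun".split(),
        "fun game game".split(),
    ]
    groups = ["at_risk", "at_risk", "control", "control"]
    theta = lda_gibbs(docs, num_topics=2, iters=100)
    for g in ("at_risk", "control"):
        print(g, [round(x, 2) for x in mean_topic_share(theta, groups, g)])
```

Because topic labels are arbitrary, a comparison like this only says that the groups load on different topics; interpreting a topic still requires inspecting its highest-probability words.
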