Automatic extraction of informal topics from online suicidal ideation

Background: Suicide is an alarming public health problem that accounts for a considerable number of deaths worldwide each year. Many more individuals contemplate suicide. Understanding the attributes, characteristics, and exposures correlated with suicide remains an urgent and significant problem. As social networking sites have become more common, users have adopted them to talk about intensely personal topics, among them their thoughts about suicide. Such data have previously been evaluated by analyzing the language features of social media posts and by using factors derived by domain experts to identify at-risk users.

Results: In this work, we automatically extract informal, latent, recurring topics of suicidal ideation found in social media posts. Our evaluation demonstrates that we are able to automatically reproduce many of the expertly determined risk factors for suicide. Moreover, we identify many informal latent topics related to suicidal ideation, such as concerns over health, work, self-image, and financial issues.

Conclusions: These informal topics can be more specific or more general than the expert-defined risk factors. Some of our topics express meaningful ideas not contained in the risk factors, and some risk factors do not have complementary latent topics. In short, our analysis of the latent topics extracted from social media posts containing suicidal ideation suggests that users of these systems express ideas that are complementary to the topics defined by experts but differ in their scope, focus, and precision of language.
