Quantifying Mental Health from Social Media with Neural User Embeddings

Mental illnesses adversely affect a significant proportion of the population worldwide. However, the methods traditionally used for estimating and characterizing the prevalence of mental health conditions are time-consuming and expensive. Consequently, best-available estimates concerning the prevalence of mental health conditions are often years out of date. Automated approaches that supplement these survey methods with broad, aggregated information derived from social media content provide a potential means for near real-time estimates at scale. These may, in turn, provide grist for supporting, evaluating and iteratively improving upon public health programs and interventions. We propose a novel model for automated mental health status quantification that incorporates user embeddings. This builds upon recent work exploring representation learning methods that induce embeddings by leveraging social media post histories. Such embeddings capture latent characteristics of individuals (e.g., political leanings) and encode a soft notion of homophily. In this paper, we investigate whether user embeddings learned from Twitter post histories encode information that correlates with mental health statuses. To this end, we estimated user embeddings for a set of users known to be affected by depression and post-traumatic stress disorder (PTSD), and for a set of demographically matched 'control' users. We then evaluated these embeddings with respect to: (i) their ability to capture homophilic relations with respect to mental health status; and (ii) the performance of downstream mental health prediction models based on these features. Our experimental results demonstrate that the user embeddings capture similarities between users with respect to mental health conditions, and are predictive of mental health status.
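As a rough illustration of evaluation (i), one way to probe whether embeddings reflect homophily is to check how often a user's nearest neighbors in embedding space share that user's label. The sketch below is not the procedure used in the paper, which the abstract does not specify. The function name `same_label_neighbor_rate`, the choice of cosine similarity, the value of `k`, and the synthetic three-dimensional "embeddings" are all assumptions made for illustration.

```python
import numpy as np


def same_label_neighbor_rate(embeddings, labels, k=5):
    """Average fraction of each user's k nearest neighbors (by cosine
    similarity) that carry the same label as the user."""
    X = np.asarray(embeddings, dtype=float)
    y = np.asarray(labels)
    # Normalize rows so that dot products equal cosine similarities.
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    sims = X @ X.T
    # Exclude each user from their own neighbor list.
    np.fill_diagonal(sims, -np.inf)
    neighbors = np.argsort(-sims, axis=1)[:, :k]
    return float((y[neighbors] == y[:, None]).mean())


# Synthetic stand-in data (hypothetical): three well-separated groups.
rng = np.random.default_rng(0)
centers = {
    "control": [1.0, 0.0, 0.0],
    "depression": [0.0, 1.0, 0.0],
    "ptsd": [0.0, 0.0, 1.0],
}
embeddings, labels = [], []
for name, center in centers.items():
    embeddings.append(np.asarray(center) + 0.1 * rng.standard_normal((20, 3)))
    labels.extend([name] * 20)
embeddings = np.vstack(embeddings)
labels = np.array(labels)

rate = same_label_neighbor_rate(embeddings, labels, k=5)
# Baseline: the same statistic after shuffling labels, i.e. no homophily.
shuffled_rate = same_label_neighbor_rate(embeddings, rng.permutation(labels), k=5)
print(f"same-label neighbor rate: {rate:.2f} (shuffled labels: {shuffled_rate:.2f})")
```

A rate well above the shuffled-label baseline would suggest that users with the same status sit close together in the embedding space. On real data the gap would be expected to be far smaller than on this cleanly separated toy example.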
