Can We Assess Mental Health through Social Media and Smart Devices? Addressing Bias in Methodology and Evaluation

Predicting mental health from smartphone and social media data on a longitudinal basis has recently attracted great interest, with very promising results reported across many studies [3, 9, 13, 26]. Such approaches have the potential to revolutionise mental health assessment, provided that their development and evaluation follow a real-world deployment setting. In this work we take a closer look at state-of-the-art approaches, using different mental health datasets and indicators, different feature sources and multiple simulations, in order to assess their ability to generalise. We demonstrate that, under a pragmatic evaluation framework, none of the approaches delivers, or even comes close to, the reported performance. In fact, we show that current state-of-the-art approaches can barely outperform the most naive baselines in the real-world setting. This raises serious questions not only about their deployability, but also about the contribution of the derived features to the mental health assessment task and about how to make better use of such data in the future.
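To make the concern about evaluation settings concrete, the following is a minimal sketch on synthetic data, not the protocol or the datasets used in this work. It assumes that each user has a stable personal baseline score, and compares two naive, feature-free predictors: a per-user mean evaluated when every user appears in both training and test data, and a global mean evaluated on entirely unseen users. All names (`make_users`, `mixed_split_user_mean`, `leave_one_user_out_global_mean`) and all numeric settings are hypothetical and chosen purely for illustration.

```python
import random
from statistics import mean


def make_users(n_users=20, n_obs=30, seed=0):
    """Synthetic self-reported scores: a per-user baseline plus noise (assumption)."""
    rng = random.Random(seed)
    data = {}
    for user in range(n_users):
        baseline = rng.uniform(10, 40)  # hypothetical stable per-user level
        data[user] = [baseline + rng.gauss(0, 2) for _ in range(n_obs)]
    return data


def mae(y_true, y_pred):
    """Mean absolute error between two equally long sequences."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def mixed_split_user_mean(data, train_frac=0.7):
    """Every user is in both train and test; predict the user's own training mean."""
    y_true, y_pred = [], []
    for scores in data.values():
        cut = int(len(scores) * train_frac)
        user_mean = mean(scores[:cut])  # uses the test user's earlier observations
        for score in scores[cut:]:
            y_true.append(score)
            y_pred.append(user_mean)
    return mae(y_true, y_pred)


def leave_one_user_out_global_mean(data):
    """The test user is unseen; predict the mean over all other users' scores."""
    y_true, y_pred = [], []
    for held_out, scores in data.items():
        train = [s for user, other in data.items() if user != held_out for s in other]
        global_mean = mean(train)
        for score in scores:
            y_true.append(score)
            y_pred.append(global_mean)
    return mae(y_true, y_pred)


data = make_users()
mixed_mae = mixed_split_user_mean(data)
unseen_mae = leave_one_user_out_global_mean(data)
print(f"per-user mean, mixed split:       MAE = {mixed_mae:.2f}")
print(f"global mean, leave-one-user-out:  MAE = {unseen_mae:.2f}")
```

Under these assumptions the per-user mean achieves a much lower error without using any sensor or text features, simply because it has already seen the test user. A model evaluated in the mixed setting therefore needs to beat this kind of baseline before its features can be credited with predictive value.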

[1] Alexander Russell, et al. Behavior vs. introspection: refining prediction of clinical depression via smartphone sensing data, 2016, 2016 IEEE Wireless Health (WH).

[2] Saif Mohammad, et al. NRC-Canada-2014: Recent Improvements in the Sentiment Analysis of Tweets, 2014, SemEval@COLING.

[3] Robert Li Kam Wa. MoodScope: Building a Mood Sensor from Smartphone Usage Patterns, 2012.

[4] Finn Årup Nielsen, et al. A New ANEW: Evaluation of a Word List for Sentiment Analysis in Microblogs, 2011, #MSM.

[5] Jeffrey Dean, et al. Distributed Representations of Words and Phrases and their Compositionality, 2013, NIPS.

[6] Saif Mohammad, et al. Sentiment Analysis of Short Informal Texts, 2014, J. Artif. Intell. Res.

[7] Saif Mohammad, et al. Generating High-Coverage Semantic Orientation Lexicons From Overtly Marked Words and a Thesaurus, 2009, EMNLP.

[8] Cecilia Mascolo, et al. Mobile Sensing at the Service of Mental Well-being: a Large-scale Longitudinal Study, 2017, WWW.

[9] D. Watson, et al. Development and validation of brief measures of positive and negative affect: the PANAS scales, 1988, Journal of personality and social psychology.

[10] L. Hiller, et al. The Warwick-Edinburgh Mental Well-being Scale (WEMWBS): development and UK validation, 2007, Health and quality of life outcomes.

[11] Konrad P. Körding, et al. Meaningless comparisons lead to false optimism in medical machine learning, 2017, PloS one.

[12] Alex Pentland, et al. Pervasive stress recognition for sustainable living, 2014, 2014 IEEE International Conference on Pervasive Computing and Communication Workshops (PERCOM WORKSHOPS).

[13] Fanglin Chen, et al. StudentLife: assessing mental health, academic performance and behavioral trends of college students using smartphones, 2014, UbiComp.

[14] Yoram Bachrach, et al. Studying User Income through Language, Behaviour and Affect in Social Media, 2015, PloS one.

[15] M. Goldfarb. That's life, 1988, Journal of the Tennessee Medical Association.

[16] Akane Sano, et al. Predicting students' happiness from physiology, phone, mobility, and behavioral data, 2015, 2015 International Conference on Affective Computing and Intelligent Interaction (ACII).

[17] Fabio Pianesi, et al. Happiness Recognition from Mobile Phone Data, 2013, 2013 International Conference on Social Computing.

[18] T. Strine, et al. The PHQ-8 as a measure of current depression in the general population, 2009, Journal of affective disorders.

[19] Akane Sano, et al. Multi-task, Multi-Kernel Learning for Estimating Individual Wellbeing, 2015.

[20] Maria Liakata, et al. Combining Heterogeneous User Generated Data to Sense Well-being, 2016, COLING.

[21] Ming Zhou, et al. Learning Sentiment-Specific Word Embedding for Twitter Sentiment Classification, 2014, ACL.

[22] Shekhar Saxena, et al. Promoting mental health: concepts, emerging evidence, practice: a report of the World Health Organization, Department of Mental Health and Substance Abuse in collaboration with the Victorian Health Promotion Foundation and the University of Melbourne, 2005.

[23] Brendan T. O'Connor, et al. Part-of-Speech Tagging for Twitter: Annotation, Features, and Experiments, 2010, ACL.

[24] Yoshihiko Suhara, et al. DeepMood: Forecasting Depressed Mood Based on Self-Reported Histories via Recurrent Neural Networks, 2017, WWW.

[25] J. Olesen, et al. The economic cost of brain disorders in Europe, 2012, European journal of neurology.

[26] Saif Mohammad, et al. #Emotional Tweets, 2012, *SEMEVAL.

[27] Guodong Sun, et al. Daily Mood Assessment Based on Mobile Phone Sensing, 2012, 2012 Ninth International Conference on Wearable and Implantable Body Sensor Networks.

[28] Mirco Musolesi, et al. Trajectories of depression: unobtrusive monitoring of depressive states by means of smartphone mobility traces analysis, 2015, UbiComp.

[29] T. Melchert, et al. Measuring Well-Being, 2016.

[30] Rui Wang, et al. CrossCheck: toward passive sensing and detection of mental health changes in people with schizophrenia, 2016, UbiComp.

[31] Bing Liu, et al. Mining and summarizing customer reviews, 2004, KDD.