On the State of Social Media Data for Mental Health Research

Data-driven methods for mental health treatment and surveillance have become a major focus of computational science research over the past decade. However, progress in the domain, in terms of both medical understanding and system performance, remains bounded by the availability of adequate data. Prior systematic reviews do not necessarily allow researchers to measure the extent to which data-related challenges have affected research progress. In this paper, we focus specifically on the state of the social media data available for conducting mental health research. We do so by introducing an open-source directory of mental health datasets, annotated using a standardized schema to facilitate meta-analysis.
