“Because... I was told... so much”: Linguistic Indicators of Mental Health Status on Twitter

Abstract

Recent studies have shown that machine learning can identify individuals with mental illnesses by analyzing their social media posts, with topics and words related to mental health among the strongest predictors. These findings have implications for the early detection of mental illness, but they also raise numerous privacy concerns. To evaluate these privacy implications fully, we analyze the performance of different machine learning models after removing tweets that talk about mental illness. Our results show that machine learning can still make predictions even when users do not actively talk about their mental illness. To understand why, we analyze the features that make these predictions possible: bag-of-words, word clusters, part-of-speech n-grams, and topic models. This analysis explains the behavior of the models and reveals language patterns that differentiate individuals with mental illnesses from a control group; it confirms several known patterns and uncovers new ones. Finally, we discuss possible applications of machine learning for identifying mental illness, the feasibility of such applications, their privacy implications, and the feasibility of potential mitigations.
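As a rough illustration of the kind of feature pipeline the abstract describes, the sketch below removes tweets that mention mental illness and then builds bag-of-words and part-of-speech n-gram counts from what remains. It is a minimal sketch under stated assumptions, not the authors' implementation: the keyword list `MENTAL_HEALTH_TERMS`, the helper names, and the example tweets are hypothetical. Part-of-speech tags are assumed to come pre-computed from a Twitter tagger, and word clusters and topic models are not shown.

```python
import re
from collections import Counter

# Hypothetical keyword list standing in for however tweets "about" mental
# illness are detected; the actual terms used in the study are not given here.
MENTAL_HEALTH_TERMS = {"depression", "depressed", "ptsd", "anxiety", "bipolar", "diagnosed"}

TOKEN_RE = re.compile(r"[a-z']+")


def tokenize(text):
    """Lowercase a tweet and split it into simple word tokens."""
    return TOKEN_RE.findall(text.lower())


def drop_mental_health_tweets(tweets, terms=MENTAL_HEALTH_TERMS):
    """Keep only tweets that contain none of the given terms."""
    return [t for t in tweets if not terms.intersection(tokenize(t))]


def bag_of_words(tweets):
    """Unigram counts aggregated over all of a user's remaining tweets."""
    counts = Counter()
    for t in tweets:
        counts.update(tokenize(t))
    return counts


def pos_ngrams(tag_sequences, n=2):
    """Count part-of-speech n-grams from pre-tagged tweets (one tag list per tweet)."""
    counts = Counter()
    for tags in tag_sequences:
        for i in range(len(tags) - n + 1):
            counts[tuple(tags[i:i + n])] += 1
    return counts


# Hypothetical usage: the second tweet is filtered out before feature extraction.
tweets = ["I was told so much today", "My depression is bad", "I can't sleep again"]
kept = drop_mental_health_tweets(tweets)
print(kept)
print(bag_of_words(kept).most_common(3))
# Penn Treebank-style tags are assumed here purely for illustration.
print(pos_ngrams([["PRP", "VBD", "VBN"], ["PRP", "MD", "VB"]]))
```

Filtering before feature extraction mirrors the question the abstract raises: whether the remaining counts, such as pronoun frequencies or tag bigrams, still carry a signal once explicit mentions of mental illness are gone.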
