Towards Preemptive Detection of Depression and Anxiety in Twitter

Depression and anxiety are psychiatric disorders that are observed in many areas of everyday life. For example, these disorders manifest themselves somewhat frequently in texts written by undiagnosed users on social media. However, detecting users with these conditions is not a straightforward task, as they may not explicitly talk about their mental state, and if they do, contextual cues such as immediacy must be taken into account. When available, linguistic flags pointing to probable anxiety or depression could be used by medical experts to design better guidelines and treatments. In this paper, we develop a dataset designed to foster research in depression and anxiety detection on Twitter, framing the detection task as a binary tweet classification problem. We then apply state-of-the-art classification models to this dataset, providing a competitive set of baselines alongside a qualitative error analysis. Our results show that language models perform reasonably well and outperform more traditional baselines. Nonetheless, there is clear room for improvement, particularly with unbalanced training sets and in cases where seemingly obvious linguistic cues (keywords) are used counter-intuitively.
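To make the binary framing and the keyword pitfall concrete, the following is a minimal sketch of a naive keyword baseline scored with precision, recall, and F1 on the positive class. It is not the paper's method or data: the keyword list, the function names (`keyword_baseline`, `precision_recall_f1`), and the six example tweets are all hypothetical, invented only for illustration.

```python
from typing import List, Tuple

# Hypothetical keyword list, chosen for illustration (not taken from the paper).
KEYWORDS = {"depressed", "depression", "anxious", "anxiety"}


def keyword_baseline(tweet: str) -> int:
    """Label a tweet 1 (positive) if it contains any keyword, else 0."""
    tokens = {tok.strip(".,!?#@\"'").lower() for tok in tweet.split()}
    return int(bool(tokens & KEYWORDS))


def precision_recall_f1(gold: List[int], pred: List[int]) -> Tuple[float, float, float]:
    """Precision, recall, and F1 for the positive class (label 1)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Invented toy tweets with invented labels (1 = probable depression/anxiety).
EXAMPLES = [
    ("I have felt so depressed lately and cannot get out of bed", 1),
    ("my anxiety is through the roof tonight, can't breathe", 1),
    ("haven't slept or eaten in days, nothing feels worth it", 1),  # no keyword
    ("reading about the Great Depression for history class", 0),   # keyword, unrelated sense
    ("Great documentary on anxiety in elite athletes", 0),         # keyword, not self-report
    ("coffee then gym then work", 0),
]

gold = [label for _, label in EXAMPLES]
pred = [keyword_baseline(text) for text, _ in EXAMPLES]
precision, recall, f1 = precision_recall_f1(gold, pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this toy setting the keyword rule misses a positive tweet that contains no keyword and fires on two negative tweets that do, which is the kind of counter-intuitive keyword use the abstract refers to. Reporting F1 on the positive class rather than accuracy is one common choice when classes are unbalanced.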
