Using Surface and Semantic Features for Detecting Early Signs of Self-Harm in Social Media Postings

This paper describes the University of Hildesheim submission to the CLEF eRisk 2020 shared task on detecting early signs of self-harm in social media posts. We introduce four systems that each take a different approach to the task, and a fifth ensemble system that combines the other four. The four individual systems draw on different feature types: the time intervals between posts, the sentiment of the writings, and their semantics, captured with bag-of-words vectors and with contextualized word embeddings in a neural network approach. The results show that all our systems achieve high recall, so future work should focus on improving precision. The four systems and the ensemble model perform comparably, with F_latency values ranging from 0.367 to 0.424.
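For readers unfamiliar with the measure, the following is a minimal sketch of how a latency-weighted F1 score such as F_latency can be computed. It reflects our understanding of the eRisk definition and is not the official evaluation script. The function names, the input layout (one decision, gold label, and delay per user), and the default penalty rate `p=0.0078` are assumptions made for illustration.

```python
import math
from statistics import median


def latency_penalty(k, p=0.0078):
    """Penalty in [0, 1) for a decision made after reading k posts.

    Assumption: logistic-shaped penalty, equal to 0 when k == 1.
    """
    return -1.0 + 2.0 / (1.0 + math.exp(-p * (k - 1)))


def f_latency(decisions, gold, delays, p=0.0078):
    """Latency-weighted F1: F1 multiplied by a speed factor.

    decisions: list of bool, True if the system flagged the user
    gold:      list of bool, True if the user is a positive case
    delays:    list of int, posts read before the decision was issued
    """
    pairs = list(zip(decisions, gold))
    tp = sum(1 for d, g in pairs if d and g)
    fp = sum(1 for d, g in pairs if d and not g)
    fn = sum(1 for d, g in pairs if not d and g)
    if tp == 0:
        return 0.0  # no true positives: F1 is 0 by convention here

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)

    # Speed is computed over correctly flagged users only.
    tp_penalties = [
        latency_penalty(k, p)
        for d, g, k in zip(decisions, gold, delays)
        if d and g
    ]
    speed = 1.0 - median(tp_penalties)
    return f1 * speed
```

Under this sketch, a system with high recall but many false positives is limited by its precision through the F1 term, and late correct decisions lower the score further through the speed factor.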
