OBJECTIVE
Accurate identification of self-harm presentations to Emergency Departments (ED) can lead to more timely mental health support, aid in understanding the burden of suicidal intent in a population, and support impact evaluation of public health initiatives related to suicide prevention. Given lack of manual self-harm reporting in ED, we aim to develop an automated system for the detection of self-harm presentations directly from ED triage notes.
MATERIALS AND METHODS
We frame this as supervised classification using natural language processing (NLP), utilizing a large data set of 477 627 free-text triage notes from ED presentations in 2012-2018 to The Royal Melbourne Hospital, Australia. The data were highly imbalanced, with only 1.4% of triage notes relating to self-harm. We explored various preprocessing techniques, including spelling correction, negation detection, bigram replacement, and clinical concept recognition, and several machine learning methods.
RESULTS
Our results show that machine learning methods dramatically outperform keyword-based methods. We achieved the best results with a calibrated Gradient Boosting model, showing 90% Precision and 90% Recall (PR-AUC 0.87) on blind test data. Prospective validation of the model achieves similar results (88% Precision; 89% Recall).
DISCUSSION
ED notes are noisy texts, and simple token-based models work best. Negation detection and concept recognition did not change the results while bigram replacement significantly impaired model performance.
CONCLUSION
This first NLP-based classifier for self-harm in ED notes has practical value for identifying patients who would benefit from mental health follow-up in ED, and for supporting surveillance of self-harm and suicide prevention efforts in the population.