Underlying cause of death identification from death certificates using reverse coding to text and a NLP based deep learning approach

Abstract The identification of the underlying cause of death is a matter of primary importance and one of the most challenging issues in the setting of healthcare policy making. The World Health Organisation provides guidelines for death certificates coding using the ICD10 classification. Guidelines can be manually applied, but there exist some coding support systems that implement them to simplify the coding work. Nevertheless, there is disparity among countries with respect to the level and the quality of death certificates registration. In this work we propose an effective supervised model based on Natural Language Processing algorithms to the aim of correctly classify the underlying cause of death from death certificates. In our study we compared tabular representations of the death certificate, including the hierarchical path of each condition in the classification, with a novel representation consisting in translating back to their standard title the conditions expressed as ICD-10 codes. Our experimental evaluation, after training on 10.5 millions certificates, reached a 99.03% accuracy, which currently outperforms state-of-the-art systems. For its practical applicability, we studied performance by classification chapter and found that accuracy is low only for chapters including very rare death causes. Finally, to show the robustness of our model, we leverage the model confidence to help identifying death certificates for which a manual coding is needed.