Boosting ICD multi-label classification of health records with contextual embeddings and label-granularity

BACKGROUND AND OBJECTIVE This work deals with clinical text mining, a field of Natural Language Processing applied to biomedical informatics. The aim is to classify Electronic Health Records with respect to the International Classification of Diseases (ICD), which is the foundation for identifying international health statistics and the standard for reporting diseases and health conditions. Within the framework of data mining, the task is multi-label classification, since each health record is assigned multiple ICD codes. We investigate five Deep Learning architectures on a dataset obtained from the Basque Country Health System, under six different perspectives derived from shifts in the input and the output.

METHODS We evaluate a Feed Forward Neural Network as the baseline and several recurrent models based on the Bidirectional GRU architecture. Our research focuses on the text representation layer, for which we test three variants: standard word embeddings, meta word embedding techniques, and contextual embeddings.

RESULTS The recurrent models outperform the non-recurrent model. The meta word embedding techniques can beat the standard word embeddings, but the contextual embeddings prove the most robust overall for the downstream task. Additionally, label granularity alone has an impact on classification performance.

CONCLUSIONS The contributions of this work are a) a comparison of five Deep Learning classification approaches on a Spanish dataset for the multi-label health text classification problem; b) a study of the impact of document length, label-set size, and label granularity in the multi-label context; and c) a study of measures to mitigate multi-label text classification problems related to label-set size and sparseness.
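As an illustration of how label granularity changes the multi-label target, the following minimal sketch encodes the same records at two granularities. It is not the authors' pipeline: the helper names `coarsen` and `multi_hot`, the example codes, and the choice of truncating each code to its first three characters are all assumptions made for illustration only.

```python
from typing import List, Set, Tuple


def coarsen(code: str, n_chars: int = 3) -> str:
    """Truncate a code to its first n_chars characters, ignoring the dot.

    Assumption: coarser granularity is obtained by simple truncation.
    """
    return code.replace(".", "")[:n_chars]


def multi_hot(label_sets: List[Set[str]]) -> Tuple[List[str], List[List[int]]]:
    """Build a sorted label vocabulary and one 0/1 row per record."""
    vocab = sorted(set().union(*label_sets))
    index = {code: i for i, code in enumerate(vocab)}
    rows = []
    for labels in label_sets:
        row = [0] * len(vocab)
        for code in labels:
            row[index[code]] = 1
        rows.append(row)
    return vocab, rows


# Hypothetical records, each assigned several codes (illustrative only).
records = [{"I21.0", "I21.4", "E11.9"}, {"I10", "E11.9"}]

# Fine-grained target: every full code is its own label.
fine_vocab, fine_rows = multi_hot(records)

# Coarse target: codes sharing a three-character prefix collapse into one label.
coarse_records = [{coarsen(c) for c in r} for r in records]
coarse_vocab, coarse_rows = multi_hot(coarse_records)

print(len(fine_vocab), len(coarse_vocab))
```

In this toy case the coarse label set is smaller than the fine-grained one, and each remaining label is supported by at least as many records, which is the kind of label-set size and sparseness effect the abstract refers to.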
