Applicability of Machine Learning Methods to Multi-label Medical Text Classification

Structuring medical text according to international standards improves interoperability and the quality of predictive modelling. Medical text classification facilitates information extraction. In this work, we investigate the applicability of several machine learning models and classifier chains (CC) to the classification of unstructured medical text. The experimental study was performed on a corpus of 11,671 manually labeled Russian medical notes. The results show that the CC strategy improves classification performance. An ensemble of classifier chains based on linear SVC achieved the best result: a micro F-measure of 0.924, micro precision of 0.872, and micro recall of 0.927.
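The micro-averaged scores reported above pool true positives, false positives, and false negatives over all labels and documents before computing the ratios. The following is a minimal sketch of that computation, not the authors' evaluation code; the function name `micro_scores`, the label names, and the toy data are hypothetical and chosen only for illustration.

```python
def micro_scores(y_true, y_pred):
    """Micro-averaged precision, recall and F-measure for multi-label data.

    y_true, y_pred: lists of label sets, one set per document.
    """
    tp = fp = fn = 0
    for true_labels, pred_labels in zip(y_true, y_pred):
        tp += len(true_labels & pred_labels)   # labels correctly predicted
        fp += len(pred_labels - true_labels)   # labels predicted but not present
        fn += len(true_labels - pred_labels)   # labels present but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


# Toy example with hypothetical labels (not taken from the paper's corpus).
y_true = [{"allergy", "diagnosis"}, {"medication"}, {"diagnosis"}]
y_pred = [{"allergy"}, {"medication", "diagnosis"}, {"diagnosis"}]

precision, recall, f_measure = micro_scores(y_true, y_pred)
print(precision, recall, f_measure)
```

Because the counts are pooled before dividing, frequent labels weigh more heavily in micro-averaged scores than rare ones, which is worth keeping in mind when label frequencies are imbalanced.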
