An ensemble embedded feature selection method for multi-label clinical text classification

Clinical data record a patient's health status, and such data are often multi-label. For example, a patient suffering from both cough and fever should be associated with both disease labels in the clinical record. Because clinical data contain redundant or irrelevant features, the performance of multi-label classification is limited, so selecting effective features from the feature space is necessary. However, few methods have been proposed for the multi-label feature selection problem in the past few years, and the existing ones adopt a simple, direct strategy: they transform the multi-label feature selection problem into several single-label problems and ignore the correlations among labels. In this paper, a novel method named ensemble embedded feature selection (EEFS) is proposed to handle the multi-label clinical data learning problem more effectively and efficiently. EEFS does not explicitly identify the correlations among labels, but it exploits them through multi-label classifiers and multi-label evaluation measures. Furthermore, it reduces the accumulated errors arising from the data itself by employing an ensemble method. Experimental results on a clinical dataset show that our algorithm significantly outperforms other state-of-the-art algorithms.
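The general idea of scoring features with a multi-label classifier and a multi-label evaluation measure, then aggregating over an ensemble of resampled datasets, can be illustrated with a minimal sketch. This is not the EEFS algorithm itself, whose details are not given in the abstract. Everything below is an assumption made for illustration: the 1-nearest-neighbour classifier that predicts a whole label vector, the Hamming loss as the evaluation measure, the bootstrap resampling, the drop-one-feature scoring (which is wrapper-style rather than truly embedded), the toy data, and all function names.

```python
import random

def hamming_loss(Y_true, Y_pred):
    """Fraction of label slots predicted wrongly (a common multi-label measure)."""
    slots = sum(len(y) for y in Y_true)
    wrong = sum(t != p for yt, yp in zip(Y_true, Y_pred) for t, p in zip(yt, yp))
    return wrong / slots

def predict_1nn(X_train, Y_train, x, features):
    """Return the whole label vector of the nearest training sample,
    measuring squared Euclidean distance on the given features only."""
    nearest = min(range(len(X_train)),
                  key=lambda i: sum((X_train[i][f] - x[f]) ** 2 for f in features))
    return Y_train[nearest]

def oob_loss(X, Y, train_idx, test_idx, features):
    """Hamming loss on held-out samples using only the given features."""
    X_tr = [X[i] for i in train_idx]
    Y_tr = [Y[i] for i in train_idx]
    preds = [predict_1nn(X_tr, Y_tr, X[i], features) for i in test_idx]
    return hamming_loss([Y[i] for i in test_idx], preds)

def ensemble_feature_scores(X, Y, n_rounds=10, seed=0):
    """Sum, over bootstrap rounds, the increase in out-of-bag Hamming loss
    caused by dropping each feature. Higher score = more useful feature."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    scores = [0.0] * d
    for _ in range(n_rounds):
        train_idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        in_bag = set(train_idx)
        test_idx = [i for i in range(n) if i not in in_bag]  # out-of-bag samples
        if not test_idx:
            continue
        base = oob_loss(X, Y, train_idx, test_idx, list(range(d)))
        for f in range(d):
            reduced = [g for g in range(d) if g != f]
            scores[f] += oob_loss(X, Y, train_idx, test_idx, reduced) - base
    return scores

def make_toy_data(n=40, seed=1):
    """Hypothetical data: features 0 and 1 determine the two labels
    (think 'cough' and 'fever'); feature 2 is pure noise."""
    rng = random.Random(seed)
    X, Y = [], []
    for i in range(n):
        a, b = i % 2, (i // 2) % 2
        X.append([float(a), float(b), rng.random()])
        Y.append([a, b])
    return X, Y

X, Y = make_toy_data()
scores = ensemble_feature_scores(X, Y)
# Keep the two features whose removal hurts the multi-label loss the most.
selected = sorted(range(len(scores)), key=lambda f: scores[f], reverse=True)[:2]
print(sorted(selected))
```

In this sketch no label correlation is modelled explicitly. Each feature is judged only by its effect on a loss computed over the full label vectors, and averaging over bootstrap rounds dampens the influence of any single unlucky resample.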
