An Ensemble Multilabel Classification for Disease Risk Prediction

Identifying and preventing disease risk as early as possible through regular physical examinations is important. We formulate disease risk prediction as a multilabel classification problem and propose a novel method, Ensemble Label Power-set Pruned datasets Joint Decomposition (ELPPJD). First, we transform the multilabel classification problem into a multiclass classification problem. Then, we propose pruned datasets and joint decomposition methods to deal with the imbalanced learning problem. Two strategies, size balanced (SB) and label similarity (LS), are designed to decompose the training dataset. The experiments use a dataset drawn from real physical examination records. We compare the performance of the ELPPJD method under the two decomposition strategies, and we also compare ELPPJD with the classic multilabel classification methods RAkEL and HOMER. The experimental results show that the ELPPJD method with the label similarity strategy has outstanding performance.
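The three steps named in the abstract (label power-set transformation, pruning, and decomposition of the training set) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `min_count` pruning threshold, and the greedy assignment used for the size-balanced split are all assumptions made for illustration, and the label similarity strategy is not shown because the abstract does not define it.

```python
from collections import Counter


def to_label_powerset(Y):
    """Map each binary label vector to one class id per distinct label set."""
    class_of = {}   # label set (as a tuple) -> class id
    classes = []    # class id of each example, in input order
    for row in Y:
        key = tuple(row)
        if key not in class_of:
            class_of[key] = len(class_of)
        classes.append(class_of[key])
    return classes, class_of


def prune_rare_classes(X, classes, min_count):
    """Drop examples whose label set occurs fewer than min_count times.

    min_count is a hypothetical threshold; the abstract does not give one.
    """
    counts = Counter(classes)
    keep = [i for i, c in enumerate(classes) if counts[c] >= min_count]
    return [X[i] for i in keep], [classes[i] for i in keep]


def size_balanced_split(classes, n_subsets):
    """Assumed greedy split: largest class first, into the currently
    smallest subset, so the subsets end up with similar example counts."""
    counts = Counter(classes)
    subsets = [[] for _ in range(n_subsets)]
    sizes = [0] * n_subsets
    for c, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        j = sizes.index(min(sizes))
        subsets[j].append(c)
        sizes[j] += n
    return subsets


# Toy data: six examination records, three disease labels each.
Y = [[1, 0, 0], [1, 0, 0], [0, 1, 1], [1, 0, 0], [0, 1, 1], [1, 1, 1]]
X = list(range(len(Y)))  # stand-in feature rows

classes, class_of = to_label_powerset(Y)
X_pruned, classes_pruned = prune_rare_classes(X, classes, min_count=2)
subsets = size_balanced_split(classes, n_subsets=2)
```

In an ensemble of this kind, one multiclass classifier would be trained per subset of classes and the predictions combined; how ELPPJD combines them is not specified in the abstract.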
