Multi-label classification for intelligent health risk prediction

A Multi-Label Problem Transformation Joint Classification (MLPTJC) method is developed to solve the multi-label classification problem of predicting health and disease risk from physical examination records. We adopt a problem transformation method to convert the multi-label classification problem into a multi-class classification problem. We then propose a Joint Decomposition Subset Classifier method that reduces the number of infrequent label sets to address the imbalanced learning problem. With MLPTJC, existing cost-sensitive multi-class classification algorithms can be used to train the prediction models. We conduct experiments to evaluate the performance of the MLPTJC method, using the Support Vector Machine (SVM) and Random Forest (RF) algorithms for multi-class classification learning. Performance is evaluated with 10-fold cross-validation and metrics such as Average Accuracy, Precision, Recall, and F-measure. The data set consists of real physical examination records covering 62 examination items and 110,300 anonymous patients, and eight types of diseases are predicted. The experimental results show that the MLPTJC method performs better in terms of accuracy.
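The transformation from a multi-label to a multi-class problem can be illustrated with a minimal sketch. It assumes a label-powerset style encoding, in which each distinct set of diseases becomes one class, and it handles infrequent label sets by replacing them with their largest frequent subsets. The abstract does not specify the Joint Decomposition Subset Classifier, so the decomposition rule, the function names, the `min_count` threshold, and the toy data below are all hypothetical stand-ins, not the authors' algorithm.

```python
from collections import Counter
from itertools import combinations


def to_label_sets(Y):
    """Convert binary indicator rows into frozensets of active label indices."""
    return [frozenset(i for i, v in enumerate(row) if v) for row in Y]


def decompose_infrequent(label_sets, min_count):
    """Map each record to a list of frequent label sets.

    Assumed rule (not taken from the paper): a frequent label set is kept
    as-is; an infrequent one is replaced by its largest frequent proper
    subsets. A record with no frequent subset gets an empty list.
    """
    counts = Counter(label_sets)
    frequent = {s for s, c in counts.items() if c >= min_count}
    result = []
    for s in label_sets:
        if s in frequent:
            result.append([s])
            continue
        found = []
        # Try proper subsets from largest to smallest; stop at the first hit.
        for size in range(len(s) - 1, 0, -1):
            found = [
                frozenset(c)
                for c in combinations(sorted(s), size)
                if frozenset(c) in frequent
            ]
            if found:
                break
        result.append(found)
    return result


def label_powerset(Y, min_count=2):
    """Return (row indices, class ids, label-set-to-class mapping)."""
    decomposed = decompose_infrequent(to_label_sets(Y), min_count)
    classes, rows, targets = {}, [], []
    for idx, subsets in enumerate(decomposed):
        for s in subsets:
            rows.append(idx)
            targets.append(classes.setdefault(s, len(classes)))
    return rows, targets, classes


# Toy data: 7 patients, 3 hypothetical diseases (columns).
Y = [
    [1, 0, 0],
    [1, 0, 0],
    [0, 1, 0],
    [0, 1, 0],
    [1, 1, 0],
    [1, 1, 0],
    [1, 1, 1],  # infrequent label set {0, 1, 2}
]
rows, targets, classes = label_powerset(Y, min_count=2)
print(targets)
```

In this toy case the single record with all three diseases is reassigned to the class for the frequent set {0, 1}, so three classes remain instead of four. The resulting `targets` could then be passed, together with the feature rows selected by `rows`, to any multi-class learner such as an SVM or a Random Forest.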
