Latent Class Multi-Label Classification to Identify Subclasses of Disease for Improved Prediction

Disease subtyping can assist the development of precision medicine, but it remains a challenge in data analysis because there are many different methods for grouping individuals according to their data. Nevertheless, identifying subclasses of disease will help produce better models that are more specific to patients, improving both prediction and the interpretation of the underlying characteristics of the disease. This paper presents a novel algorithm that integrates latent class models with supervised learning. The new algorithm uses latent class models to cluster patients into groups, which improves classification and aids understanding of the differences between the discovered groups. The methods are tested on data from patients with Systemic Sclerosis (SSc), a rare, potentially fatal condition. Results show that the "Latent Class Multi-Label Classification Model" improves accuracy compared with similar competing methods.
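The abstract describes the algorithm only at a high level. As a rough illustration of the general idea, and not the authors' implementation, the sketch below fits a classic latent class model (a mixture of independent binary variables, estimated by expectation-maximization) to binary patient features, then predicts each label as a posterior-weighted combination of per-class label rates. The function names, the add-one smoothing, and the per-class "label rate" predictor are assumptions made for illustration; a fuller model would train a proper multi-label classifier within each latent class.

```python
import math
import random


def class_posterior(x, weights, theta):
    """P(class | x) under a latent class model with independent binary items."""
    logs = []
    for w, t in zip(weights, theta):
        lp = math.log(w)
        for xj, p in zip(x, t):
            lp += math.log(p if xj else 1.0 - p)
        logs.append(lp)
    top = max(logs)  # subtract the max before exponentiating for stability
    ex = [math.exp(v - top) for v in logs]
    z = sum(ex)
    return [e / z for e in ex]


def fit_latent_classes(X, k, n_iter=100, seed=0):
    """Fit k latent classes to binary features X by EM.

    Returns (weights, theta): class weights and, per class, the probability
    that each binary feature equals 1.
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    weights = [1.0 / k] * k
    # Random starting values break the symmetry between classes.
    theta = [[rng.uniform(0.25, 0.75) for _ in range(d)] for _ in range(k)]
    for _ in range(n_iter):
        # E-step: posterior class membership of every patient.
        resp = [class_posterior(x, weights, theta) for x in X]
        # M-step: re-estimate parameters from the soft assignments.
        for c in range(k):
            nc = sum(r[c] for r in resp)
            weights[c] = (nc + 1.0) / (n + k)  # add-one smoothing; sums to 1
            for j in range(d):
                s = sum(r[c] * x[j] for r, x in zip(resp, X))
                theta[c][j] = (s + 1.0) / (nc + 2.0)  # keeps p away from 0 and 1


    return weights, theta


def fit_label_rates(X, Y, weights, theta):
    """Hypothetical supervised step: smoothed rate of each label within each class."""
    resp = [class_posterior(x, weights, theta) for x in X]
    rates = []
    for c in range(len(weights)):
        nc = sum(r[c] for r in resp)
        rates.append([
            (sum(r[c] * y[lab] for r, y in zip(resp, Y)) + 1.0) / (nc + 2.0)
            for lab in range(len(Y[0]))
        ])
    return rates


def predict_labels(x, weights, theta, rates):
    """P(label = 1 | x) = sum over classes c of P(c | x) * rates[c][label]."""
    post = class_posterior(x, weights, theta)
    return [sum(p * rates[c][lab] for c, p in enumerate(post))
            for lab in range(len(rates[0]))]
```

With binary clinical indicators as `X` and several binary outcomes as `Y`, calling `fit_latent_classes(X, k)`, then `fit_label_rates`, then `predict_labels` yields one probability per label for a new patient. Keeping the clustering step unsupervised (it never sees `Y`) means the discovered classes can also be inspected on their own as candidate disease subgroups.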
