Hierarchical Clustering Support Vector Machines for Classifying Type-2 Diabetes Patients

Using a large national health database, we propose an enhanced SVM-based model, the Hierarchical Clustering Support Vector Machine (HCSVM), that uses multiple levels of clusters to classify patients diagnosed with type-2 diabetes. Multiple HCSVMs are trained for clusters at different levels of the hierarchy. Clusters at some levels of the hierarchy capture more separable sample spaces than others, so HCSVMs at different levels may develop different classification capabilities. Because the locations of the superior SVMs are data-dependent, the HCSVM model in this study uses an adaptive strategy to select the most suitable HCSVM for classifying the testing samples. The model addresses the large-data-set problem inherent in the traditional single SVM model, because the entire data set is partitioned into smaller and more homogeneous clusters. Other approaches also use clustering and multiple SVMs to handle large data sets, but they typically employ only one level of clusters, and a single level of clusters may not provide an optimal partition of the sample space for SVM training. In contrast, HCSVMs use the multiple partitions available in a multilevel tree to capture a more separable sample space for SVM training. Compared with the traditional single SVM model and the one-level multiple-SVM model, the HCSVM model markedly improves the accuracy of classifying testing samples.
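The idea of training one SVM per cluster at every level of a hierarchy, and then choosing a level adaptively, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the hierarchy is built by repeated 2-means bisection, each SVM is a linear one trained with a Pegasos-style subgradient method, a test sample is routed to the nearest cluster centroid, and the "adaptive strategy" is approximated by picking the level with the best held-out accuracy. All function names are hypothetical.

```python
import random

def train_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Linear SVM via Pegasos-style subgradient descent; labels are -1/+1."""
    if len(set(y)) == 1:                  # pure cluster: constant classifier
        return [0.0] * len(X[0]) + [float(y[0])]
    rng, w, t = random.Random(seed), [0.0] * (len(X[0]) + 1), 0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            xi = X[i] + [1.0]             # append constant 1 for the bias term
            margin = y[i] * sum(a * b for a, b in zip(w, xi))
            w = [(1.0 - eta * lam) * a for a in w]
            if margin < 1.0:              # hinge loss is active for this sample
                w = [a + eta * y[i] * b for a, b in zip(w, xi)]
    return w

def svm_predict(w, x):
    return 1 if sum(a * b for a, b in zip(w, x + [1.0])) >= 0 else -1

def centroid(points):
    return [sum(col) / len(points) for col in zip(*points)]

def dist2(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))

def two_means(idx, X, rng, iters=20):
    """Split the samples in idx into two groups with plain 2-means (assumed clustering)."""
    if len(idx) < 2:
        return [idx]
    c = [X[i] for i in rng.sample(idx, 2)]
    for _ in range(iters):
        parts = [[], []]
        for i in idx:
            parts[0 if dist2(X[i], c[0]) <= dist2(X[i], c[1]) else 1].append(i)
        if not parts[0] or not parts[1]:
            return [idx]                  # degenerate split: keep the cluster whole
        c = [centroid([X[i] for i in p]) for p in parts]
    return parts

def build_levels(X, depth, seed=0):
    """Level 0 is the whole set; each deeper level bisects every cluster."""
    rng = random.Random(seed)
    levels = [[list(range(len(X)))]]
    for _ in range(depth):
        levels.append([p for idx in levels[-1] for p in two_means(idx, X, rng)])
    return levels

def train_hcsvm(X, y, depth=2):
    """Train one SVM per cluster at every level: a list of (centroid, weights) per level."""
    return [[(centroid([X[i] for i in c]),
              train_svm([X[i] for i in c], [y[i] for i in c]))
             for c in clusters]
            for clusters in build_levels(X, depth)]

def predict_at_level(level, x):
    """Route x to the nearest cluster centroid and use that cluster's SVM."""
    _, w = min(level, key=lambda cw: dist2(cw[0], x))
    return svm_predict(w, x)

def accuracy(level, X, y):
    return sum(predict_at_level(level, x) == t for x, t in zip(X, y)) / len(X)

def select_level(model, X_val, y_val):
    """Adaptive step (assumed): keep the level with the best held-out accuracy."""
    scores = [accuracy(level, X_val, y_val) for level in model]
    return scores.index(max(scores)), scores

# Toy XOR-like data: four blobs that no single linear SVM can separate.
rng = random.Random(1)
def blob(cx, cy, label, n=30):
    return [([cx + rng.gauss(0, 0.3), cy + rng.gauss(0, 0.3)], label) for _ in range(n)]

data = blob(0, 0, 1) + blob(4, 4, 1) + blob(0, 4, -1) + blob(4, 0, -1)
rng.shuffle(data)
train, val = data[:80], data[80:]
X_tr, y_tr = [x for x, _ in train], [t for _, t in train]
X_val, y_val = [x for x, _ in val], [t for _, t in val]

model = train_hcsvm(X_tr, y_tr, depth=2)
best, scores = select_level(model, X_val, y_val)
print("validation accuracy per level:", scores, "-> selected level", best)
```

On data like this, a single linear SVM at level 0 cannot classify all four blobs correctly, whereas the clusters at deeper levels tend to be linearly separable, which is the kind of level-dependent separability the abstract describes. A kernel SVM and an agglomerative clustering routine from an established library could be substituted for the hand-written pieces.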
