Application of support vector machine modeling for prediction of common diseases: the case of diabetes and pre-diabetes

Background: We present a potentially useful alternative approach based on support vector machine (SVM) techniques to classify persons with and without common diseases. We illustrate the method by detecting persons with diabetes and pre-diabetes in a cross-sectional representative sample of the U.S. population.

Methods: We used data from the 1999-2004 National Health and Nutrition Examination Survey (NHANES) to develop and validate SVM models for two classification schemes: Classification Scheme I (diagnosed or undiagnosed diabetes vs. pre-diabetes or no diabetes) and Classification Scheme II (undiagnosed diabetes or pre-diabetes vs. no diabetes). The SVM models were used to select the sets of variables that would yield the best classification of individuals into these diabetes categories.

Results: For Classification Scheme I, the set of diabetes-related variables with the best classification performance included family history, age, race and ethnicity, weight, height, waist circumference, body mass index (BMI), and hypertension. For Classification Scheme II, two additional variables (sex and physical activity) were included. The discriminative abilities of the SVM models for Classification Schemes I and II, measured by the area under the receiver operating characteristic (ROC) curve, were 83.5% and 73.2%, respectively. A web-based tool, Diabetes Classifier, was developed to demonstrate a user-friendly application that allows individual or group assessment with a configurable, user-defined threshold.

Conclusions: Support vector machine modeling is a promising classification approach for detecting persons with common diseases such as diabetes and pre-diabetes in the population. This approach should be further explored in other complex diseases using common variables.
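As a rough illustration of the two evaluation ideas mentioned in the abstract (the area under the ROC curve and a user-defined classification threshold), the sketch below computes both for a handful of classifier scores. It does not train an SVM and is not the authors' implementation: it assumes decision scores are already available from some model, the scores and labels are made-up toy values rather than NHANES data, and the function names `roc_auc`, `classify`, and `sensitivity_specificity` are hypothetical.

```python
from typing import List, Sequence, Tuple


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case receives the higher score; ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def classify(scores: Sequence[float], threshold: float) -> List[int]:
    """Label a case positive (1) when its score reaches the chosen threshold."""
    return [1 if s >= threshold else 0 for s in scores]


def sensitivity_specificity(
    predicted: Sequence[int], labels: Sequence[int]
) -> Tuple[float, float]:
    """Sensitivity (true positive rate) and specificity (true negative rate)."""
    tp = sum(1 for p, y in zip(predicted, labels) if p == 1 and y == 1)
    fn = sum(1 for p, y in zip(predicted, labels) if p == 0 and y == 1)
    tn = sum(1 for p, y in zip(predicted, labels) if p == 0 and y == 0)
    fp = sum(1 for p, y in zip(predicted, labels) if p == 1 and y == 0)
    return tp / (tp + fn), tn / (tn + fp)


# Toy values for illustration only (assumed, not taken from the study).
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

auc = roc_auc(scores, labels)
low_cut = sensitivity_specificity(classify(scores, 0.5), labels)
high_cut = sensitivity_specificity(classify(scores, 0.75), labels)

print(f"AUC = {auc:.4f}")
print(f"threshold 0.50: sensitivity={low_cut[0]:.2f}, specificity={low_cut[1]:.2f}")
print(f"threshold 0.75: sensitivity={high_cut[0]:.2f}, specificity={high_cut[1]:.2f}")
```

The AUC is independent of any single cut-off, whereas raising the threshold trades sensitivity for specificity, which is presumably why a screening tool would expose the threshold as a user setting.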
