Using data mining techniques for multi-diseases prediction modeling of hypertension and hyperlipidemia by common risk factors

Research highlights
- This paper proposes a two-phase analysis procedure to simultaneously predict hypertension and hyperlipidemia.
- Common risk factors of these diseases are identified by data mining and majority vote.
- This study uses the common risk factors to build MARS predictive models for hypertension and hyperlipidemia.

Many previous studies have built predictive models for a single disease, overlooking the fact that patients often suffer from several associated diseases at once. Because associated diseases can have reciprocal effects, and because an abnormal physiological indicator can signal more than one of them, common risk factors can be used to predict multiple associated diseases together. This approach provides a more effective and comprehensive forecasting mechanism for preventive medicine. This paper proposes a two-phase analysis procedure to simultaneously predict hypertension and hyperlipidemia. First, we used six data mining approaches to select the individual risk factors of each disease, and then determined the common risk factors using the voting principle. Next, we used the Multivariate Adaptive Regression Splines (MARS) method to construct a multiple-disease predictive model for hypertension and hyperlipidemia. The study uses data on 2048 subjects from a physical examination center database in Taiwan. The proposed analysis procedure shows that the common risk factors of hypertension and hyperlipidemia are Systolic Blood Pressure (SBP), Triglycerides, Uric Acid (UA), Glutamate Pyruvate Transaminase (GPT), and gender. The proposed multi-disease predictor achieves a classification accuracy of 93.07%. These results provide an effective and appropriate methodology for simultaneously predicting hypertension and hyperlipidemia.
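The first phase (per-disease selection by several methods, followed by a vote) can be illustrated with a minimal Python sketch. The abstract does not state the exact voting rule, so the sketch assumes a strict majority of methods per disease and then intersects the results across diseases. The function names `majority_vote` and `common_risk_factors`, the `BMI` feature, and the toy selections are hypothetical and are not taken from the study's data.

```python
from collections import Counter


def majority_vote(selections, min_votes=None):
    """Return the factors chosen by at least `min_votes` selection methods.

    `selections` is a list of sets, one set of factor names per method.
    By default a strict majority is required (assumed rule, not from the paper).
    """
    if min_votes is None:
        min_votes = len(selections) // 2 + 1
    counts = Counter(factor for chosen in selections for factor in set(chosen))
    return {factor for factor, votes in counts.items() if votes >= min_votes}


def common_risk_factors(per_disease_selections, min_votes=None):
    """Intersect the majority-voted factor sets of all diseases."""
    voted = [majority_vote(s, min_votes) for s in per_disease_selections.values()]
    return set.intersection(*voted) if voted else set()


# Hypothetical toy input: six methods per disease, made-up selections.
toy = {
    "hypertension": [
        {"SBP", "UA", "gender"},
        {"SBP", "UA"},
        {"SBP", "gender", "BMI"},
        {"SBP", "UA", "gender"},
        {"SBP", "BMI"},
        {"SBP", "UA", "gender"},
    ],
    "hyperlipidemia": [
        {"Triglycerides", "SBP", "UA"},
        {"Triglycerides", "SBP"},
        {"Triglycerides", "UA", "GPT"},
        {"Triglycerides", "SBP", "UA"},
        {"Triglycerides", "GPT"},
        {"Triglycerides", "SBP", "UA"},
    ],
}

if __name__ == "__main__":
    print(sorted(common_risk_factors(toy)))
```

On this toy input only the factors that win a majority for both diseases survive; the output says nothing about the factors reported in the study. The voted set would then feed the second phase, where a MARS model is fitted on the common factors.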
