Efficient data preparation techniques for diabetes detection

The development of new medical devices and database management systems has created a large number of databases in biomedical science. The success of data mining on medical databases is often limited by incomplete, sparse, and inexact parameters. If the data are incorrect, irrelevant, or noisy, the final outcome will not be reliable. In particular, conventional approaches to diabetes detection pay too little attention to data preparation and overlook the appropriate features. In this paper, some efficient data preparation techniques are studied, by which effective features are selected to reduce the cost and enhance the accuracy of diabetes detection. The techniques are evaluated on the popular Pima Indians Diabetes (PID) data set.
