Basic Health Screening by Exploiting Data Mining Techniques

This study proposes a basic health screening system based on data mining techniques, intended to support health personnel in basic screening and to help citizens examine their own health conditions. The research comprised two steps. The first step was to build models using classification techniques, namely Bayesian methods (Naive Bayes, Bayesian networks, and Naive Bayesian Updateable) and decision tree methods (C4.5, ID3, and Partial Rule), in order to find the important attributes contributing to the disease. In this step, the accuracy of each method was compared with that of the others, and the most accurate model was selected as the input for the next step. The second step was to develop a basic health screening system that applies the rules of the selected model to a citizen's health profile in order to classify the citizen into a normal group, a risk group, or a sick group. The findings revealed two important attributes directly contributing to diabetes: blood pressure (BP) and docetaxel (DTX). Furthermore, the C4.5 algorithm was the most accurate method, with an accuracy of 99.7969%, a precision of 99.8%, a recall of 99.8%, and an F-measure of 99.8%.
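The comparison in the first step can be illustrated with a minimal sketch of how accuracy, precision, recall, and F-measure might be computed for the three screening groups and then used to pick the most accurate model. The labels below are invented toy data, not the study's records; the names `evaluate`, `model_a`, and `model_b` are hypothetical; and averaging the per-class scores by class frequency is an assumption, since the source does not state how its precision, recall, and F-measure were aggregated.

```python
from collections import Counter

CLASSES = ("normal", "risk", "sick")


def evaluate(y_true, y_pred, classes=CLASSES):
    """Return accuracy plus support-weighted precision, recall and F-measure."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")

    total = len(y_true)
    support = Counter(y_true)      # actual records per class
    predicted = Counter(y_pred)    # predicted records per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    scores = {"accuracy": sum(correct.values()) / total,
              "precision": 0.0, "recall": 0.0, "f_measure": 0.0}
    for c in classes:
        p = correct[c] / predicted[c] if predicted[c] else 0.0
        r = correct[c] / support[c] if support[c] else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        weight = support[c] / total  # assumption: weight classes by frequency
        scores["precision"] += weight * p
        scores["recall"] += weight * r
        scores["f_measure"] += weight * f
    return scores


# Toy ground truth: 4 normal, 3 risk, 3 sick (invented for illustration).
y_true = ["normal"] * 4 + ["risk"] * 3 + ["sick"] * 3

# Predictions from two hypothetical models.
predictions = {
    "model_a": ["normal", "normal", "normal", "risk",
                "risk", "risk", "sick",
                "sick", "sick", "sick"],
    "model_b": list(y_true),  # a model that classifies every record correctly
}

results = {name: evaluate(y_true, y_pred) for name, y_pred in predictions.items()}

# Step one of the study keeps the most accurate model for the screening system.
best = max(results, key=lambda name: results[name]["accuracy"])

for name, scores in results.items():
    print(name, {k: round(v, 4) for k, v in scores.items()})
print("selected:", best)
```

With frequency-weighted averaging, the weighted recall always equals the overall accuracy, which is one reason the two figures tend to agree closely in reports of this kind.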