Analysis of Classification Techniques for Efficient Disease Prediction

Data mining plays an important role in processing large volumes of data. It refers to the process of extracting knowledge from raw data. Classification is the most widely used data mining technique; it uses a set of preclassified samples to build a model called a classifier. Many studies have shown that the C4.5 algorithm needs to be improved to maximize accuracy and to handle large amounts of data; C5.0 is the improved version. The major goal of a classification technique is to predict the target class accurately for each case in the data. The main objective of this research work is to predict diseases using classification algorithms such as decision trees, C5.0, and Bayesian networks. The performance of the classification algorithms is compared on two datasets, Breast cancer and Heart disease. The experimental results are compared on performance parameters such as dataset scalability, accuracy, and error rate. The results show that, in terms of scalability, the Bayesian network algorithm achieves a higher accuracy and a lower error rate than the C5.0 algorithm.

General Terms

Data Processing, Classification Algorithms
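
The accuracy and error rate used above as performance parameters can be illustrated with a minimal Python sketch. It assumes the usual definitions (accuracy is the fraction of cases whose predicted class matches the true class, and error rate is the fraction that do not match); the function name and the toy labels are hypothetical and are not taken from the Breast cancer or Heart disease datasets.

```python
def accuracy_and_error_rate(actual, predicted):
    """Return (accuracy, error_rate) for two equal-length label sequences.

    Assumed definitions (not taken from the paper):
      accuracy   = correctly classified cases / total cases
      error_rate = misclassified cases / total cases
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if len(actual) == 0:
        raise ValueError("at least one case is required")

    total = len(actual)
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / total, (total - correct) / total


# Hypothetical toy labels, for illustration only.
actual = ["malignant", "benign", "benign", "malignant", "benign"]
predicted = ["malignant", "benign", "malignant", "malignant", "benign"]

accuracy, error_rate = accuracy_and_error_rate(actual, predicted)
print(f"accuracy={accuracy:.2f}, error_rate={error_rate:.2f}")
```

Under these definitions the two values always sum to 1, so comparing classifiers on accuracy and on error rate gives the same ranking; reporting both is a matter of presentation.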