Ensembled Rule Based Classification Algorithms for predicting Imbalanced Kidney Disease Data

Imbalanced data is data in which the classes are represented in markedly unequal proportions. It arises frequently in real-world data analysis. In data mining, imbalanced data degrades the performance of learning algorithms: most machine learning algorithms are biased towards the majority class and therefore tend to misclassify the minority class. In this article we discuss a systematic way to address the imbalanced data classification problem by applying rule-based ensemble learning techniques such as bagging, boosting, voting and stacking to build models and thereby improve the performance of the learning algorithms. For this research we chose real chronic kidney disease data, collected from Apollo Hospitals, Tamil Nadu, India, to predict kidney disease in patients. The collected data is initially imbalanced. First, the data is balanced by applying the SMOTE algorithm, an oversampling technique. Various ensemble learning techniques are then applied to obtain better predictions. The results show that the chosen model template can effectively reduce the misclassification of imbalanced data. However, this model template cannot classify correctly when the class imbalance ratio increases, as is the case with Big Data. To obtain better results on imbalanced Big Data, a new algorithmic approach is needed, which can be evaluated using the Hadoop framework and the MapReduce programming model.
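The two steps described above, oversampling the minority class with SMOTE and then combining several classifiers, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `smote` and `majority_vote`, the toy two-feature rows, the labels `ckd`/`notckd` and the choice `k=2` are all assumptions made for illustration, and only the Python standard library is used. The sketch shows the core idea of SMOTE (a synthetic row is placed on the line segment between a minority row and one of its nearest minority neighbours) and the simplest form of voting.

```python
import math
import random
from collections import Counter


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic rows by interpolating between minority neighbours.

    Assumes `minority` holds at least two numeric feature tuples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base row (excluding the row itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def majority_vote(predictions):
    """Combine the labels predicted by several classifiers for one sample."""
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical toy data (not from the paper): 6 majority rows, 3 minority rows
majority = [(0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.2, 0.4), (0.4, 0.2), (0.1, 0.4)]
minority = [(0.8, 0.9), (0.9, 0.8), (0.7, 0.7)]

# Oversample the minority class until both classes have the same size
new_rows = smote(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + new_rows

# Hypothetical labels from three base classifiers for a single patient
combined_label = majority_vote(["ckd", "notckd", "ckd"])
```

In practice the base classifiers would be trained on the balanced data, and bagging, boosting or stacking would replace the plain vote shown here.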
