Recursive Feature Elimination with Ridge Regression (L2): A Machine Learning Hybrid Feature Selection Algorithm for Diabetes Prediction Using a Random Forest Classifier

The incidence of diabetes, a disease in which the body cannot properly metabolize glucose, is increasing. Correctly identifying diabetic patients is an important research area, and many researchers have proposed data mining and machine learning techniques to predict the disease. Feature selection is a key preprocessing step in prediction: it ensures that only the features relevant to the disease are used, which improves prediction accuracy. Selecting the right features from the full feature set is a complicated process, and many researchers are working on it to produce predictive models with high accuracy. In this work, the wrapper-based feature selection method Recursive Feature Elimination (RFE) is combined with Ridge regression (L2) to form a hybrid L2-regularized feature selection algorithm that addresses overfitting on the data set. Overfitting, a major problem in feature selection, occurs when a model fitted to a small training set fails to generalize to new data. Ridge regression counters overfitting by penalizing large coefficients. Once the features have been selected with the proposed method, a random forest classifier classifies the data using the selected features. The proposed method is evaluated on the PIDD data set, and the results are compared with those of existing algorithms. In these experiments, the proposed algorithm predicts diabetes with higher accuracy than the existing algorithms it is compared against.
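The pipeline described above can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' implementation: synthetic data from `make_classification` stands in for the PIDD data set (the shape of 768 rows and 8 features is assumed), `RidgeClassifier` serves as the L2-penalized estimator inside `RFE`, and the values of `alpha`, `n_features_to_select`, and `n_estimators` are placeholders rather than settings reported in the paper.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import RidgeClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic stand-in data, not the real PIDD records.
X, y = make_classification(
    n_samples=768, n_features=8, n_informative=4, n_redundant=2, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

pipeline = Pipeline([
    # Scale features so the ridge coefficients are comparable across features.
    ("scale", StandardScaler()),
    # RFE repeatedly fits the ridge model and drops the feature whose
    # coefficient has the smallest magnitude until 5 features remain.
    ("rfe", RFE(estimator=RidgeClassifier(alpha=1.0), n_features_to_select=5)),
    # The random forest is trained only on the features RFE kept.
    ("forest", RandomForestClassifier(n_estimators=200, random_state=0)),
])
pipeline.fit(X_train, y_train)

selected_mask = pipeline.named_steps["rfe"].support_  # boolean mask over the 8 features
test_accuracy = pipeline.score(X_test, y_test)

print("Selected feature indices:", [i for i, keep in enumerate(selected_mask) if keep])
print("Held-out accuracy:", round(test_accuracy, 3))
```

Placing the selector inside the `Pipeline` means feature selection is fitted on the training split only, so the held-out accuracy is not inflated by information leaking from the test data.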
