Chicken Swarm-Based Feature Subset Selection with Optimal Machine Learning Enabled Data Mining Approach

Data mining (DM) is the process of identifying patterns, correlations, and anomalies in massive datasets. DM is applied in several areas, such as education, healthcare, business, and finance. Educational Data Mining (EDM) is an interdisciplinary domain that applies DM, machine learning (ML), and statistical approaches to pattern recognition in massive quantities of educational data. Such data suffer from the curse of dimensionality, so feature selection (FS) approaches become essential. This study designs a Feature Subset Selection with optimal machine learning model for Educational Data Mining (FSSML-EDM). The proposed method involves three major processes. First, the FSSML-EDM model uses the Chicken Swarm Optimization-based Feature Selection (CSO-FS) technique to select feature subsets. Next, an extreme learning machine (ELM) classifier is employed to classify the educational data. Finally, the Artificial Hummingbird (AHB) algorithm is used to tune the parameters of the ELM model. The performance study showed that the FSSML-EDM model achieves better results than other models on several evaluation measures.
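As an illustration of the classification stage, the following is a minimal sketch of an ELM classifier in Python with NumPy: hidden-layer weights are drawn at random and only the output weights are fitted, here by ridge-regularized least squares. It is not the authors' implementation. The function names, the tanh activation, the regularization term, the toy data, and the wrapper fitness (a weighted sum of holdout error and subset size, a common choice in wrapper-based feature selection) are all assumptions made for illustration. The CSO search and the AHB tuning are not implemented; a search such as CSO-FS could minimize `subset_fitness` over binary masks, and a tuner such as AHB could adjust `n_hidden` and `reg`.

```python
import numpy as np


def train_elm(X, y, n_hidden=50, reg=1e-3, seed=0):
    """Fit an ELM: random hidden layer, closed-form output weights."""
    rng = np.random.default_rng(seed)
    n_classes = int(y.max()) + 1
    # Hidden-layer weights and biases are random and never trained.
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)            # hidden-layer activations
    T = np.eye(n_classes)[y]          # one-hot class targets
    # Output weights via ridge-regularized least squares (assumed variant).
    beta = np.linalg.solve(H.T @ H + reg * np.eye(n_hidden), H.T @ T)
    return W, b, beta


def predict_elm(model, X):
    """Return the predicted class index for each row of X."""
    W, b, beta = model
    return np.argmax(np.tanh(X @ W + b) @ beta, axis=1)


def subset_fitness(mask, X, y, alpha=0.99):
    """Hypothetical wrapper fitness for a binary feature mask (lower is better)."""
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        return 1.0                    # an empty subset gets the worst score
    split = int(0.75 * len(X))        # simple holdout: first 75% for training
    model = train_elm(X[:split][:, mask], y[:split])
    error = np.mean(predict_elm(model, X[split:][:, mask]) != y[split:])
    # Trade classification error against the fraction of features kept.
    return alpha * error + (1 - alpha) * mask.sum() / mask.size


def make_toy_data(n_samples=200, n_features=6, seed=0):
    """Toy data: only the first two features determine the label."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_samples, n_features))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)
    return X, y


X, y = make_toy_data()
informative = np.array([1, 1, 0, 0, 0, 0], dtype=bool)
noise_only = ~informative
print("fitness (informative features):", subset_fitness(informative, X, y))
print("fitness (noise features only): ", subset_fitness(noise_only, X, y))
```

On this toy data the mask that keeps the two informative features should score lower (better) than the mask that keeps only noise features, which is the signal a subset search would exploit.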
