Assessment of feature selection for student academic performance through machine learning classification

Abstract: Regression analysis is used to find trends in data. It identifies the relationship between the dependent and independent variables in a dataset and indicates how strongly each independent variable influences the predicted outcome. Multiple linear regression builds a model with more than one predictor by identifying the statistical relationship between the predictors and the response. This paper evaluates the performance of multiple linear regression models and suggests feature selection as a way to improve them. The performance of the model with and without backward elimination is analyzed on the Student Academic Performance dataset from the Kaggle repository. The optimized model is then tested with several classifiers (Logistic Regression, KNN, Kernel SVM, Naïve Bayes, Decision Tree and Random Forest), and its efficiency is assessed through Precision, Recall, F-score and Accuracy.
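The backward elimination step mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation. It assumes a significance level of 0.05, uses a large-sample normal approximation for the p-values instead of the exact t distribution, and runs on synthetic data. The feature names are hypothetical placeholders, not columns confirmed from the Kaggle dataset.

```python
import math
import numpy as np

def ols_p_values(X, y):
    """Fit ordinary least squares and return coefficients with two-sided p-values.

    Assumption: p-values use a normal approximation to the t distribution,
    which is reasonable only when the sample is much larger than the
    number of predictors.
    """
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)            # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)       # coefficient covariance matrix
    t_stats = beta / np.sqrt(np.diag(cov))
    p = np.array([math.erfc(abs(t) / math.sqrt(2)) for t in t_stats])
    return beta, p

def backward_elimination(X, y, names, alpha=0.05):
    """Repeatedly drop the least significant feature until all p-values <= alpha."""
    keep = list(range(X.shape[1]))
    while keep:
        # Prepend an intercept column; the intercept itself is never removed.
        Xc = np.column_stack([np.ones(len(y)), X[:, keep]])
        _, p = ols_p_values(Xc, y)
        p_feat = p[1:]
        worst = int(np.argmax(p_feat))
        if p_feat[worst] <= alpha:
            break
        keep.pop(worst)
    return [names[i] for i in keep]

# Synthetic data: only the first and third features drive the target.
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=0.5, size=n)

# Hypothetical feature names, chosen for illustration only.
names = ["raised_hands", "noise_a", "visited_resources", "noise_b"]
selected = backward_elimination(X, y, names)
print(selected)
```

The reduced feature set returned by `backward_elimination` is what would then be passed to the classifiers for comparison against the full feature set.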
