Improving Confusion-State Classifier Model Using XGBoost and Tree-Structured Parzen Estimator

Detecting confusion is considered a critical issue in online education platforms. Confusion arises from the limited interaction between lecturers and learners. A machine learning model for confusion detection can address this problem: it enables an online education system to detect confusion and react accordingly. Motivated by this need, several studies have developed confusion-state classifier models. The best previous model has an average accuracy of 75%. Despite this promising result, the model leaves two gaps that can be improved: the choice of machine learning algorithm and the absence of any hyper-parameter optimization technique. This study addresses both gaps by replacing the machine learning algorithm with XGBoost and by applying the Tree-structured Parzen Estimator (TPE) as a hyper-parameter optimization technique. The TPE was also combined with the Recursive Feature Elimination (RFE) technique. The proposed model outperformed the previous ones, achieving an average accuracy of 87%. This study also identified the best-performing configuration of features and hyper-parameters for building such a model, and thus presents an up-to-date confusion-state classifier model.
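The idea behind TPE can be illustrated with a minimal one-dimensional sketch in pure Python. It is not the implementation used in this study, which the abstract does not describe. The sketch splits past trials into a "good" and a "bad" group by loss, fits a Parzen (kernel density) estimator to each, and evaluates next the candidate with the highest ratio of good to bad density. The names `gaussian_kde` and `tpe_minimize`, the fixed bandwidth, and the toy quadratic objective (standing in for the cross-validated error of a classifier as a function of one hyper-parameter) are all assumptions made for illustration.

```python
import math
import random


def gaussian_kde(points, bandwidth):
    """Return a Parzen (Gaussian kernel) density estimator built from `points`."""
    norm = bandwidth * math.sqrt(2.0 * math.pi)

    def density(x):
        total = 0.0
        for p in points:
            z = (x - p) / bandwidth
            total += math.exp(-0.5 * z * z) / norm
        return total / len(points)

    return density


def tpe_minimize(objective, low, high, n_trials=60, n_startup=10,
                 gamma=0.25, n_candidates=24, seed=0):
    """Minimize a 1-D objective on [low, high] with a simplified TPE loop."""
    rng = random.Random(seed)
    bandwidth = (high - low) / 10.0  # fixed bandwidth: a simplifying assumption
    trials = []  # list of (loss, x) pairs observed so far

    for t in range(n_trials):
        if t < n_startup:
            # Warm-up phase: plain random search.
            x = rng.uniform(low, high)
        else:
            # Split observations: the best `gamma` fraction is "good".
            ranked = sorted(trials)
            n_good = max(1, math.ceil(gamma * len(ranked)))
            good = [xi for _, xi in ranked[:n_good]]
            bad = [xi for _, xi in ranked[n_good:]]
            l = gaussian_kde(good, bandwidth)  # density of good points
            g = gaussian_kde(bad, bandwidth)   # density of bad points

            # Draw candidates from l(x): pick a good point, add kernel noise.
            candidates = []
            for _ in range(n_candidates):
                c = rng.gauss(rng.choice(good), bandwidth)
                candidates.append(min(high, max(low, c)))

            # Evaluate next the candidate that maximizes l(x) / g(x).
            x = max(candidates, key=lambda c: l(c) / (g(c) + 1e-12))

        trials.append((objective(x), x))

    best_loss, best_x = min(trials)
    return best_x, best_loss


if __name__ == "__main__":
    # Toy objective with its minimum at x = 0.3 (hypothetical stand-in
    # for validation error versus a single hyper-parameter).
    best_x, best_loss = tpe_minimize(lambda x: (x - 0.3) ** 2, 0.0, 1.0)
    print(f"best x = {best_x:.3f}, loss = {best_loss:.5f}")
```

In practice the objective would train and validate the classifier for each sampled configuration, and a full TPE implementation handles several hyper-parameters, conditional search spaces, and adaptive bandwidths, none of which this sketch attempts.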
