The Effects of Feature Optimization on High-Dimensional Essay Data

Current machine learning (ML)-based automated essay scoring (AES) systems employ large and varied sets of features, which have proven useful for improving scoring performance. However, because only a limited amount of training data is available relative to the large number of extracted features, the high-dimensional feature space is poorly represented, which leads to degraded performance and longer training times. In this paper, we experiment with and analyze the effects of feature optimization techniques, including normalization, discretization, and feature selection, across different ML algorithms, considering both the size of the feature space and AES performance. We show that appropriate feature optimization techniques can reduce the dimensionality of the feature space, contributing to more efficient training and improved AES performance.
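As a rough illustration of the kind of feature optimization pipeline described above, the following minimal sketch chains normalization, discretization, and feature selection before a regressor. It assumes a generic essay feature matrix and uses scikit-learn components (MinMaxScaler, KBinsDiscretizer, SelectKBest with mutual_info_regression, SVR) as stand-ins; these choices and their parameters are illustrative assumptions, not the authors' reported configuration.

# Illustrative sketch: normalization, discretization, and feature selection
# applied to a high-dimensional essay feature matrix before training a scorer.
# All components and parameter values are assumptions for illustration only.
import numpy as np
from sklearn.preprocessing import MinMaxScaler, KBinsDiscretizer
from sklearn.feature_selection import SelectKBest, mutual_info_regression
from sklearn.svm import SVR
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(0)
X = rng.random((200, 500))   # 200 essays, 500 extracted features (synthetic)
y = rng.random(200) * 6      # essay scores on a hypothetical 0-6 scale (synthetic)

pipeline = Pipeline([
    ("normalize", MinMaxScaler()),                         # rescale each feature to [0, 1]
    ("discretize", KBinsDiscretizer(n_bins=5,
                                    encode="ordinal",
                                    strategy="uniform")),  # bin continuous features
    ("select", SelectKBest(mutual_info_regression, k=50)), # keep the 50 most informative features
    ("regressor", SVR(kernel="rbf")),                      # score predictor
])

pipeline.fit(X, y)
print("Selected feature count:", pipeline.named_steps["select"].k)
print("Predicted score for first essay:", pipeline.predict(X[:1])[0])

In practice, the choice of scaler, bin count, selection criterion, and the number of retained features would be tuned per ML algorithm, which is the kind of comparison the paper investigates.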
