Prognosis Essay Scoring and Article Relevancy Using Multi-Text Features and Machine Learning

This study develops a model for automated essay scoring and article relevancy classification. Manual essay scoring is costly in evaluator time, and different evaluators may apply the same evaluation criteria inconsistently. Bibliometric research faces a similar problem: researchers searching for articles often encounter relevancy issues and therefore classify the articles manually, which is burdensome because of the time each evaluation takes. The proposed model performs automatic evaluation using multi-text features and ensemble machine learning. It is evaluated on two data sets: a Kaggle short-answer data set for essay scoring that covers four disciplines (Science, Biology, English, and English Language Arts), and a bibliometric data set with IoT (Internet of Things) and non-IoT classes. The efficacy of the model is measured against the Tandalla and AutoP approaches using Cohen’s kappa. The model achieves kappa values of 0.80 and 0.83 on the first and second data sets, respectively, indicating better performance than the earlier approaches.
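As a rough illustration of the agreement measure named above, the following minimal sketch computes unweighted Cohen's kappa between two label sequences, for example human scores and model scores. The function name `cohen_kappa` and the sample scores are hypothetical and are not taken from the study. The abstract does not say whether a weighted variant of kappa was used, so the unweighted form here is an assumption.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length label sequences."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("Both raters must label the same non-empty set of items.")
    n = len(rater_a)

    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of each rater's marginal label frequencies,
    # summed over labels.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical label; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical scores on a 0-2 scale (illustrative only, not data from the study).
human_scores = [0, 1, 2, 2, 1, 0, 2, 1]
model_scores = [0, 1, 2, 1, 1, 0, 2, 2]
kappa = cohen_kappa(human_scores, model_scores)
print(f"kappa = {kappa:.3f}")
```

Kappa corrects raw agreement for the agreement expected by chance, so a value of 1 means perfect agreement and a value of 0 means agreement no better than chance.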
