The K-Means Algorithm for Generating Sets of Items in Educational Assessment

In national-scale educational assessments, such as the National Examination, several sets of questions with the same level of difficulty are needed to discourage cheating by students. The objective of this research is therefore to generate sets of questions with the same level of difficulty automatically, using a machine learning approach, namely K-Means. To achieve this goal, the following procedure is implemented. First, banks of questions to be assigned to students are created. Next, training data are built by determining the value of each question based on Bloom's Taxonomy, item characteristics/types, and other parameters. K-Means is then applied to obtain several cluster centers, each representing the uniformity of the questions among its cluster members. Using several previously defined heuristic criteria, sets (packages) of questions with the same characteristics and difficulty levels are then assembled. The experimental results are analyzed with descriptive statistics (mean, standard deviation, and data visualization) and inferential statistics (ANOVA), and they show that the questions in each set have the same characteristics, which supports the fairness of examinations. Moreover, with this system the generated sets do not need to contain identical questions, packages of questions can be generated automatically and quickly, and the level of difficulty can be measured and guaranteed.
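
The procedure described above can be illustrated with a minimal sketch. It is not the authors' implementation. The feature encoding (Bloom level, item type code, estimated difficulty), the toy item bank, the function names `kmeans`, `build_packages`, and `anova_f`, and the "equal share from every cluster" packaging heuristic are all assumptions made for illustration, since the abstract does not specify them.

```python
import random
import statistics

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centers are too
        labels = new_labels
        # Update step: each center becomes the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels

def build_packages(labels, k, n_packages, seed=0):
    """Assumed heuristic: every package gets the same number of items
    from every cluster; leftover items are not used."""
    rng = random.Random(seed)
    packages = [[] for _ in range(n_packages)]
    for c in range(k):
        members = [i for i, l in enumerate(labels) if l == c]
        rng.shuffle(members)
        share = len(members) // n_packages
        for p in range(n_packages):
            packages[p].extend(members[p * share:(p + 1) * share])
    return packages

def anova_f(groups):
    """One-way ANOVA F statistic; assumes nonzero within-group variance."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2
                     for g in groups)
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = len(groups) - 1, n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical item bank: (Bloom level 1-6, item type code, difficulty 0-1).
items = [
    (1, 0, 0.20), (1, 0, 0.25), (2, 0, 0.30), (2, 1, 0.30),
    (3, 1, 0.50), (3, 1, 0.55), (4, 1, 0.60), (4, 0, 0.50),
    (5, 2, 0.80), (6, 2, 0.90), (5, 2, 0.85), (6, 1, 0.90),
]

centers, labels = kmeans(items, k=3)
packages = build_packages(labels, k=3, n_packages=2)

# Compare the difficulty column across packages.
difficulty = [[items[i][2] for i in pkg] for pkg in packages]
f_stat = anova_f(difficulty)
for n, d in enumerate(difficulty):
    print(f"package {n}: mean={statistics.mean(d):.3f} "
          f"sd={statistics.stdev(d):.3f}")
print(f"F statistic: {f_stat:.3f}")
```

A small F statistic means the package means differ little relative to the spread inside each package, which is the kind of evidence the ANOVA step is meant to provide. In practice the features would usually be rescaled before clustering so that no single parameter dominates the distance.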
