A Benchmark Collection for Mapping Program Educational Objectives to ABET Student Outcomes: Accreditation

This paper presents a dataset that maps program educational objectives to ABET student outcomes. The authors collected the dataset from 32 publicly available self-study reports of ABET-accredited engineering programs. The paper describes the constraints under which the dataset was produced, because understanding them is essential to using the collection in future research. To illustrate the properties and usefulness of the collection, the dataset was cleansed and preprocessed, features were selected, and the result was benchmarked using nine widely used supervised multi-label classification techniques (Binary Relevance, Label Powerset, Classifier Chains, Pruned Sets, Random k-label sets, Ensemble of Classifier Chains, Ensemble of Pruned Sets, Multi-Label k Nearest Neighbors and Back-Propagation Multi-Label Learning). The techniques were compared using five well-known measures (Accuracy, Hamming Loss, Micro-F, Macro-F, and Macro-F). The Ensemble of Classifier Chains and the Ensemble of Pruned Sets achieved encouraging performance compared with the other multi-label classification methods tested, while the Classifier Chains method showed the worst performance. In general, the results are promising. The paper provides new research directions and baseline experimental results for future studies in educational data mining in general and in accreditation in particular.
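As a rough illustration of how three of the measures named above are typically computed for multi-label predictions, the following sketch evaluates Hamming Loss, Micro-F and Macro-F on a toy label matrix. The data, the function names, and the reading of rows as educational objectives and columns as student outcomes are assumptions made for illustration only; they are not taken from the dataset or from the authors' experimental setup.

```python
from typing import List, Tuple

Matrix = List[List[int]]  # rows: objectives, columns: outcomes (0/1 indicators)


def hamming_loss(y_true: Matrix, y_pred: Matrix) -> float:
    """Fraction of individual label assignments that are wrong."""
    total = sum(len(row) for row in y_true)
    wrong = sum(
        t != p
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
    )
    return wrong / total


def per_label_counts(y_true: Matrix, y_pred: Matrix) -> List[Tuple[int, int, int]]:
    """Return (tp, fp, fn) for each label column."""
    counts = []
    for j in range(len(y_true[0])):
        pairs = [(t[j], p[j]) for t, p in zip(y_true, y_pred)]
        tp = sum(1 for t, p in pairs if t == 1 and p == 1)
        fp = sum(1 for t, p in pairs if t == 0 and p == 1)
        fn = sum(1 for t, p in pairs if t == 1 and p == 0)
        counts.append((tp, fp, fn))
    return counts


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_f1(y_true: Matrix, y_pred: Matrix) -> float:
    """Pool the counts over all labels, then compute a single F1."""
    counts = per_label_counts(y_true, y_pred)
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    return f1(tp, fp, fn)


def macro_f1(y_true: Matrix, y_pred: Matrix) -> float:
    """Compute F1 per label, then take the unweighted mean."""
    scores = [f1(*c) for c in per_label_counts(y_true, y_pred)]
    return sum(scores) / len(scores)


# Hypothetical toy data: 4 objectives, 3 outcomes.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 1, 1]]

print("Hamming Loss:", hamming_loss(y_true, y_pred))
print("Micro-F:", micro_f1(y_true, y_pred))
print("Macro-F:", macro_f1(y_true, y_pred))
```

Micro-F weights every label assignment equally, so frequent outcomes dominate the score, whereas Macro-F gives each outcome the same weight regardless of how often it occurs. Reporting both is a common way to show whether a method handles rare labels well.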
