Generating descriptive model for student dropout: a review of clustering approach

Data mining is widely regarded as a powerful instrument for acquiring new knowledge from accumulated historical data that would otherwise be left unstudied. This data-driven methodology has proven effective in improving the quality of decision-making in several domains, such as business, medicine, and complex engineering problems. Recently, educational data mining (EDM) has attracted a great deal of attention among educational researchers and computer scientists. In general, publications in the field of EDM focus on understanding student types and targeted marketing, using both descriptive and predictive models to maximize student retention. Inspired by these previous attempts, this paper aims to establish the clustering approach as a practical guideline for exploring student categories and characteristics, with a worked example on a real dataset to illustrate the analytical procedures and results.
