Using cluster analysis for data mining in educational technology research

Cluster analysis is a family of statistical methods with considerable potential for analyzing the large volumes of web server-log data generated when students learn from hyperlinked information resources, and thus for understanding that learning. In this methodological paper we introduce cluster analysis to educational technology researchers and illustrate its use with two examples of mining click-stream server-log data that reflect students' use of online learning environments. Cluster analysis can help researchers develop learner profiles grounded in the activity observed during a learning session, such as the sequence in which learners access tasks and information, or the time they spend engaged in a given activity or examining a given resource. The examples in this paper illustrate a hierarchical clustering method (Ward's clustering) and a non-hierarchical clustering method (k-means clustering) applied to characteristics of learning behavior while learners engage in a problem-solving activity in an online learning environment. The article concludes with a discussion of the advantages and limitations of cluster analysis as a data mining technique in educational technology research.
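The two methods named above can be made concrete with a small sketch. It is an illustration under stated assumptions, not the authors' procedure: the features (minutes spent on resource pages, number of task-page visits) and the six learner rows are invented, the variables are z-scored because they are on different scales, and both algorithms are written in plain Python so the logic is visible. `ward` merges, at each step, the pair of clusters whose union adds the least to the total within-cluster sum of squares. `kmeans` follows Lloyd's assign/update cycle, seeded (as a simplification) with the first k rows.

```python
"""Sketch: Ward's hierarchical clustering and k-means (Lloyd's algorithm)
on hypothetical per-learner click-stream features. Pure Python, no libraries."""
import math
from typing import List, Sequence

Vector = List[float]

# Hypothetical learner profiles (invented for illustration only):
# [minutes spent on resource pages, number of task-page visits]
LEARNERS: List[Vector] = [
    [5, 2], [6, 3], [4, 2],        # brief sessions, few visits
    [30, 12], [28, 10], [33, 11],  # long sessions, many visits
]


def standardize(data: Sequence[Sequence[float]]) -> List[Vector]:
    """z-score each column so minutes and counts carry comparable weight."""
    cols = list(zip(*data))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]  # 1.0 guards constant columns
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in data]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points: Sequence[Sequence[float]]) -> Vector:
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def ward(data: Sequence[Vector], k: int) -> List[int]:
    """Agglomerative clustering with Ward's criterion. Each step merges the
    pair whose union adds the least within-cluster sum of squares:
    n_a * n_b / (n_a + n_b) * ||c_a - c_b||^2. Stops at k clusters.
    Naive implementation, intended only for small illustrative data."""
    clusters = [[i] for i in range(len(data))]
    while len(clusters) > k:
        best = None  # (cost, index_a, index_b)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                na, nb = len(clusters[a]), len(clusters[b])
                ca = centroid([data[i] for i in clusters[a]])
                cb = centroid([data[i] for i in clusters[b]])
                cost = na * nb / (na + nb) * sq_dist(ca, cb)
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])
        del clusters[b]  # b > a, so index a is unaffected
    labels = [0] * len(data)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def kmeans(data: Sequence[Vector], k: int, max_iter: int = 100) -> List[int]:
    """Lloyd's k-means. Simplification: the first k rows seed the centers;
    real analyses usually compare several starting configurations."""
    centers = [list(p) for p in data[:k]]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assign each learner to the nearest center.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in data]
        if new_labels == labels:  # converged: assignments unchanged
            break
        labels = new_labels
        # Move each center to the mean of its current members.
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:  # keep the old center if a cluster empties
                centers[j] = centroid(members)
    return labels


z = standardize(LEARNERS)

if __name__ == "__main__":
    print("Ward:   ", ward(z, 2))
    print("k-means:", kmeans(z, 2))
```

With this toy data both functions return `[0, 0, 0, 1, 1, 1]`, separating the three brief-session learners from the three long-session learners. In practice, library routines such as SciPy's `scipy.cluster.hierarchy.linkage(..., method='ward')` with `fcluster`, or scikit-learn's `KMeans` (which runs several initializations), would replace these loops. The hand-written versions only make the merge criterion and the assign/update cycle explicit.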
