A Comparison of Educational Statistics and Data Mining Approaches to Identify Characteristics That Impact Online Learning.

Learning objects (LOs) are important online resources for both learners and instructors, and their usage is growing. Automatic LO tracking collects large amounts of metadata about individual students as well as data aggregated across courses, learning objects, and demographic characteristics (e.g., gender). The challenge is to identify which of the many variables derived from tracked data are useful for predicting student learning. This challenge has prompted considerable research in the fields of educational data mining and learning analytics. This work advances such research in four ways. First, we bring together two approaches for finding salient variables from separate research areas: hierarchical linear modeling (HLM) from education and Lasso feature selection from computer science. Second, we show that these two approaches have complementary and synergistic results, with some variables considered salient by both and others salient by only one. Third, and most importantly, we demonstrate the benefits of a combined approach that considers a variable salient when either HLM or Lasso considers that variable salient. This combined approach both improves model predictive accuracy and finds additional variables considered salient in previous datasets on student learning. Lastly, we use the results to provide insights into the variables that are salient to the learning outcome in undergraduate CS education. Overall, this work suggests a combined approach that improves the identification of salient variables in big data and also improves the design of LO tracking systems for learning management systems.
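
The combination rule described above (a variable is salient when either HLM or Lasso flags it) amounts to a set union. The following is a minimal sketch of that rule only, not the authors' implementation: the function name `combine_salient` and the variable names in the example are hypothetical, and the two input sets are assumed to have been produced beforehand by separate HLM and Lasso analyses.

```python
def combine_salient(hlm_salient, lasso_salient):
    """Combine two sets of salient variable names.

    Assumption: each argument is an iterable of variable names already
    judged salient by one method (HLM or Lasso). How each method decides
    salience is outside the scope of this sketch.
    """
    hlm, lasso = set(hlm_salient), set(lasso_salient)
    return {
        "combined": hlm | lasso,    # salient by either method
        "both": hlm & lasso,        # salient by both methods
        "hlm_only": hlm - lasso,    # salient by HLM only
        "lasso_only": lasso - hlm,  # salient by Lasso only
    }


# Hypothetical variable names, for illustration only.
hlm_vars = {"time_on_assessment", "gpa", "self_efficacy"}
lasso_vars = {"gpa", "num_page_clicks"}

result = combine_salient(hlm_vars, lasso_vars)
print(sorted(result["combined"]))
```

Keeping the "both", "hlm_only", and "lasso_only" groups alongside the union makes it easy to see where the two methods agree and where each contributes variables the other misses.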
