Improving construct validity yields better models of systematic inquiry, even with less information

Data-mined models often achieve good predictive power, but sometimes at the cost of interpretability. We investigate whether selecting features to increase a model's construct validity and interpretability can also improve its ability to predict the desired constructs. To do so, we take existing models and reduce their feature sets to increase construct validity. We then compare the predictive capabilities of the existing and reduced models on a held-out test set in two ways. First, we analyze the models' overall predictive performance. Second, we determine how much student interaction data is necessary to make accurate predictions. We find that the reduced models, which have higher construct validity, not only achieve better overall agreement but also predict better with less data. This work is conducted in the context of developing models to assess students' inquiry skills in designing controlled experiments and testing stated hypotheses within a science inquiry microworld.
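As a rough illustration of the held-out comparison described above, the following sketch scores two sets of predictions against the same test labels. It assumes agreement is measured with Cohen's kappa, which the abstract does not specify; the labels, the predictions, and the names `full_model_preds` and `reduced_model_preds` are hypothetical and do not come from the study.

```python
from collections import Counter


def cohens_kappa(labels, predictions):
    """Chance-corrected agreement between true labels and predictions."""
    if not labels or len(labels) != len(predictions):
        raise ValueError("labels and predictions must be non-empty and equal in length")
    n = len(labels)
    # Observed agreement: fraction of items where prediction matches label.
    observed = sum(l == p for l, p in zip(labels, predictions)) / n
    # Expected agreement by chance, from the marginal class frequencies.
    label_counts = Counter(labels)
    pred_counts = Counter(predictions)
    expected = sum(label_counts[c] * pred_counts[c] for c in label_counts) / (n * n)
    if expected == 1.0:
        return 0.0  # degenerate case: only one class appears on both sides
    return (observed - expected) / (1.0 - expected)


# Hypothetical held-out labels (1 = skill demonstrated, 0 = not demonstrated).
test_labels = [1, 1, 1, 0, 0, 0, 1, 0]
# Hypothetical predictions from a model using the full feature set ...
full_model_preds = [1, 0, 1, 0, 1, 0, 1, 1]
# ... and from a model using a reduced feature set.
reduced_model_preds = [1, 1, 1, 0, 0, 0, 1, 1]

kappa_full = cohens_kappa(test_labels, full_model_preds)
kappa_reduced = cohens_kappa(test_labels, reduced_model_preds)
print(f"full feature set:    kappa = {kappa_full:.2f}")
print(f"reduced feature set: kappa = {kappa_reduced:.2f}")
```

The same function could be applied to predictions made from only the first few student actions to examine how agreement changes with the amount of interaction data, though the exact procedure used in the study is not described here.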
