Identifying Feature Sequences from Process Data in Problem-Solving Items with N-Grams

This article draws on process data from a computer-based large-scale program, the Programme for the International Assessment of Adult Competencies (PIAAC), to address how sequences of actions recorded in problem-solving tasks are related to task performance and how feature sequences are identified for different groups. The purpose of this study is twofold: first, to explore and detect action sequence patterns that are associated with success or failure on a problem-solving item, and second, to cross-validate the results derived from two feature selection models. Motivated by methods from natural language processing and text mining, we used an n-gram model and two feature selection methods, the chi-square statistic (CHI) and the weighted log likelihood ratio test (WLLR), to analyze the process data at a variety of aggregate levels. Action sequence patterns differed significantly between performance groups and were consistent across countries. The two feature selection approaches showed high agreement in the features they identified.
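The general idea of scoring action n-grams by group can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the exact formulas, so the sketch assumes the textbook forms of both measures (Pearson chi-square on a 2x2 presence/absence table, and WLLR as P(f|g) · log(P(f|g) / P(f|other)) with add-one smoothing). The action names and the helper names `ngrams`, `document_counts`, `chi_square`, `wllr`, and `score_features` are hypothetical.

```python
from collections import Counter
from math import log


def ngrams(actions, n):
    """Return all contiguous n-grams (as tuples) in one action sequence."""
    return [tuple(actions[i:i + n]) for i in range(len(actions) - n + 1)]


def document_counts(sequences, n):
    """Count how many sequences contain each n-gram at least once."""
    counts = Counter()
    for seq in sequences:
        counts.update(set(ngrams(seq, n)))
    return counts


def chi_square(a, b, c, d):
    """Pearson chi-square for a 2x2 table.

    a: group-1 sequences with the feature, b: group-2 sequences with it,
    c: group-1 sequences without it,       d: group-2 sequences without it.
    """
    total = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return total * (a * d - b * c) ** 2 / denom if denom else 0.0


def wllr(count_in, total_in, count_out, total_out, vocab_size):
    """Assumed WLLR form with add-one smoothing on n-gram frequencies."""
    p_in = (count_in + 1) / (total_in + vocab_size)
    p_out = (count_out + 1) / (total_out + vocab_size)
    return p_in * log(p_in / p_out)


def score_features(correct, incorrect, n=2):
    """Return {n-gram: (chi-square, WLLR toward the 'correct' group)}."""
    df_c, df_i = document_counts(correct, n), document_counts(incorrect, n)
    tf_c = Counter(g for s in correct for g in ngrams(s, n))
    tf_i = Counter(g for s in incorrect for g in ngrams(s, n))
    vocab = set(tf_c) | set(tf_i)
    total_c, total_i = sum(tf_c.values()), sum(tf_i.values())
    scores = {}
    for g in vocab:
        a, b = df_c[g], df_i[g]
        chi = chi_square(a, b, len(correct) - a, len(incorrect) - b)
        scores[g] = (chi, wllr(tf_c[g], total_c, tf_i[g], total_i, len(vocab)))
    return scores


# Hypothetical toy data: each list is one respondent's logged actions.
correct = [["START", "SORT", "SELECT", "NEXT"],
           ["START", "SORT", "SELECT", "NEXT"]]
incorrect = [["START", "SELECT", "NEXT"],
             ["START", "NEXT"]]

scores = score_features(correct, incorrect, n=2)
for gram, (chi, w) in sorted(scores.items(), key=lambda kv: -kv[1][0]):
    print(gram, round(chi, 3), round(w, 3))
```

Ranking the n-grams by each score and comparing the top-ranked sets is one simple way to check how far the two measures agree; in practice a small-sample correction or a significance threshold would likely be needed.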
