Identifying Feature Sequences from Process Data in Problem-Solving Items with N-Grams

This article draws on process data from a computer-based large-scale program, the Programme for the International Assessment of Adult Competencies (PIAAC), to address how sequences of actions recorded in problem-solving tasks are related to task performance and how feature sequences are identified for different groups. The purpose of this study is twofold: first, to explore and detect action sequence patterns, or features, that are associated with success or failure on a problem-solving item, and second, to mutually validate the results derived from two feature selection models. Motivated by the methodologies of natural language processing and text mining, we used an n-gram model and two feature selection methods, the chi-square statistic (CHI) and the weighted log likelihood ratio test (WLLR), to analyze the process data at a variety of aggregate levels. Action sequence patterns differed significantly by performance group and were consistent across countries. The two feature selection approaches showed high agreement in the features they identified.

Matthias von Davier | Qiwei He
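
The approach named in the abstract, n-grams over action sequences ranked by a chi-square statistic, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the action labels and the toy sequences are hypothetical, and counting each n-gram at most once per sequence (presence rather than raw frequency) is an assumption made here for simplicity. The WLLR statistic is not shown.

```python
from collections import Counter


def ngrams(actions, n):
    """Return the distinct n-grams (as tuples) found in one action sequence."""
    return {tuple(actions[i:i + n]) for i in range(len(actions) - n + 1)}


def chi_square(a, b, c, d):
    """Chi-square statistic for a 2x2 table.

    a: correct-group sequences that contain the n-gram
    b: incorrect-group sequences that contain it
    c: correct-group sequences that do not contain it
    d: incorrect-group sequences that do not contain it
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def score_ngrams(correct, incorrect, n=2):
    """Score every n-gram by how strongly its presence separates the groups."""
    in_correct = Counter(g for seq in correct for g in ngrams(seq, n))
    in_incorrect = Counter(g for seq in incorrect for g in ngrams(seq, n))
    scores = {}
    for g in set(in_correct) | set(in_incorrect):
        a, b = in_correct[g], in_incorrect[g]
        scores[g] = chi_square(a, b, len(correct) - a, len(incorrect) - b)
    return scores


# Hypothetical toy data: each list is one respondent's logged actions.
correct = [
    ["START", "SEARCH", "SORT", "SUBMIT"],
    ["START", "SORT", "SUBMIT"],
]
incorrect = [
    ["START", "SEARCH", "SUBMIT"],
    ["START", "SUBMIT"],
]

scores = score_ngrams(correct, incorrect, n=2)
for gram, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(gram, round(score, 3))
```

In this toy example the bigram `("SORT", "SUBMIT")` appears in every correct sequence and in no incorrect one, so it receives the highest score, while `("START", "SEARCH")` is equally common in both groups and scores zero.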
