Experimental validation strategies for heterogeneous computer-based assessment items

Abstract Computer-based assessments open up new possibilities for measuring constructs in authentic settings. They are especially promising for measuring 21st century skills, such as information and communication technologies (ICT) skills. Items tapping such constructs may be diverse in design principles and content and thus form a heterogeneous item set. Existing validation approaches, such as the construct representation approach by Embretson (1983), however, require homogeneous item sets in the sense that a particular task characteristic can be applied to all items. To extend this validation rationale to heterogeneous item sets, two experimental approaches are proposed, both based on the idea of creating variants of items by systematically manipulating task characteristics. The change approach investigates whether the manipulation affects construct-related demands; the eliminate approach investigates whether the test score represents the targeted skill dimension. Both approaches were applied in an empirical study (N = 983) using heterogeneous items from an ICT skills test. The results show how changes to ICT-specific task characteristics influenced item difficulty without changing the represented construct. Additionally, eliminating the intended skill dimension led to easier items and partly changed the construct. Overall, the suggested experimental approaches provide a useful validation tool for 21st century skills assessed by heterogeneous items.
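The analytic logic behind both approaches can be illustrated with an explanatory item response model estimated as a generalized linear mixed model, in the spirit of references [6], [17], [27], and [28]. The sketch below is illustrative rather than the authors' actual code; the data frame responses and its columns person, item, correct (0/1), and variant (0 = original item, 1 = manipulated variant) are hypothetical placeholders for person-by-item response data in long format.

    library(lme4)

    # Rasch-type baseline: random person abilities and random item easinesses
    m0 <- glmer(correct ~ 1 + (1 | person) + (1 | item),
                data = responses, family = binomial)

    # Explanatory extension: fixed effect of the experimental manipulation
    # (hypothetical predictor 'variant'; cf. the LLTM rationale in [17])
    m1 <- glmer(correct ~ 1 + variant + (1 | person) + (1 | item),
                data = responses, family = binomial)

    # Likelihood-ratio test: did the manipulation shift item difficulty?
    anova(m0, m1)
    fixef(m1)["variant"]  # estimated difficulty shift on the logit scale

Under this setup, a credible fixed effect of variant indicates that manipulating the task characteristic changed construct-related demands (the change approach); the eliminate approach follows the same template, with the predictor instead coding whether the intended skill dimension was removed from the item variant.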

[1] Andreas Frey, et al. Konstruktvalidierung und Skalenbeschreibung in der Kompetenzdiagnostik durch die Vorhersage von Aufgabenschwierigkeiten [Construct validation and scale description in competence assessment through the prediction of item difficulties], 2012.

[2] Alija Kulenović, et al. Standards for Educational and Psychological Testing, 1999.

[3] Eckhard Klieme, et al. Reporting results of large-scale assessment in psychologically and educationally meaningful terms: Construct validation and proficiency scaling in TIMSS, 2002.

[4] Eran Chajut, et al. You Can Teach Old Dogs New Tricks: The Factors That Affect Changes over Time in Digital Literacy, 2010.

[5] Stephen B. Dunbar, et al. Complex, Performance-Based Assessment: Expectations and Validation Criteria, 1991.

[6] Claus H. Carstensen, et al. Explanatory Item Response Models: A Brief Introduction, 2008.

[7] Francis Tuerlinckx, et al. A Hierarchical IRT Model for Criterion-Referenced Measurement, 2000.

[8] J. Fraillon, et al. Preparing for life in a digital age, 2014.

[9] Heiko Rölke. The ItemBuilder: A Graphical Authoring System for Complex Item Development, 2012.

[10] Samuel Greiff, et al. Easily too difficult: Estimating item difficulty in computer simulated microworlds, 2016, Computers in Human Behavior.

[11] Tom Krenzke, et al. Literacy, Numeracy, and Problem Solving in Technology-Rich Environments among U.S. Adults: Results from the Program for the International Assessment of Adult Competencies 2012. First Look. NCES 2014-008, 2013.

[12] M. Kane. Validating the Interpretations and Uses of Test Scores, 2013.

[13] D. Borsboom, et al. The concept of validity, 2004, Psychological Review.

[14] April L. Zenisky, et al. Innovative Item Formats in Computer-Based Testing: In Pursuit of Improved Construct Representation, 2006.

[15] S. Embretson. Construct validity: Construct representation versus nomothetic span, 1983.

[16] Alexander van Deursen, et al. Using the Internet: Skill related problems in users' online behavior, 2009, Interacting with Computers.

[17] G. H. Fischer, et al. The linear logistic test model as an instrument in educational research, 1973.

[18] Oscar Valiente, et al. PISA 2009 Results: Students On Line: Digital Technologies and Performance (Volume VI), 2011.

[19] L. F. Hornke, et al. Rule-Based Item Bank Construction and Evaluation Within the Linear Logistic Framework, 1986.

[20] Johannes Naumann, et al. Assessing Individual Differences in Basic Computer Skills, 2013.

[21] D. Watson, et al. Constructing validity: Basic issues in objective scale development, 1995.

[22] Soo Young Rieh. Judgement of information quality and cognitive authority in the Web, 2002.

[23] Candace L. Sidner, et al. Email overload: exploring personal information management of email, 1996, CHI.

[24] R Core Team. R: A language and environment for statistical computing, 2014.

[25] Alexander Robitzsch, et al. TAM: Test Analysis Modules, 2015.

[26] Samuel Greiff, et al. The systematic variation of task characteristics facilitates the understanding of task difficulty: A cognitive diagnostic modeling approach to complex problem solving, 2014.

[27] D. Bates, et al. lme4: Linear Mixed-Effects Models using 'Eigen' and S4, 2015.

[28] Abe D. Hofman, et al. The estimation of item response models with the lmer function from the lme4 package in R, 2011.

[29] Douglas A. Bors, et al. What does the Mental Rotation Test Measure? An Analysis of Item Difficulty and Item Characteristics, 2009.

[30] Yair Amichai-Hamburger, et al. Experiments in Digital Literacy, 2004, Cyberpsychology, Behavior, and Social Networking.

[31] Wolfram Schulz, et al. International Computer and Information Literacy Study: ICILS 2013: Technical Report, 2015.

[32] Johannes Naumann, et al. Effects of linear reading, basic computer skills, evaluating online information, and navigation on reading digital text, 2016, Computers in Human Behavior.

[33] T. Richter, et al. Eine revidierte Fassung des Inventars zur Computerbildung (INCOBI-R) [A revised version of the Computer Literacy Inventory (INCOBI-R)], 2010.

[34] Ulrich Trautwein, et al. Large-scale student assessment studies measure the results of processes of knowledge acquisition: Evidence in support of the distinction between intelligence and student achievement, 2009.