On the Use of Offline Short Tests for Scoring and Classifying Purposes

In response to the growing interest in, and need for, a practical brief measure in language testing, this study explored the properties of an offline short-form test (OSF) versus a conventional lengthy test. From a total of 98 vocabulary items pooled from the Iranian National University Entrance Exams, 60 items were selected for the conventional test (CT). To build the OSF, we created an item bank by examining the item response theory (IRT) parameter estimates. Data for the IRT calibration included the responses of 774,258 examinees. Based on the results of the item calibration, 43 items with the highest discrimination power and minimal guessing values, drawn from different levels of ability, were selected for the item bank. Then, using the responses of 253 EFL learners, we compared the measurement properties of the OSF scores with those of the CT scores in terms of score precision, score comparability, and consistency of classification decisions. The results revealed that although the OSF generally did not achieve the same level of measurement precision as the CT, it still achieved a desired level of precision while
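The item-bank step described above (keeping items with high discrimination and low guessing, spread across ability levels) can be illustrated with a minimal Python sketch. It assumes a three-parameter logistic (3PL) model without the D scaling constant, since the abstract mentions discrimination and guessing but does not name the model. The `Item` class, the difficulty cut points, the guessing threshold `max_c`, and the toy parameter values are all hypothetical and are not taken from the study.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    item_id: str
    a: float  # discrimination
    b: float  # difficulty (same scale as ability, theta)
    c: float  # pseudo-guessing (lower asymptote)


def p_correct(item: Item, theta: float) -> float:
    """3PL probability of a correct response at ability theta."""
    return item.c + (1 - item.c) / (1 + math.exp(-item.a * (theta - item.b)))


def item_information(item: Item, theta: float) -> float:
    """3PL item information: a^2 * (Q/P) * ((P - c) / (1 - c))^2."""
    p = p_correct(item, theta)
    q = 1 - p
    return item.a ** 2 * (q / p) * ((p - item.c) / (1 - item.c)) ** 2


def select_items(items, per_stratum, max_c, cuts=(-1.0, 1.0)):
    """Keep low-guessing items, then take the most discriminating
    items within each difficulty stratum (hypothetical cut points)."""
    strata = {}
    for it in items:
        if it.c > max_c:
            continue  # drop items with too much guessing
        level = sum(it.b > cut for cut in cuts)  # 0 = easy, 1 = medium, 2 = hard
        strata.setdefault(level, []).append(it)
    chosen = []
    for level in sorted(strata):
        ranked = sorted(strata[level], key=lambda it: it.a, reverse=True)
        chosen.extend(ranked[:per_stratum])
    return chosen


# Toy calibration results (made-up values, for illustration only).
pool = [
    Item("i1", 1.8, -1.5, 0.10),
    Item("i2", 0.6, -1.2, 0.05),
    Item("i3", 1.2, 0.0, 0.40),
    Item("i4", 1.0, 0.2, 0.15),
    Item("i5", 2.0, 1.5, 0.20),
    Item("i6", 1.5, 1.8, 0.10),
]

bank = select_items(pool, per_stratum=1, max_c=0.25)
print([it.item_id for it in bank])
```

Stratifying by difficulty before ranking on discrimination is one simple way to keep coverage across ability levels; the study's actual selection rule may differ.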
