On the consistency of individual classification using short scales.

Short tests containing at most 15 items are used in clinical and health psychology, medicine, and psychiatry for making decisions about patients. Because short tests have large measurement error, the authors ask whether they are reliable enough for classifying patients into a treatment and a nontreatment group. For a given certainty level, proportions of correct classifications were computed for varying test length, cut-scores, item scoring, and choices of item parameters. Short tests were found to classify at most 50% of a group consistently. Results were much better for tests containing 20 or 40 items. Differences between dichotomous and polytomous (five ordered scores) items were small. It is recommended that short tests for high-stakes decision making be used in combination with other information so as to increase reliability and classification consistency.
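The idea of classifying a person "consistently at a given certainty level" can be illustrated with a small computation. What follows is a minimal sketch under simplifying assumptions, not the authors' procedure: items are dichotomous and follow a Rasch model with all difficulties set to 0, the latent trait is standard normal, and a person counts as consistently classified when the probability of landing on one side of the cut-score is at least the certainty level. The function names, the default certainty of 0.9, and the integration grid are all hypothetical choices made for illustration.

```python
import math

def p_positive(theta, difficulty=0.0):
    """Rasch probability of a positive item response (difficulty is an assumed value)."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))

def prob_at_or_above(theta, n_items, cut):
    """P(sum score >= cut | theta) for n_items equally difficult items (binomial)."""
    p = p_positive(theta)
    return sum(
        math.comb(n_items, x) * p**x * (1.0 - p) ** (n_items - x)
        for x in range(cut, n_items + 1)
    )

def consistent_proportion(n_items, cut, certainty=0.9, grid=2001, lo=-6.0, hi=6.0):
    """Share of a standard-normal population classified on the same side of the
    cut-score with probability >= certainty, approximated on an equally spaced grid."""
    step = (hi - lo) / (grid - 1)
    total = 0.0
    consistent = 0.0
    for i in range(grid):
        theta = lo + i * step
        weight = math.exp(-0.5 * theta * theta)  # unnormalized normal density
        p_above = prob_at_or_above(theta, n_items, cut)
        total += weight
        if max(p_above, 1.0 - p_above) >= certainty:
            consistent += weight
    return consistent / total

# Compare test lengths with the cut-score placed at half the maximum score.
for n in (10, 20, 40):
    print(n, round(consistent_proportion(n, n // 2), 3))
```

Under these assumptions the proportion grows with test length, because the band of trait values for which the classification is uncertain narrows as items are added. The exact figures depend on the assumed item parameters, cut-score, and certainty level and should not be read as the study's results.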
