An introduction to item response theory and Rasch models for speech-language pathologists.

PURPOSE: To present a primarily conceptual introduction to item response theory (IRT) and Rasch models for speech-language pathologists (SLPs).

METHOD: This tutorial introduces SLPs to basic concepts and terminology related to IRT as well as the most common IRT models. The article then continues with an overview of how instruments are developed using IRT and some basic principles of adaptive testing.

CONCLUSION: IRT is a set of statistical methods that are increasingly used for developing instruments in speech-language pathology. While IRT is not new, its application in speech-language pathology to date has been relatively limited in scope. Several new IRT-based instruments are currently emerging. IRT differs from traditional methods for test development, typically referred to as classical test theory (CTT), in several theoretical and practical ways. Administration, scoring, and interpretation of IRT instruments are different from methods used for most traditional CTT instruments. SLPs will need to understand the basic concepts of IRT instruments to use these tools in their clinical and research work. This article provides an introduction to IRT concepts drawing on examples from speech-language pathology.
