Practical Issues in Equating

This chapter describes many of the practical issues involved in conducting equating. We discuss the major issues and provide references that treat them in more depth. The early portions of the chapter focus on equating dichotomously scored paper-and-pencil tests; later portions broaden the focus to practical issues in other contexts, including computerized testing and performance assessments. Several review articles (Brennan and Kolen, 1987b; Cook and Petersen, 1987; Harris, 1993; Harris and Crouse, 1993; Skaggs, 1990a; Skaggs and Lissitz, 1986b) consider practical issues in equating in greater depth than is possible in this chapter.
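
As a concrete point of reference for the discussion that follows, the sketch below illustrates one of the simplest equating procedures for dichotomously scored forms: linear (mean-sigma) equating under a random groups design. It is an illustrative sketch only, not a procedure drawn from this chapter; the function name and the use of NumPy are our own assumptions.

    import numpy as np

    def linear_equate(x, form_x_scores, form_y_scores):
        # Linear (mean-sigma) equating: place a Form X raw score x on the Form Y
        # scale by matching the means and standard deviations of the two raw-score
        # distributions observed in (randomly equivalent) examinee groups.
        mu_x, sd_x = np.mean(form_x_scores), np.std(form_x_scores, ddof=1)
        mu_y, sd_y = np.mean(form_y_scores), np.std(form_y_scores, ddof=1)
        return sd_y / sd_x * (x - mu_x) + mu_y

For example, if Form X has mean 50 and standard deviation 10 while Form Y has mean 52 and standard deviation 8, a Form X score of 60 equates to 52 + 8 * (60 - 50) / 10 = 60 on the Form Y scale.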

[1]  M. J. Kolen Population Invariance in Equating and Linking: Concept and History , 2004 .

[2]  Harold F. O'Neil,et al.  Effects of Motivational Interventions on the National Assessment of Educational Progress Mathematics Performance , 1995 .

[3]  D. Whitney,et al.  Comparison of Four Procedures for Equating the Tests of General Educational Development. , 1982 .

[4]  R. Tate Equating for Long-Term Scale Maintenance of Mixed Format Tests Containing Multiple Choice and Constructed Response Items , 2003 .

[5]  Gautam Puhan Detecting and Correcting Scale Drift in Test Equating: An Illustration from a Large Scale Testing Program , 2008 .

[6]  Martha L. Stocking,et al.  Practical Issues in Large-Scale Computerized Adaptive Testing , 1996 .

[7]  Avi Allalouf,et al.  Quality Control Procedures in the Scoring, Equating, and Reporting of Test Scores , 2007 .

[8]  Samuel A. Livingston Small‐Sample Equating With Log‐Linear Smoothing , 1993 .

[9]  Mark D. Reckase,et al.  Effect of the Medium of Item Presentation on Examinee Performance and Item Characteristics , 1989 .

[10]  Samuel A. Livingston ADJUSTING SCORES ON EXAMINATIONS OFFERING A CHOICE OF ESSAY QUESTIONS , 1988 .

[11]  An NCME Instructional Module on Population Invariance in Linking and Equating , 2012 .

[12]  Sooyeon Kim,et al.  Linking Mixed‐Format Tests Using Multiple‐Choice Anchors , 2010 .

[13]  P. Holland,et al.  The Missing Data Assumptions of the NEAT Design and their Implications for Test Equating , 2010 .

[14]  Christine E. DeMars,et al.  Investigating the Impact of Compromised Anchor Items on IRT Equating Under the Nonequivalent Anchor Test Design , 2012 .

[15]  N. Longford Reliability of Essay Rating and Score Adjustment , 1994 .

[16]  Ronald K. Hambleton,et al.  Customized Tests and Customized Norms. , 1991 .

[17]  Samuel A. Livingston,et al.  What Combination of Sampling and Equating Methods Works Best , 1989 .

[18]  Nancy L. Allen,et al.  A MISSING DATA APPROACH TO ESTIMATING DISTRIBUTIONS OF SCORES FOR OPTIONAL TEST SECTIONS , 1994 .

[19]  W. Angoff Technical and Practical Issues in Equating: A Discussion of Four Papers , 1987 .

[20]  K. Ricker,et al.  SINGLE- VERSUS DOUBLE-SCORING OF TREND RESPONSES IN TREND SCORE EQUATING WITH CONSTRUCTED-RESPONSE TESTS , 2010 .

[21]  N. Petersen,et al.  A Test of the Adequacy of Curvilinear Score Equating Models , 1983 .

[22]  P. Holland,et al.  THE CORRELATION BETWEEN THE SCORES OF A TEST AND AN ANCHOR TEST , 2006 .

[23]  B. Bridgeman,et al.  THE EFFECT OF COMPUTER-BASED TESTS ON RACIAL/ETHNIC, GENDER, AND LANGUAGE GROUPS , 2000 .

[24]  M. Lunz,et al.  Equating Computerized Adaptive Certification Examinations: The Board of Registry Series of Studies. , 1995 .

[25]  H. Huynh,et al.  Contextual Characteristics of Locally Dependent Open-Ended Item Clusters in a Large-Scale Performance Assessment , 1997 .

[26]  G. Engelhard,et al.  The Effects of Task Choice on the Quality of Writing Obtained in a Statewide Assessment , 1995 .

[27]  Cynthia G. Parshall,et al.  Equating Error and Statistical Bias in Small Sample Linear Equating , 1995 .

[28]  Betty A. Bergstrom,et al.  An Empirical Study of Computerized Adaptive Test Administration Conditions. , 1994 .

[29]  Mary Pommerich,et al.  Developing Computerized Versions of Paper-and-Pencil Tests: Mode Effects for Passage-Based Tests , 2004 .

[30]  Stephen G. Sireci,et al.  The Impact of Multidirectional Item Parameter Drift on IRT Scaling Coefficients and Proficiency Estimates , 2012 .

[31]  Rick Morgan,et al.  EXPERIMENTAL STUDY OF THE EFFECTS OF CALCULATOR USE ON THE ADVANCED PLACEMENT CALCULUS EXAMINATIONS , 1991 .

[32]  George Engelhard,et al.  Evaluating Rater Accuracy in Performance Assessments. , 1996 .

[33]  Mary Pommerich,et al.  The Effect of Using Item Parameters Calibrated from Paper Administrations in Computer Adaptive Test Administrations , 2007 .

[34]  Hong Jiao,et al.  Comparability of Computer-Based and Paper-and-Pencil Testing in K–12 Reading Assessments , 2008 .

[35]  Deborah J. Harris,et al.  A Study of Criteria Used in Equating , 1993 .

[36]  On bias in linear observed-score equating , 2010 .

[37]  Tianyou Wang,et al.  Evaluating Comparability in Computerized Adaptive Testing: Issues, Criteria and an Example , 2001 .

[38]  D. Eignor  AN INVESTIGATION OF THE FEASIBILITY AND PRACTICAL OUTCOMES OF PRE‐EQUATING THE SAT VERBAL AND MATHEMATICAL SECTIONS , 1985 .

[39]  Xiang-bo Wang On the Viability of Some Untestable Assumptions in Equating Exams That Allow Examinee Choice. Program Statistics Research Technical Report No. 93-31. , 1993 .

[40]  D. Budescu Selecting an Equating Method: Linear or Equipercentile? , 1987 .

[41]  B. Bridgeman,et al.  Choice Among Essay Topics: Impact on Performance and Validity , 1997 .

[42]  Bradley A. Hanson A Comparison of Presmoothing and Postsmoothing Methods in Equipercentile Equating. ACT Research Report Series 94-4. , 1994 .

[43]  Henry Braun,et al.  Understanding Scoring Reliability: Experiments in Calibrating Essay Readers , 1988 .

[44]  Brian D. Bontempo,et al.  Repeater Patterns on NCLEX™ using CAT versus NCLEX™ using Paper-and-Pencil Testing , 1996 .

[45]  N. Dorans,et al.  Equating Test Scores: Toward Best Practices , 2009 .

[46]  Multiple Linking in Equating and Random Scale Drift , 2011 .

[47]  R. Jaeger SOME EXPLORATORY INDICES FOR SELECTION OF A TEST EQUATING METHOD , 1981 .

[48]  P. Holland,et al.  Linking and aligning scores and scales , 2007 .

[49]  H. Huynh,et al.  Computer-Based and Paper-and-Pencil Administration Mode Effects on a Statewide End-of-Course English Test , 2008 .

[50]  Harold F. O'Neil,et al.  Policy and validity prospects for performance-based assessment. , 1993 .

[51]  Susan R. Goldman,et al.  Evaluation of Procedure-Based Scoring for Hands-On Science Assessment , 1992 .

[52]  Insu Paek,et al.  An Alternative to the Trend Scoring Method for Adjusting Scoring Shifts in Mixed-Format Tests , 2009 .

[53]  B. Clauser,et al.  The Impact of Statistically Adjusting for Rater Effects on Conditional Standard Errors of Performance Ratings , 2011 .

[54]  George Engelhard,et al.  Examining Rater Errors in the Assessment of Written Composition With a Many-Faceted Rasch Model , 1994 .

[55]  A Discussion of Population Invariance of Equating , 2008 .

[56]  Walter D. Way Protecting the Integrity of Computerized Testing Item Pools , 1998 .

[57]  P. Holland,et al.  Observed Score Equating Using a Mini-Version Anchor and an Anchor with Less Spread of Difficulty: A Comparison Study , 2011 .

[58]  Akihito Kamata,et al.  The Performance of a Method for the Long‐term Equating of Mixed‐Format Assessment , 2005 .

[59]  Linda L. Cook,et al.  Simulation Results of Effects on Linear and Curvilinear Observed-and True-Score Equating Procedures of Matching on a Fallible Criterion , 1990 .

[60]  Deborah J. Harris,et al.  Comparison of Item Preequating and Random Groups Equating Using IRT and Equipercentile Methods , 1990 .

[61]  Gerald C. Davison,et al.  American Psychological Association (APA) , 2015 .

[62]  Gautam Puhan Impact of Inclusion or Exclusion of Repeaters on Test Equating , 2011 .

[63]  Anthony R. Zara,et al.  A Comparison of Procedures for Content-Sensitive Item Selection in Computerized Adaptive Tests. , 1991 .

[64]  David M. Williamson,et al.  A Framework for Evaluation and Use of Automated Scoring , 2012 .

[65]  D. Jarjoura,et al.  THE IMPORTANCE OF CONTENT REPRESENTATION FOR COMMON‐ITEM EQUATING WITH NONRANDOM GROUPS , 1985 .

[66]  Michael E. Walker,et al.  Score Linking Issues Related to Test Content Changes , 2007 .

[67]  Sooyeon Kim,et al.  Examining Two Strategies to Link Mixed-Format Tests Using Multiple-Choice Anchors. Research Report. ETS RR-10-18. , 2010 .

[68]  Paul W. Holland,et al.  Statistical models for test equating, scaling, and linking , 2011 .

[69]  M. J. Kolen Threats to Score Comparability with Applications to Performance Assessments and Computerized Adaptive Tests , 1999 .

[70]  R. Brennan,et al.  A Reply to Angoff , 1987 .

[71]  D. Budescu EFFICIENCY OF LINEAR EQUATING AS A FUNCTION OF THE LENGTH OF THE ANCHOR TEST , 1985 .

[72]  Nancy S. Petersen Equating: Best Practices and Challenges to Best Practices , 2007 .

[73]  R. Brennan A Discussion of Population Invariance , 2008 .

[74]  G. Neuman,et al.  Computerization of Paper-and-Pencil Tests: When are They Equivalent? , 1998 .

[75]  Samuel A. Livingston,et al.  An Evaluation of the Kernel Equating Method: A Special Study with Pseudotests Constructed from Real Test Data. Research Report. ETS RR-06-02. , 2006 .

[76]  Cornelis A.W. Glas,et al.  Computerized adaptive testing : theory and practice , 2000 .

[77]  S. Haberman,et al.  Small-Sample Equating Using a Synthetic Linking Function. , 2008 .

[78]  Mark D. Reckase,et al.  TECHNICAL GUIDELINES FOR ASSESSING COMPUTERIZED ADAPTIVE TESTS , 1984 .

[79]  The Effectiveness of Circular Equating as a Criterion for Evaluating Equating , 2000 .

[80]  Samuel A. Livingston,et al.  Comparisons among Small Sample Equating Methods in a Common‐Item Design , 2010 .

[81]  Neal M. Kingston Comparability of Computer- and Paper-Administered Multiple-Choice Tests for K–12 Populations: A Synthesis , 2008 .

[82]  Linda L. Cook,et al.  Sensitivity of Equating Results to Different Sampling Strategies. , 1990 .

[83]  Assessing Equating Results on Different Equating Criteria , 2005 .

[84]  Martha L. Stocking Revising Item Responses in Computerized Adaptive Tests: A Comparison of Three Models , 1997 .

[85]  Willem J. van der Linden,et al.  Local Observed-Score Equating , 2009 .

[86]  Neal M. Kingston,et al.  Item Location Effects and Their Implications for IRT Equating and Adaptive Testing , 1984 .

[87]  W. D. Linden,et al.  Local linear observed-score equating , 2011 .

[88]  Walter P. Vispoel,et al.  Reviewing and Changing Answers on Computer‐adaptive and Self‐adaptive Vocabulary Tests , 1998 .

[89]  Dorothy T. Thayer,et al.  The Chain and Post‐Stratification Methods for Observed‐Score Equating: Their Relationship to Population Invariance , 2004 .

[90]  Leonard S. Cahen,et al.  Educational Testing Service , 1970 .

[91]  Warren W. Willingham,et al.  Testing handicapped people , 1988 .

[92]  Tianyou Wang,et al.  Computerized Adaptive and Fixed‐Item Testing of Music Listening Skill: A Comparison of Efficiency, Precision, and Concurrent Validity , 1997 .

[93]  Richard L. Tate A Cautionary Note on IRT-Based Linking of Tests With Polytomous Items , 1999 .

[94]  George Leckie,et al.  Rater Effects on Essay Scoring: A Multilevel Analysis of Severity Drift, Central Tendency, and Rater Experience , 2011 .

[95]  N. Dorans Recentering and Realigning the SAT Score Distributions: How and Why. , 2002 .

[96]  N. Dorans Using Subpopulation Invariance to Assess Test Score Equity , 2004 .

[97]  Neil J. Dorans,et al.  THE EFFECTS OF ITEM REARRANGEMENT ON TEST PERFORMANCE: A REVIEW OF THE LITERATURE , 1982 .

[98]  Frederic M. Lord,et al.  Comparison of IRT True-Score and Equipercentile Observed-Score "Equatings" , 1984 .

[99]  Fritz Drasgow,et al.  Innovations in Computerized Assessment , 1999 .

[100]  USING REPEATERS FOR ESTIMATING COMPARABLE SCORES , 1999 .

[101]  Brian Rothschild,et al.  Effects of Extended Time on the SAT I: Reasoning Test Score Growth for Students with Learning Disabilities , 1998 .

[102]  G. C. Bussolino,et al.  Long-term performance of a transfer standard pyrometer , 1990 .

[103]  Marie Wiberg,et al.  Observed Score Linear Equating with Covariates , 2011 .

[104]  Howard Wainer,et al.  SOME PRACTICAL CONSIDERATIONS WHEN CONVERTING A LINEARLY ADMINISTERED TEST TO AN ADAPTIVE FORMAT , 1992 .

[105]  John Mazzeo Comparability of Computer and Paper-and-Pencil Scores for Two CLEP General Examinations. College Board Report No. 91-5. , 1991 .

[106]  Milton H Maier,et al.  Military Aptitude Testing: The Past Fifty Years , 1993 .

[107]  Gary L. Thomasson The Goal of Equity within and between Computerized Adaptive Tests and Paper and Pencil Forms. , 1997 .

[108]  F. Vijver,et al.  The incomplete equivalence of the paper-and-pencil and computerized versions of the General Aptitude Test Battery , 1994 .

[109]  Samuel A. Livingston,et al.  Collateral Information for Equating in Small Samples: A Preliminary Investigation , 2011 .

[110]  Robert W. Lissitz,et al.  IRT Test Equating: Relevant Issues and a Review of Recent Research , 1986 .

[111]  S. Sireci,et al.  Evaluating the Comparability of Paper- and Computer-Based Science Tests across Sex and SES Subgroups. , 2012 .

[112]  T. Hsu,et al.  Exploring the Feasibility of Collateral Information Test Equating , 2002 .

[113]  Anne L. Harvey,et al.  The Equivalence of Scores from Automated and Conventional Educational and Psychological Tests: A Review of the Literature. College Board Report No. 88-8. , 1988 .

[114]  Ronald K. Hambleton,et al.  Consequences of Violated Equating Assumptions Under the Equivalent Groups Design , 2011 .

[115]  M. J. Kolen,et al.  The Effect of Repeaters on Equating , 2010 .

[116]  A. A. Davier Potential Solutions to Practical Equating Issues , 2007 .

[117]  Wim J. van der Linden,et al.  Local Observed-Score Equating With Anchor-Test Designs , 2010 .

[118]  R. Hambleton,et al.  Fundamentals of Item Response Theory , 1991 .

[119]  David M. Williamson,et al.  EVALUATION OF THE E‐RATER® SCORING ENGINE FOR THE GRE® ISSUE AND ARGUMENT PROMPTS , 2012 .

[120]  Fritz Drasgow The work ahead: A psychometric infrastructure for computerized adaptive tests , 2005 .

[121]  Stability of Rasch Scales Over Time , 2009 .

[122]  Chockalingam Viswesvaran,et al.  Least Squares Models to Correct for Rater Effects in Performance Assessment , 1993 .

[123]  R. Brennan,et al.  Some Practical Issues in Equating , 1987 .

[124]  Sooyeon Kim,et al.  EVALUATING SUBPOPULATION INVARIANCE OF LINKING FUNCTIONS TO DETERMINE THE ANCHOR COMPOSITION FOR A MIXED‐FORMAT TEST , 2009 .

[125]  Neil J. Dorans,et al.  Sources of Score Scale Inconsistency , 2011 .

[126]  Jinghua Liu,et al.  A Scale Drift Study , 2009 .

[127]  Vonda L. Kiplinger,et al.  Raising the Stakes of Test Administration: The Impact on Student Performance on the National Assessment of Educational Progress. , 1995 .

[128]  Neil J. Dorans,et al.  Item Response Theory, Item Calibration, and Proficiency Estimation , 2000 .

[129]  Michael J. Kolen,et al.  Evaluation of Two New Smoothing Methods in Equating: The Cubic B-Spline Presmoothing Method and the Direct Presmoothing Method. , 2009 .

[130]  Eric T. Bradlow,et al.  Item Response Theory Models Applied to Data Allowing Examinee Choice , 1998 .

[131]  Brent Bridgeman,et al.  Comparison of Human and Machine Scoring of Essays: Differences by Gender, Ethnicity, and Country , 2012 .

[132]  Michalis P. Michaelides Sensitivity of Equated Aggregate Scores to the Treatment of Misbehaving Common Items , 2010 .

[133]  Evaluating Equating Accuracy and Assumptions for Groups that Differ in Performance. , 2014 .

[134]  Craig N. Mills,et al.  FIELD TEST OF A COMPUTER-BASED GRE GENERAL TEST , 1993 .

[135]  H. Huynh,et al.  Equivalence of Paper-and-Pencil and Online Administration Modes of the Statewide English Test for Students With and Without Disabilities , 2010 .

[136]  Andrew J. Poggio,et al.  A Comparative Evaluation of Score Results from Computerized and Paper & Pencil Mathematics Testing in a Large Scale State Assessment Program , 2005 .

[137]  Christine E. DeMars Detection of Item Parameter Drift over Multiple Test Administrations , 2004 .

[138]  Daniel R. Eignor,et al.  DERIVING COMPARABLE SCORES FOR COMPUTER ADAPTIVE AND CONVENTIONAL TESTS: AN EXAMPLE USING THE SAT , 1993 .

[139]  H. Leeson The Mode Effect: A Literature Review of Human and Technological Issues in Computerized Testing , 2006 .

[140]  Neil J. Dorans,et al.  CONSISTENCY OF SAT® I: REASONING TEST SCORE CONVERSIONS , 2008 .

[141]  Deborah J. Harris,et al.  Psychometric Properties of Scale Scores and Performance Levels for Performance Assessments Using Polytomous IRT , 2000 .

[142]  M. J. Kolen Does Matching in Equating Work? A Discussion. , 1990 .

[143]  The Optimal Degree of Smoothing in Equipercentile Equating with Postsmoothing. , 1995 .

[144]  Linda L. Cook,et al.  Irt Versus Conventional Equating Methods: A Comparative Study of Scale Stability , 1983 .

[145]  Samuel A. Livingston,et al.  New Approaches to Equating With Small Samples , 2009 .

[146]  Hongwen Guo,et al.  Accumulative Equating Error after a Chain of Linear Equatings , 2010 .

[147]  Timothy D. Ritchie,et al.  Factors in Paper-and-Pencil and Computer Reading Score Differences at the Primary Grades , 2006 .

[148]  Mary E. Lunz,et al.  The Effect of Review on the Psychometric Characteristics of Computerized Adaptive Tests. , 1994 .

[149]  Equating error in observed-score equating , 2006 .

[150]  Linda L. Cook,et al.  SPECIFYING THE CHARACTERISTICS OF LINKING ITEMS USED FOR ITEM RESPONSE THEORY ITEM CALIBRATION , 1987 .

[151]  Howard Wainer,et al.  How Well Can We Compare Scores on Test Forms That Are Constructed by Examinees Choice , 1994 .

[152]  Robert L. Linn,et al.  High-Stakes Uses of Performance-Based Assessments , 1995 .

[153]  P. Holland,et al.  A New Approach to Comparing Several Equating Methods in the Context of the NEAT Design , 2010 .

[154]  C. Glas,et al.  Elements of adaptive testing , 2010 .

[155]  ALTERNATIVE LOGLINEAR SMOOTHING MODELS AND THEIR EFFECT ON EQUATING FUNCTION ACCURACY , 2009 .

[156]  Deborah J. Harris,et al.  Effect of Examinee Group on Equating Relationships , 1986 .

[157]  John W. Young,et al.  The Cognitive Equivalence of Reading Comprehension Test Items Via Computerized and Paper-and-Pencil Administration , 2003 .

[158]  Walter M. Houston,et al.  Adjustments for Rater Effects in Performance Assessment , 1991 .

[159]  Sooyeon Kim,et al.  Investigating the Effectiveness of Equating Designs for Constructed‐Response Tests in Large‐Scale Assessments , 2010 .

[160]  Randy Elliot Bennett,et al.  Does it Matter if I take My Writing Test on Computer? An Empirical Study of Mode Effects in NAEP , 2006 .

[161]  Neil J. Dorans,et al.  Implications for Altering the Context in Which Test Items Appear: A Historical Perspective on an Immediate Concern , 1985 .

[162]  Kadriye Ercikan,et al.  Calibration and Scoring of Tests With Multiple-Choice and Constructed-Response Item Types , 1998 .

[163]  Gary W. Phillips,et al.  Technical Issues in Large-Scale Performance Assessment. , 1996 .

[164]  E. Baker,et al.  Impact of Accommodation Strategies on English Language Learners' Test Performance , 2005 .

[165]  Cynthia G. Parshall,et al.  Practical Considerations in Computer-Based Testing , 2002 .

[166]  M. Pomplun A Bifactor Analysis for a Mode-of-Administration Effect , 2007 .

[167]  Brent Bridgeman,et al.  Effects of Screen Size, Screen Resolution, and Display Rate on Computer-Based Test Performance , 2001 .

[168]  Gregory J. Cizek,et al.  The Effect of Altering the Position of Options in a Multiple-Choice Examination , 1994 .

[169]  Gary A. Schaeffer The Introduction and Comparability of the Computer Adaptive GRE General Test. GRE Board Professional Report No. 88-08aP. , 1995 .

[170]  R. Brennan Tests in Transition: Discussion and Synthesis , 2007 .

[171]  Samuel A. Livingston,et al.  Random‐Groups Equating with Samples of 50 to 400 Test Takers , 2010 .

[172]  Y. Attali Sequential Effects in Essay Ratings , 2011 .

[173]  Investigating the Population Sensitivity Assumption of Item Response Theory True-Score Equating Across Two Subgroups of Examinees and Two Test Formats , 2008 .

[174]  Wendy M. Yen,et al.  The Psychometric Characteristics of Choice Items , 1995 .

[175]  P. Holland,et al.  The Effects of Selection Strategies for Bivariate Loglinear Smoothing Models on NEAT Equating Functions. , 2010 .

[176]  Effect on Equating Results of Matching Samples on an Anchor Test. , 1990 .

[177]  Luuk C. Rietveld,et al.  Practical Aspects of Task Allocation in Design and Development of Digital Closed Questions in Higher Education , 2008 .

[178]  James M. Royer,et al.  Testing Accommodations for Examinees With Disabilities: A Review of Psychometric, Legal, and Social Policy Issues , 2001 .

[179]  D. Eignor Linking Scores Derived Under Different Modes of Test Administration , 2007 .

[180]  Kevin C. Larkin,et al.  SUBPOPULATION INVARIANCE OF EQUATING FUNCTIONS , 2006 .

[181]  G. E. Miller,et al.  Expected Equating Error Resulting From Incorrect Handling of Item Parameter Drift Among the Common Items , 2009 .

[182]  Gerald E. DeMauro,et al.  AN INVESTIGATION OF THE APPROPRIATENESS OF THE TOEFL TEST AS A MATCHING VARIABLE TO EQUATE TWE TOPICS , 1992 .

[183]  R. Hambleton,et al.  Evaluating Score Equity Assessment for State NAEP , 2009 .

[184]  P. Congdon,et al.  The Stability of Rater Severity in Large‐Scale Assessment Programs , 2000 .

[185]  The Effects of Test Length and Sample Size on the Reliability and Equating of Tests Composed of Constructed-Response Items , 2001 .

[186]  P. Cheng,et al.  Estimating Comparable Scores Using Surrogate Variables , 2001 .

[187]  Brent Bridgeman,et al.  COMPARABILITY OF PAPER-AND-PENCIL AND COMPUTER ADAPTIVE TEST SCORES ON THE GRE® GENERAL TEST , 1998 .

[188]  H. Wainer,et al.  On Examinee Choice in Educational Testing , 1994 .

[189]  W. R. Cowell,et al.  AN EXAMINATION OF THE ASSUMPTION THAT THE EQUATING OF PARALLEL FORMS IS POPULATION‐INDEPENDENT , 1985 .

[190]  H. Huynh,et al.  A Comparison of Equal Percentile and Partial Credit Equatings for Performance-Based Assessments Composed of Free-Response Items. , 1994 .

[191]  Invariance of Score Linkings Across Gender Groups for Forms of a Testlet-Based College-Level Examination Program Examination , 2008 .

[192]  Jill Burstein,et al.  Automated Essay Scoring : A Cross-disciplinary Perspective , 2003 .

[193]  Michalis P. Michaelides,et al.  An Illustration of a Mantel-Haenszel Procedure to Flag Misbehaving Common Items in Test Equating , 2008 .

[194]  R. C. Sykes,et al.  The Effects of Computer Administration on Scores and Item Parameter Estimates of an IRT-Based Licensure Examination , 1997 .

[195]  Alina A. von Davier,et al.  Practical Application of a Synthetic Linking Function on Small-Sample Equating , 2011 .

[196]  THE EFFECTS ON OBSERVED- AND TRUE-SCORE EQUATING PROCEDURES OF MATCHING ON A FALLIBLE CRITERION: A SIMULATION WITH TEST VARIATION , 1990 .

[197]  Robert C. Sykes,et al.  The Scaling of Mixed-Item Format Tests with the One-Parameter and Two-Parameter Partial Credit Models. , 2000 .

[198]  Linda L. Cook,et al.  Problems Related to the Use of Conventional and Item Response Theory Equating Methods in Less Than Optimal Circumstances , 1987 .

[199]  Robert L. Ziomek,et al.  Predicting the College Grade Point Averages of Special-Tested Students from Their ACT Assessment Scores and High School Grades. , 1996 .

[200]  A Graphical Approach to Evaluating Equating Using Test Characteristic Curves , 2011 .

[201]  Tim Davey,et al.  Computer-Adaptive Testing for Students with Disabilities: A Review of the Literature. Research Report. ETS RR-11-32. , 2011 .

[202]  Two Approaches for Using Multiple Anchors in NEAT Equating , 2011 .

[203]  Samuel A. Livingston,et al.  The Circle-Arc Method for Equating in Small Samples , 2009 .

[204]  Bruce Bloxom,et al.  Operational Calibration of the Circular-Response Optical-Mark-Reader Answer Sheets for the Armed Services Vocational Aptitude Battery (ASVAB) , 1993 .

[205]  Deniz S. Ones,et al.  Psychometric equivalence of the computer and booklet forms of the MMPI: A meta-analysis , 1999 .

[206]  Quantifying Equating Errors with Item Response Theory Methods , 1985 .

[207]  H. Wainer,et al.  Are Tests Comprising Both Multiple‐Choice and Free‐Response Items Necessarily Less Unidimensional Than Multiple‐Choice Tests? An Analysis of Two Tests , 1994 .

[208]  Lixiong Gu,et al.  Differential Item Functioning of GRE Mathematics Items across Computerized and Paper-and-Pencil Testing Media. , 2006 .

[209]  Sooyeon Kim,et al.  Evaluating the Comparability of Paper-and-Pencil and Computerized Versions of a Large-Scale Certification Test. Research Report. ETS RR-05-21. , 2005 .

[210]  Manfred Steffen,et al.  The GRE Computer Adaptive Test: Operational Issues , 2000 .

[211]  Peter E. Kennedy,et al.  Combining Multiple-Choice and Constructed-Response Test Scores: An Economist's View , 1997 .

[212]  B. Loyd Mathematics Test Performance: The Effects of Item Type and Calculator Use , 1991 .

[213]  L. Crocker,et al.  Achieving Form-to-Form Comparability: Fundamental issues and Proposed Strategies for Equating Performance Assessments of Teachers , 1995 .

[214]  D. Borsboom Educational Measurement (4th ed.) , 2009 .

[215]  K. Ercikan,et al.  The Consistency Between Raters Scoring in Different Test Years , 1998 .

[216]  Walter D. Way IRT Ability Estimates from Customized Achievement Tests Without Representative Content Sampling , 1989 .

[217]  Martha L. Stocking,et al.  A Method for Severely Constrained Item Selection in Adaptive Testing , 1992 .

[218]  Masune Sukigara,et al.  Equivalence between Computer and Booklet Administrations of the New Japanese Version of the MMPI , 1996 .

[219]  Robert L. Brennan,et al.  Conditional standard errors of measurement for scale scores using binomial and compound binomial assumptions , 1992 .

[220]  P. Holland,et al.  How to Average Equating Functions, If You Must , 2009 .

[221]  Shudong Wang,et al.  A Meta-Analysis of Testing Mode Effects in Grade K-12 Mathematics Tests , 2007 .

[222]  Daniel O. Segall,et al.  Equating the CAT-ASVAB. , 1997 .

[223]  Willem J. van der Linden Computerized adaptive testing with equated number-correct scoring , 2001 .

[224]  R. Mckinley,et al.  Reducing Test Form Overlap of the GRE Subject Test in Mathematics Using IRT Triple-Part Equating. GRE Board Professional Report No. 86-14P. , 1989 .

[225]  Dorothy T. Thayer,et al.  POPULATION INVARIANCE OF SCORE LINKING: THEORY AND APPLICATIONS TO ADVANCED PLACEMENT PROGRAM® EXAMINATIONS , 2003 .

[226]  N. Dorans,et al.  USING THE SELECTION VARIABLE FOR MATCHING OR EQUATING , 1993 .

[227]  Linda L. Cook Practical Problems in Equating Test Scores: A Practitioner’s Perspective , 2007 .

[228]  T. Davey,et al.  Potential Impact of Context Effects on the Scoring and Equating of the Multistage GRE® Revised General Test , 2011 .

[229]  Exploring Population Sensitivity of Linking Functions Across Three Law School Admission Test Administrations , 2008 .

[230]  James W Pellegrino,et al.  Technology and Testing , 2009, Science.

[231]  Mary E. Lunz,et al.  Interjudge Reliability and Decision Reproducibility , 1994 .

[232]  Katie Larsen McClarty,et al.  Item-Level Comparative Analysis of Online and Paper Administrations of the Texas Assessment of Knowledge and Skills , 2008 .

[233]  Wim J. van der Linden,et al.  Capitalization on Item Calibration Error in Adaptive Testing , 1998 .

[234]  N. Dorans,et al.  Checking the Statistical Equivalence of Nearly Identical Test Editions , 1990 .

[235]  S. Sinharay Chain Equipercentile Equating and Frequency Estimation Equipercentile Equating: Comparisons Based on Real and Simulated Data , 2011 .

[236]  W. D. Linden Equating Scores from Adaptive to Linear Tests , 2006 .

[237]  Shelby J. Haberman,et al.  Limits on the Accuracy of Linking. Research Report. ETS RR-10-22. , 2010 .

[238]  First Language of Test Takers and Fairness Assessment Procedures , 2011 .

[239]  J. S. Gilmer The Effects of Test Disclosure on Equated Scores and Pass Rates , 1989 .

[240]  Wendy M. Yen,et al.  Scaling Performance Assessments: Strategies for Managing Local Item Dependence , 1993 .

[241]  The Impact of Item Deletion on Equating Conversions and Reported Score Distributions. , 1986 .

[242]  F. Drasgow,et al.  Equivalence of computerized and paper-and-pencil cognitive ability tests: A meta-analysis. , 1993 .

[243]  D. D. Bickerstaff,et al.  Computerized adaptive testing , 2015 .

[244]  P. Holland,et al.  Population Invariance and the Equatability of Tests: Basic Theory and The Linear Case , 2000 .

[245]  Invariance of Equating Functions Across Different Subgroups of Examinees Taking a Science Achievement Test , 2008 .

[246]  F. Drasgow,et al.  Does computerizing paper-and-pencil job attitude scales make a difference? New IRT analyses offer insight , 2000 , The Journal of Applied Psychology .

[247]  H. Wainer,et al.  COMBINING MULTIPLE-CHOICE AND CONSTRUCTED RESPONSE TEST SCORES: TOWARD A MARXIST THEORY OF TEST CONSTRUCTION , 1992 .

[248]  Gautam Puhan A Comparison of Chained Linear and Poststratification Linear Equating under Different Testing Conditions. , 2010 .

[249]  T. F. Donlon The College Board technical handbook for the scholastic aptitude test and achievement tests , 1984 .

[250]  Shelby J. Haberman,et al.  Limits on the Accuracy of Linking , 2010 .

[251]  Catherine M. Hombo,et al.  Equating and Linking of Performance Assessments , 2000 .

[252]  Jaeyool Boo,et al.  Computerized and Paper-and-Pencil Versions of the Rosenberg Self-Esteem Scale: A Comparison of Psychometric Features and Respondent Preferences , 2001 .

[253]  Wendy M. Yen,et al.  The Maryland School Performance Assessment Program: Performance Assessment with Psychometric Quality Suitable for High Stakes Usage , 1997 .

[254]  Stephen B. Dunbar,et al.  Quality Control in the Development and Use of Performance Assessments , 1991 .

[255]  Effects of Passage and Item Scrambling on Equating Relationships , 1991 .

[256]  Selection Strategies for Univariate Loglinear Smoothing Models and Their Effect on Equating Function Accuracy , 2009 .

[257]  Rebecca D. Hetter,et al.  Evaluating item calibration medium in computerized adaptive testing. , 1997 .

[258]  P. Holland,et al.  An Approach to Evaluating the Missing Data Assumptions of the Chain and Post-stratification Equating Methods for the NEAT Design , 2008 .

[259]  Hyeonjoo J. Oh,et al.  The Effects of Essay Placement and Prompt Type on Performance on the New SAT , 2006 .

[260]  Mark Wilson,et al.  Complex Composites: Issues That Arise in Combining Different Modes of Assessment , 1995 .

[261]  I. Lawrence,et al.  LINKING SCORES FOR COMPUTER-ADAPTIVE AND PAPER-AND-PENCIL ADMINISTRATIONS OF THE SAT , 1997 .

[262]  Does Linking Mixed-Format Tests Using a Multiple-Choice Anchor Produce Comparable Results for Male and Female Subgroups? , 2011 .

[263]  INVARIANCE OF LINKINGS OF THE REVISED 2005 SAT REASONING TEST™ TO THE SAT® I: REASONING TEST ACROSS GENDER GROUPS , 2005 .

[264]  R. Brennan,et al.  A Comparison of the Frequency Estimation and Chained Equipercentile Methods Under the Common-Item Nonequivalent Groups Design , 2008 .

[265]  HOW UNIDIMENSIONAL ARE TESTS COMPRISING BOTH MULTIPLE-CHOICE AND FREE-RESPONSE ITEMS? AN ANALYSIS OF TWO TESTS , 1993 .

[266]  Walter P. Vispoel,et al.  Individual Differences and Test Administration Procedures: A Comparison of Fixed-Item, Computerized-Adaptive, and Self-Adapted Testing. , 1994 .

[267]  Anchor Test Type and Population Invariance: An Exploration Across Subpopulations and Test Administrations , 2008 .

[268]  R. Mislevy Linking Educational Assessments: Concepts, Issues, Methods, and Prospects. , 1992 .

[269]  Robustness to Format Effects of IRT Linking Methods for Mixed-Format Tests , 2006 .

[270]  Rebecca Zwick Effects of Item Order and Context on Estimation of NAEP Reading Proficiency , 1991 .

[271]  Robert J. Mislevy,et al.  How to Equate Tests With Little or No Data , 1993 .

[272]  Samuel A. Livingston,et al.  A Case of Inconsistent Equatings: How the Man With Four Watches Decides What Time It Is , 2009 .

[273]  Kathleen E. Moreno,et al.  The Effects of Mode of Test Administration on Test Performance , 1986 .

[274]  Cynthia G. Parshall,et al.  Computer Testing versus Paper-and-Pencil Testing: An Analysis of Examinee Characteristics Associated with Mode Effect. , 1993 .

[275]  R. Hambleton,et al.  International Perspectives on Academic Assessment , 2012 .

[276]  Martha L. Stocking THREE PRACTICAL ISSUES FOR MODERN ADAPTIVE TESTING ITEM POOLS , 1994 .

[277]  Sooyeon Kim,et al.  Comparisons among Designs for Equating Mixed‐Format Tests in Large‐Scale Assessments , 2010 .

[278]  Robert L. Brennan The Context of Context Effects , 1992 .

[279]  P. Holland,et al.  Is It Necessary to Make Anchor Tests Mini-Versions of the Tests Being Equated or Can Some Restrictions Be Relaxed? , 2007 .

[280]  Tim Moses AN EVALUATION OF STATISTICAL STRATEGIES FOR MAKING EQUATING FUNCTION SELECTIONS , 2008 .

[281]  Willem J. van der Linden,et al.  Linear Models for Optimal Test Design , 2005 .

[282]  Test Score Equating Using a Mini‐Version Anchor and a Midi Anchor: A Case Study Using SAT® Data , 2011 .

[283]  M. J. Kolen,et al.  Conditional Standard Errors of Measurement for Scale Scores Using IRT , 1996 .

[284]  Anthony R. Zara,et al.  Procedures for Selecting Items for Computerized Adaptive Tests. , 1989 .

[285]  George Engelhard,et al.  The Measurement of Writing Ability With a Many-Faceted Rasch Model , 1992 .

[286]  N. Dorans Equating Methods and Sampling Designs , 1990 .

[287]  William A. Sands,et al.  Computerized adaptive testing: From inquiry to operation. , 1997 .

[288]  Richard L. Tate,et al.  Performance of a Proposed Method for the Linking of Mixed Format Tests With Constructed Response and Multiple Choice Items , 2000 .