Guidelines for Evaluating Criterion-Referenced Tests and Test Manuals
In the last few years, most of the major test publishers have published a wide assortment of criterion-referenced tests. In addition, many school districts, state agencies, small testing firms, and consulting firms have produced their own. Criterion-referenced tests are designed to address many problem areas: they are being used to monitor student progress through school programs, to diagnose learning disabilities, to report student progress to parents, to evaluate various types of programs, and to certify or license professionals in many fields. Unfortunately, it appears to us, and to many users of criterion-referenced tests with whom we have spoken, that many of the available tests fall short of the technical quality necessary to accomplish their intended purposes. One explanation may be that many criterion-referenced tests were developed before an adequate testing technology had been fully explicated. Fortunately, an adequate technology for constructing criterion-referenced tests and using criterion-referenced test scores now exists (Hambleton & Eignor, 1978; Hambleton, Swaminathan, Algina, & Coulson, 1978; Popham, 1978). Another possible explanation is a shortage of guidelines for constructing and using criterion-referenced tests. The well-known Test Standards for evaluating tests and test manuals, prepared by a joint committee of AERA/APA/NCME, is certainly helpful, but it is not completely applicable to criterion-referenced tests. Beyond this incompleteness, what relevant information the Test Standards does contain is scattered through 75 or so pages of material more appropriate for evaluating norm-referenced tests. Therefore, the Test Standards in its present form is not very useful to individuals interested in evaluating criterion-referenced tests.
The primary purpose of this paper is to propose a set of guidelines for evaluating criterion-referenced tests and test manuals. The guidelines should be useful to both users and developers of criterion-referenced tests. We do not offer test standards in this paper (an example of a standard is "test score reliability must exceed .80"); instead, we offer a set of questions for consideration by potential users and developers of criterion-referenced tests. The only other efforts we are aware of to develop guidelines for evaluating criterion-referenced tests and test manuals are those of Popham (1978, Chapter 8), Swezey and Pearlstein (1975), and Walker (1977). A secondary purpose is to report on our use of the guidelines with eleven commercially available criterion-referenced test batteries. One caution and one comment are appropriate at this point. First, the guidelines represent our own biases about what technical information users need in order to make informed decisions about the quality of criterion-referenced tests. Second, because of space limitations, we cannot provide (1) a rationale for including each guideline, (2) specifics on how the guidelines were applied, or (3) a copy of our evaluation form. Interested readers are encouraged to consult Eignor (1978) and Hambleton and Eignor (1978) for this information.
[1] Popham, W. J., et al. (1971). Criterion-referenced measurement.
[2] Hambleton, R. K., et al. (1978). Criterion-referenced testing and measurement: A review of technical issues and developments.
[3] Hambleton, R. K., et al. (1979). A practitioner's guide to criterion-referenced test development, validation, and test score usage (2nd ed.). Laboratory of Psychometric and Evaluation Research Report No. 70.