Reliability of essay rating

Standardized educational tests have until recently been associated almost exclusively with multiple-choice items. In such a test, the examinee is presented with items, each consisting of a passage that states the question or describes the problem to be solved, followed by a set of response options. One of the options is the correct answer; the others are incorrect. The examinee’s task is to identify the correct response. With this item format, examinees can be given a large number of items in a relatively short time, say, 40 items in a half-hour test. Scoring the test, that is, recording whether each response is correct, can be done reliably by machine at moderate cost. A serious criticism of the format is that it limits the variety of items that can be administered, and that certain aspects of skills and abilities cannot be tested by such items at all. Many items can be solved more effectively by eliminating the incorrect response options than by deriving the correct response directly. Moreover, problems that can be posed as multiple-choice items are much rarer in real life than problems that call for a constructed answer; in many respects, it would be preferable to use items with realistic problems that require examinees to construct their own responses.