Using Technology-Enhanced Items to Measure Fourth Grade Geometry Knowledge

Technology-enhanced items have the potential to provide improved measures of student knowledge compared to traditional item types. This paper uses quantitative analysis of fourth-grade geometry field test data to explore a) the validity of inferences made from certain kinds of technology-enhanced items and b) whether those item types provide improved measurement compared to traditional item types. There was strong evidence, based on internal structure and on relationships with other variables, that technology-enhanced items provided valid inferences. The evidence of whether that measurement was an improvement over the measurement provided by traditional selected-response items was mixed.

High-quality assessment is critical to the educational process. Teachers and educational stakeholders can only make effective decisions about instruction and student progress if they have access to assessments that result in reliable and valid inferences about student knowledge. Technology-enhanced (TE) items have the potential to provide improved measures of student knowledge over traditional item types because they require students to produce, rather than select, a response. TE items can create a more engaging environment for students and can reduce the effects of guessing and test-taking skills. Constructed-response (CR) items can often provide similar benefits, but TE items have the additional benefit of supporting automated scoring, a critical feature in the modern classroom, which requires fast-paced, inexpensive, and accurate assessment feedback. For these and other reasons, national assessment consortia, state departments of education, and assessment developers have widely incorporated TE items into their formative and summative assessments.
While these potential benefits of TE items have spurred the forward momentum of TE item development and use, there has been a lack of rigorous research on the validity of inferences made from TE items and on the ability of TE items to provide improved measurement over traditional selected-response (SR) items (Bryant, 2017). Researchers have stressed the need for evidence that TE items are more than merely engaging, but also provide an accurate, and potentially improved, measure of student knowledge. The current paper contributes to the small but growing base of research related to whether the inferences made from TE items are valid and whether TE items provide improved measurement over SR items. Field test data are examined to address the following research questions:

RQ1) To what extent do TE items provide a valid measure of geometry standards in the elementary grades?

RQ2) To what extent do TE items provide improved measurement compared to SR items?

To address these questions, the Validity of Technology-Enhanced Assessment in Geometry (VTAG) project collected data from classroom administration of TE, SR, and CR items targeting fourth grade Common Core State Standards in geometry (shown in Figure 1).

CCSS.MATH.CONTENT.4.G.A.1: Draw points, lines, line segments, rays, angles (right, acute, obtuse), and perpendicular and parallel lines. Identify these in two-dimensional figures.

CCSS.MATH.CONTENT.4.G.A.2: Classify two-dimensional figures based on the presence or absence of parallel or perpendicular lines, or the presence or absence of angles of a specified size. Recognize right triangles as a category, and identify right triangles.

CCSS.MATH.CONTENT.4.G.A.3: Recognize a line of symmetry for a two-dimensional figure as a line across the figure such that the figure can be folded along the line into matching parts. Identify line-symmetric figures and draw lines of symmetry.
Figure 1: Common Core State Standards in Fourth Grade Geometry

1. Theoretical Framework

Assessment is a critical component within the instructional process, and instruction should be differentiated based on the results of assessments (Pellegrino, Chudowsky, & Glaser, 2001). The 2010 National Education Technology (NET) Plan's goal related to assessment is that "Our education system at all levels will leverage the power of technology to measure what matters and use assessment data for continuous improvement" (USDE, 2010, p. xvii). Research has documented the inadequacy of SR items for measuring many types of high-level knowledge and understanding (Archbald & Newmann, 1988; Bennett, 1993; Birenbaum & Tatsuoka, 1987; Hickson & Reed, 2009; Lane, 2004; Livingston, 2009; Darling-Hammond & Lieberman, 1992; Quellmalz, Timms, & Schneider, 2009). One approach to overcoming the shortcomings of SR items is the use of text-entry or CR items, which have been used to measure higher-order skills and knowledge. In recent years, researchers have leveraged technological advancements to combine the measurement power of CR items with the automated-scoring capability of SR items. One branch of this research has focused on automated text and essay scoring (e.g., Dikli, 2006), while another branch has focused on using technology to allow students to interact with digital content in innovative ways, through the development of TE items. This second line of research is consistent with the NET Plan's assessment-related recommendations, which include the development of assessments that provide "new and better ways" to assess students and the expansion of the capacity to design, develop, and validate technology-enhanced assessments that can access constructs difficult to measure with traditional assessments (USDE, 2010).
For this recommendation to be realized, more research is needed on the validity of technology-enhanced assessments in a variety of contexts. TE items offer many potential benefits over SR items. The most significant is that TE items have the potential to provide improved measurement of certain constructs, specifically high-level or cognitively complex constructs, because they require students to produce, rather than simply select, information, which is often a more authentic form of measurement (Archbald & Newmann, 1988; Bennett, 1999; Harlen & Crick, 2003; Huff & Sireci, 2001; Jodoin, 2003; McFarlane, Williams, & Bonnett, 2000; Sireci & Zenisky, 2006; Zenisky & Sireci, 2002). A second benefit is that TE items reduce the effects of test-taking skills and random guessing (Huff & Sireci, 2001). A third benefit is that TE items have the potential to provide richer diagnostic information by recording not only the student's final response, but also the interaction and response processes that can reveal the student's thinking leading up to that response (Birenbaum & Tatsuoka, 1987). CR items have always offered the first two of these benefits, but TE items allow these benefits to be leveraged on items administered via computer that can be automatically and instantly scored. A fourth potential benefit of TE items is a possible reduction of cognitive load from non-relevant constructs, such as the reading load for non-reading-related items or the cognitive load required to keep various item constructs in memory (Mayer & Moreno, 2003; Thomas, 2016). Finally, TE items tend to be more engaging to students, an important consideration in an era when students frequently feel over-tested (Strain-Seymour, Way, & Dolan, 2009; Dolan, Goodman, Strain-Seymour, Adams, & Sethuraman, 2011).
The potential of TE items to provide improved measurement was initially underscored by the awarding of Race to the Top Assessment funds to PARCC and Smarter Balanced, which proposed to develop next-generation assessment systems that would incorporate TE items in their summative and non-summative assessments (PARCC, 2010; SBAC, 2010). Since that time, state departments of education have continued to pursue the promise of TE items. Statewide summative assessment Requests for Proposals frequently include specific provisions for the development and administration of TE items, stipulating the availability of specific interaction types (e.g., State of Maine, Department of Education, 2014) and the presumption of measurement of higher-order thinking skills (e.g., Oklahoma State Department of Education, 2017). Despite the forward momentum to develop and use TE items, there is only a small research base evaluating the validity of TE items in various contexts within K-12 education. Pearson conducted cognitive labs with elementary, middle, and high school students to evaluate perceptions of TE items, the cognitive processes used to respond to TE items, and the potential for TE items to better evaluate constructs in both mathematics and English language arts (Dolan, Goodman, Strain-Seymour, Adams, & Sethuraman, 2011). Although the results cannot be broadly generalized because of the small sample sizes, the research found preliminary evidence to suggest that TE items were highly usable and engaging. More importantly, the research found that TE items produced measurements of constructs, particularly high-level constructs, that were not easily measured with traditional item types. The study found that the use of TE items reduced guessing and allowed students to have more authentic interactions with content. The study also found that TE items required more time to complete and that this factor was influenced by students' technical proficiencies (ibid.).
In a separate research effort, Pearson partnered with the Minnesota State Department of Education to evaluate and compare the performance of TE items, SR items, and CR items in the context of fifth grade, eighth grade, and high school science (Wan & Henley, 2012). This study explored TE items of the figural response type, which includes "hotspot" identification, drag-and-drop, and reordering. Through item response theory analyses, this study found that TE items provided the same amount of information