Establishing Inter-rater Agreement for TIDEE's Teamwork and Professional Development Assessments

Senior capstone design courses in engineering programs provide an opportunity to address important curricular objectives related to teamwork and professional development. In these courses, students work within a team environment and are challenged with non-technical issues such as communication, organization, and self-directed learning. By the end of the capstone experience, it is hoped that students are prepared for the professional working environment. Capstone faculty, often with technical expertise in a specific branch of engineering, have expressed difficulty in teaching and assessing the types of knowledge, skills, and affective behaviors associated with these non-technical performance areas. When assessing teamwork, for example, an “I know it when I see it” approach is not uncommon. Capstone courses need valid and reliable assessment instruments that define expected performance criteria and therefore offer guidance for teaching and learning. In addition to this formative use, summative assessments are needed to document student growth with regard to these outcomes. To this end, collaborators from the Transferable Integrated Design Engineering Education (TIDEE) consortium have developed a suite of assessments for use in capstone courses, covering four common performance areas: teamwork, professional development, design processes, and solution assets. For each of these performance areas, multiple assessments have been developed, and testing for validity and reliability is ongoing. The purpose of this paper is to present results from a reliability study conducted with seven TIDEE assessments from the teamwork and professional development performance areas. For each assessment tested, the degree of inter-rater reliability was determined, representing an estimate of the consistency of scoring among multiple raters.
This type of reliability is significant for the TIDEE assessments because they elicit essay-type responses from students and therefore require professional judgment by faculty to assess achievement. Each assessment was tested by having two faculty raters and two teaching assistant raters score a subset of student work using the corresponding scoring rubrics. Percent agreement calculations and correlations were used to interpret the level of rater agreement. The results were interpreted in light of the intended uses of each assessment: formative, summative, or both. In general, the assessments were found to have scoring agreement of 85% to 100% within a one-point variation. Exact agreement ranged from 20% to 60%. Overall, the results indicated sufficient agreement for formative use (for enhancing teaching and learning). For summative use, five of the assessments should prove adequate for documenting student growth: the Team Contract, Team Member Citizenship, Growth Planning, Growth Progress, and Professional Development assessments. The remaining two, Team Processes and Growth Achieved, may need to be revised to improve agreement. Suggestions for improvement include revising the rubric descriptors for each level of performance, improving Frame-of-Reference rater training to decrease rater errors and increase accuracy, and incorporating Behavior-Observation-Training into the training protocol.
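As an illustration of the agreement statistics named above, the following is a minimal sketch of how exact agreement, agreement within one point, and a Pearson correlation might be computed for each pair of raters. It is not the study's actual analysis procedure: the function names, the rater labels, and every score in `ratings` are hypothetical and chosen only to show the calculation, and an integer rubric scale is assumed.

```python
from itertools import combinations
from math import sqrt


def agreement_rates(scores_a, scores_b, tolerance=1):
    """Return (exact, within-tolerance) agreement fractions for two raters."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("both raters must score the same non-empty set of work")
    n = len(scores_a)
    pairs = list(zip(scores_a, scores_b))
    exact = sum(a == b for a, b in pairs) / n
    within = sum(abs(a - b) <= tolerance for a, b in pairs) / n
    return exact, within


def pearson_r(xs, ys):
    """Pearson correlation; returns NaN if either rater's scores never vary."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / sqrt(var_x * var_y)


def pairwise_summary(ratings):
    """Summarize agreement for every pair of raters in a name -> scores mapping."""
    summary = {}
    for (name_a, a), (name_b, b) in combinations(ratings.items(), 2):
        exact, within = agreement_rates(a, b)
        summary[(name_a, name_b)] = {
            "exact": exact,
            "within_one": within,
            "pearson_r": pearson_r(a, b),
        }
    return summary


# Hypothetical rubric scores for five pieces of student work (not study data).
ratings = {
    "faculty_1": [3, 4, 2, 5, 3],
    "faculty_2": [3, 3, 2, 4, 3],
    "ta_1": [4, 4, 1, 5, 3],
    "ta_2": [3, 5, 2, 3, 2],
}

if __name__ == "__main__":
    for pair, stats in pairwise_summary(ratings).items():
        print(pair, stats)
```

With these made-up scores, the two faculty raters agree exactly on three of five items (60%) and within one point on all five (100%). The gap between the two figures shows why both statistics are worth reporting side by side.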