Automated Video Assessment of Human Performance

Performance assessment is receiving consideration as an alternative to traditional standardized testing in many educational settings. In any performance assessment there is a tradeoff between the flexibility of the evaluation rubric and the reliability of the resulting scores. When the evaluation rubric for a set of performances captured on video can be completely specified, the most reliable method of scoring the performances is Automated Video Assessment, i.e. using computers to analyze the video data of a performance recording. This paper addresses three important issues concerning the application of Automated Video Assessment: the appropriate performance types, the necessary computer technology, and the effect of automation on performance assessment concerns. The application of Automated Video Assessment is demonstrated by a computer system that analyzes video recordings of gymnasts performing the vault and partially evaluates their performances according to the rubrics used in gymnastic competition.

1. Automated Video Assessment

The use of performance assessments for student evaluation, placement, and the monitoring of system-wide outcomes has recently been explored as a serious alternative to traditional standardized tests in many educational settings. In many instances, evaluating students' abilities by observing their performance of a task is preferable to indirect evaluation methods such as multiple-choice examinations. In cases where curricular goals center primarily on the acquisition of skills, performance assessments provide educators with an authentic means of evaluating the strengths and weaknesses of their students. The use of advanced media, specifically video, has been offered as the most appropriate means of capturing individual performances so that they can be evaluated by distant and independent scorers (Collins, Hawkins, & Frederiksen, 1993). In large-scale video assessment programs, however, the task of reliably scoring huge numbers of performances may be insurmountable. One solution to this problem is to use computers to automatically analyze and score performances captured on video, a process referred to in this paper as Automated Video Assessment.

When assessing any set of performances there exists a tradeoff between the flexibility of the evaluation rubric and the reliability of the resulting scores. Reliable scoring of performances is often critical to the educational development of students and important for ensuring fairness in high-stakes evaluations. High rubric flexibility, while necessary for many types of performance assessments, leads to variations in scores due to rater biases, order-of-scoring effects, and differences in raters' experience and training. Several researchers have demonstrated that it is possible to produce reliable scores on performance or product assessments when the raters are well trained (Herman, Gearhart, & Baker, 1993; Moss, 1994; Shavelson, Baxter, & Pine, 1992). However, obtaining adequate reliability in large-scale performance or product assessment programs has been difficult (Koretz, 1992; Madaus & Kellaghan, 1993), and attempts at improving it have focused largely on greater rubric specification (Huot, 1990). Automated Video Assessment explores one extreme of the flexibility/reliability tradeoff.
When evaluation rubrics for performances captured on video can be completely specified, the most reliable scoring method is computer analysis of the video itself. Although current technologies for computer video analysis are inadequate for most performance tasks, research in this area has presented the opportunity to explore Automated Video Assessment in limited, small-scale applications, as anticipated by previous researchers (Kitchen, 1990). The purpose of this paper is to address three important questions concerning the application of Automated Video Assessment. First, what types of performances are appropriate for Automated Video Assessment? Second, what technology is necessary to build a computer system capable of scoring performances on video? Third, how does automation affect the issues surrounding the scoring of performances? After addressing these questions, an example of the application of Automated Video Assessment is provided: a system that analyzes and partially evaluates video performances of gymnasts executing the vault.

2. What performances are appropriate for Automated Video Assessment?

The most important concern when considering the application of Automated Video Assessment is determining whether a particular performance in an assessment situation is appropriate for computer analysis. For each potential application, two critical questions must be answered.

First, is it possible to develop a scoring rubric for the performance type that specifies exactly how each performance should be scored based solely on the information captured on video? Practitioners and educators in many fields are unwilling, and often unable, to specify exactly what constitutes a good or poor performance, especially for artistic performances such as dance and music, as well as for complex, unconstrained performances such as teaching and social interaction. Even when raters are in complete agreement about the scores for a particular performance type, explicating a rubric based on video information can be extremely difficult, especially for those performance types which require expert raters to notice subtle features of student performances. Experts' judgments of performance quality are often based on an intuitive sense, acquired through years of evaluation and execution, which they may be unable to explicate as formal rules. Importantly, the evaluation rubric must be specified in terms of features that can be identified from the information that video recordings provide. Automated Video Assessment is only appropriate for performances where all of the features relevant to the scoring are directly observable or explicitly derivable from the source video (a minimal sketch of such a feature-based rubric is given below).

The second critical question is whether high reliability is essential for the particular performance assessment situation. Reliability in performance assessment is important in two types of situations. First, high reliability matters when the development of the student over repeated assessments is a primary concern; in these cases, reliable scoring provides the student with valuable feedback and indicators of achievement. Second, reliability matters when different student performances are ranked in a high-stakes assessment situation; high reliability helps to ensure that students are judged fairly and that the ranking of performances accurately reflects students' abilities.
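As a concrete illustration of the first requirement, the following is a minimal sketch of a fully specified, feature-based rubric, written in Python. The criteria, feature names, and deduction values are hypothetical and are not taken from the gymnastics vault system described later in this paper; the sketch assumes only that each measurement can be derived from the video.

# A minimal, hypothetical sketch of a fully specified, feature-based rubric.
# The criteria, feature names, and deduction values are illustrative
# assumptions; the only premise is that each measurement can be extracted
# from the performance video.
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Criterion:
    description: str                      # what a human rater would look for
    measure: str                          # name of the feature extracted from video
    deduction: Callable[[float], float]   # maps the measurement to points deducted

RUBRIC: List[Criterion] = [
    Criterion("Legs straight during flight", "max_knee_bend_degrees",
              lambda angle: min(0.5, 0.01 * angle)),
    Criterion("Landing within bounds", "landing_offset_metres",
              lambda offset: 0.0 if offset < 0.1 else 0.3),
]

def score(measurements: Dict[str, float], start_value: float = 10.0) -> float:
    """Apply every deduction to the start value; all inputs come from video analysis."""
    return start_value - sum(c.deduction(measurements[c.measure]) for c in RUBRIC)

# Example: score({"max_knee_bend_degrees": 20.0, "landing_offset_metres": 0.05}) -> 9.8

Because every deduction is a fixed function of a video-derivable measurement, repeated analyses of the same recording necessarily produce the same score, which is the sense in which a completely specified rubric yields maximally reliable scoring.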
High-stakes assessments are rarely productive in educational situations, as ordering students by ability is often not a primary concern. There are, however, a number of situations where high-stakes assessments are appropriate, including athletic competition and workplace evaluation.

What types of performances meet these constraints? Good candidates for Automated Video Assessment are performances whose execution is spatial and temporal, i.e. those that are exclusively action-oriented. The best candidates are performance types that consist of a fixed set of physical actions to be executed in a very specific manner. Examples include parts of athletic execution in individual and team sports such as gymnastics and football, the operation of machinery and devices such as factory equipment, and the execution of specific procedures such as those found in medicine and laboratory research. Each of these types of performances could benefit from Automated Video Assessment for both training and ranking purposes.

3. What technology is necessary for Automated Video Assessment?

Automated Video Assessment can be viewed as a process that takes a video recording of a performance as input and produces a score or set of scores, based on an analysis of the video, as output. The union of computers and video is fast becoming commonplace; it is now possible to capture large amounts of video data for computer storage and playback. The technology needed to fully analyze the content of that video data is still lacking, however. In Automated Video Assessment, this technology will consist of algorithms that quantify the quality of a performance by extracting meaningful information from the video data.

The field of computer vision has been researching the possibility of analyzing the content of digital images for nearly three decades with limited success, e.g. compare (Roberts, 1965) to (Lowe, 1985). To many people outside the field of computer vision, the current state of the art often seems ridiculously primitive. It is still a difficult task to build a vision system that can recognize a familiar object in a scene, or even to construct a useful description of the elements of an image. Early vision researchers, who were looking for simple and functional theories of visual understanding, quickly realized that successful vision systems would necessarily be extraordinarily complex.

In spite of these obstacles, research in computer vision has steadily progressed. Today, vision researchers have a wide range of general- and special-purpose algorithms available to extract meaningful information from video data. These algorithms and their successors can serve as the basis for quantifying the quality of a performance in Automated Video Assessment. The most promising vision algorithms for Automated Video Assessment are those that have been developed for tracking moving objects. Tracking algorithms provide information about the positions and motions of objects in the image, often by identifying and predicting an object's location in each frame. When robust tracking algo
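As an illustration of the kind of information such tracking algorithms provide, below is a minimal sketch of frame-to-frame tracking by differencing consecutive frames. It assumes OpenCV 4 (cv2), a fixed camera, and that the performer is the largest moving object in view; the function name, file handling, and threshold are illustrative assumptions, not the method used in the vault system described in this paper.

# A minimal sketch of tracking a performer by differencing consecutive
# frames. Assumes OpenCV 4 (cv2), a fixed camera, and that the performer
# is the largest moving object in the scene; the threshold and file
# handling are illustrative assumptions.
import cv2

def track_performer(video_path: str, diff_threshold: int = 25):
    """Return the (x, y) centroid of the largest moving region in each frame."""
    capture = cv2.VideoCapture(video_path)
    ok, frame = capture.read()
    if not ok:
        return []
    previous_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    trajectory = []
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Pixels that changed between consecutive frames mark the moving performer.
        difference = cv2.absdiff(gray, previous_gray)
        _, motion_mask = cv2.threshold(difference, diff_threshold, 255, cv2.THRESH_BINARY)
        contours, _ = cv2.findContours(motion_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        if contours:
            largest = max(contours, key=cv2.contourArea)
            moments = cv2.moments(largest)
            if moments["m00"] > 0:
                trajectory.append((moments["m10"] / moments["m00"],
                                   moments["m01"] / moments["m00"]))
        previous_gray = gray
    capture.release()
    return trajectory

The resulting trajectory of centroids could then be reduced to rubric measurements such as flight duration or landing position, again under the stated assumptions.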

References

[1] Lawrence G. Roberts. Machine Perception of Three-Dimensional Solids, 1963. Outstanding Dissertations in the Computer Sciences.
[2] David G. Lowe. Perceptual Organization and Visual Recognition, 1985.
[3] R. Shavelson et al. Research News and Comment: Performance Assessments, 1992.
[4] Daniel P. Huttenlocher et al. Tracking Non-Rigid Objects in Complex Scenes. In Proceedings of the 4th International Conference on Computer Vision, 1993.
[5] B. Huot. The Literature of Direct Writing Assessment: Major Concerns and Prevailing Trends, 1990.
[6] Russell L. Anderson. A Robot Ping-Pong Player: Experiments in Real-Time Intelligent Control, 1988.
[7] Ian D. Reid et al. Tracking Foveated Corner Clusters Using Affine Structure. In Proceedings of the 4th International Conference on Computer Vision, 1993.
[8] Daniel Koretz et al. The Vermont Portfolio Assessment Program: Interim Report on Implementation and Impact, 1991-92 School Year (Project 3.2: Collaborative Development of Statewide Systems; Report of Year 1 Vermont Study), 1992.
[9] Timothy D. Lee et al. Prior Processing Effects on Gymnastic Judging, 1991.
[10] R. Shavelson. Performance Assessments: Political Rhetoric and Measurement Reality, 1992.
[11] Eva L. Baker et al. Assessing Writing Portfolios: Issues in the Validity and Meaning of Scores, 1993.
[12] G. Madaus et al. The British Experience with "Authentic" Testing, 1993.
[13] Allan Collins et al. Three Different Views of Students: The Role of Technology in Assessing Student Performance, 1991.
[14] Pamela A. Moss. Can There Be Validity Without Reliability?, 1994.