Evaluating audio skimming and frame rate acceleration for summarizing BBC rushes

In 2007, TRECVID considered structured evaluation of automated video summarization for the first time, utilizing BBC rushes video. That year, we conducted user evaluations with the published TRECVID summary assessment procedure to rate three methods for producing summaries: a cluster method, a 25x method (sampling every 25th frame), and a pz method (emphasizing pans and zooms). Data from 4 human assessors show significant differences between the cluster, pz, and 25x approaches. The best coverage (text inclusion performance) is obtained by 25x, but at the expense of 25x taking the most time to evaluate and being judged the most redundant. Method pz was easier to use than cluster and rated best on redundancy. A question following the TRECVID workshop was whether simple speed-ups would still work at 50x or 100x, leading to a study with 15 human assessors looking at pzA (pz but with better audio), 25x, 50x, and 100x summaries (the latter 3 also with an unsynchronized, more comprehensive audio track). 100x gives the fastest time on task but with poor usability and performance. PzA gives the best usability measures but poor time on task and performance. 25x does well on performance as before, with 50x doing just as well but with much less time on task and better ease of use and redundancy scores. Based on these results, 50x with its audio skimming is recommended as the best way to summarize video rushes materials.
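
To make the speed-up factors concrete, the following minimal Python sketch shows what sampling every Nth frame means for the 25x, 50x, and 100x summaries. It is an illustration only, not the system evaluated here: the function names, the 25 frames-per-second playback rate, and the 10-minute clip length are assumptions introduced for the example, and the audio track is not modeled.

```python
def sample_frames(total_frames: int, speedup: int) -> list[int]:
    """Return the indices of the frames kept when every `speedup`-th frame is sampled."""
    if speedup < 1:
        raise ValueError("speedup must be a positive integer")
    return list(range(0, total_frames, speedup))


def summary_seconds(total_frames: int, speedup: int, fps: float = 25.0) -> float:
    """Playback duration of the sampled summary, assuming playback at `fps` (assumed 25)."""
    return len(sample_frames(total_frames, speedup)) / fps


if __name__ == "__main__":
    # Hypothetical 10-minute source clip at an assumed 25 frames per second.
    total = 10 * 60 * 25
    for factor in (25, 50, 100):
        print(f"{factor}x: {summary_seconds(total, factor):.1f} s")
```

Under these assumptions, each doubling of the speed-up factor halves the summary length, which is the trade-off between coverage and time on task that the assessor studies measure.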