Vision for Time-Varying Images

Abstract: Visual processing research has predominantly targeted labeled, static images (e.g., Caltech-101), neglecting (a) moving images, which constitute a vast amount of visual data (e.g., YouTube, television, and all natural real-world visual experience), and (b) unlabeled images, even though labeling is among the most time-intensive aspects of vision research. We studied (1) the development of tasks for visual processing of moving scenes, to provide the field with datasets and benchmarks that begin to close the gap with the very large number of static visual datasets; and (2) the development and testing of algorithms for vision for time-varying images (VTV), including evaluation of existing algorithms and development of novel approaches. This grant was intended as a relatively brief (18-month) initial proof-of-principle effort. It has arguably exceeded its initial aims: we have developed novel algorithms for object recognition and localization in both still images and videos, and we have carried out initial evaluations comparing the new methods with previous approaches. The results, described herein, are promising, and ongoing work aims to extend the initial findings to a suite of advanced approaches to VTV tasks.