CVIU special issue on event detection in video

It is our pleasure to welcome you to this special issue of Computer Vision and Image Understanding on event detection in video. The initial call for papers was sent in September 2001, and we are glad that we are finally able to bring this issue to you. The papers in this issue were developed from submissions to the first IEEE Event Detection in Video Workshop, held in Vancouver, British Columbia, as part of ICCV 2001. Since then, three further Event Workshops have been held, and the topic is now beginning to appear in mainstream computer vision conferences. The fundamental issues surrounding the detection, recognition, and understanding of events continue to be actively researched by academic and industrial groups around the world. The analysis of events is important in a variety of applications, including surveillance, vision-based human–computer interaction, and content-based retrieval.

Several challenges arise in the detection and recognition of events. First, a satisfactory definition of what constitutes an event is still lacking. In time-varying data, both salient changes and the states surrounding those changes are often termed events. The time scale of an event can also vary over a wide range. For example, a man running in an otherwise static scene can constitute an event, while the Gulf War is also an event, one that lasted over a far longer period. Because of its long duration, such an event could equally be regarded as a state while it is occurring. Second, understanding events appears to involve the detection and recognition of objects, actions, and their evolving interrelationships. Moreover, events are often multimodal, requiring evidence to be gathered from multiple media sources such as video and audio.
Even with the best techniques for visual or audio scene analysis, event detection based on individual cues will continue to lack robustness for the foreseeable future because of high detection error rates. Furthermore, localizing events through multimodal fusion will remain difficult because the individual cues often give conflicting indications. The purpose of this special issue was to highlight state-of-the-art research in this emerging field. We solicited original papers addressing a range of issues in event detection and recognition in digital video, including: