Abstract Techniques for automatic video parsing and annotation are crucial for turning enormous volumes of video data into a rich, structured data type and for facilitating content-based video search and retrieval. In this paper, we present a generic video parser with a Scene Description Language (SDL). The SDL enables the human operator to model a video clip at a relatively high level of abstraction. The video parser is equipped with various algorithms that are common and essential to general video analysis. To efficiently handle a video domain with virtually unlimited sets of unanticipated and variable objects and events, an object-oriented, processing-on-demand approach is devised to perform the video parsing. The video parser first interprets the video model defined by the operator, identifies the prominent video properties to be parsed, and then creates an entity for each of these properties. Each entity knows how to find a match for itself among the video properties extracted from the video image. The video parser interacts with these entities and performs the feature extraction operations on a processing-on-demand basis. Each entity has a self-diagnostic function that can switch it into an inert state when it fails to find the necessary matches during the video parsing process. Inert entities are excluded from subsequent operations and no longer consume any system resources. Our experiments have shown that our generic video parser is effective and efficient in handling a large variety of video images.
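The entity mechanism described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the class names `Entity` and `Parser`, the `max_misses` threshold used as the self-diagnostic rule, the feature names, and the toy frame values are all assumptions made for illustration. The sketch shows two ideas only: a feature is extracted for a frame only when an active entity asks for it, and an entity that repeatedly fails to match turns itself inert and is skipped afterwards.

```python
from typing import Callable, Dict, List


class Entity:
    """One video property from the operator's model (hypothetical design)."""

    def __init__(self, name: str, feature: str,
                 predicate: Callable[[float], bool], max_misses: int = 2):
        self.name = name
        self.feature = feature        # name of the feature this entity needs
        self.predicate = predicate    # decides whether a feature value matches
        self.max_misses = max_misses  # assumed self-diagnostic threshold
        self.misses = 0               # consecutive failed matches so far
        self.inert = False
        self.matched_frames: List[int] = []

    def try_match(self, frame_index: int, value: float) -> None:
        if self.predicate(value):
            self.matched_frames.append(frame_index)
            self.misses = 0
        else:
            self.misses += 1
            # Self-diagnosis: too many consecutive failures, so go inert.
            if self.misses >= self.max_misses:
                self.inert = True


class Parser:
    """Runs feature extractors only when an active entity requests them."""

    def __init__(self, extractors: Dict[str, Callable[[dict], float]]):
        self.extractors = extractors
        self.calls: Dict[str, int] = {name: 0 for name in extractors}

    def parse(self, frames: List[dict], entities: List[Entity]) -> None:
        for index, frame in enumerate(frames):
            cache: Dict[str, float] = {}  # per-frame results, filled on demand
            for entity in entities:
                if entity.inert:
                    continue  # inert entities consume no further work
                if entity.feature not in cache:
                    self.calls[entity.feature] += 1
                    cache[entity.feature] = self.extractors[entity.feature](frame)
                entity.try_match(index, cache[entity.feature])


# Toy "frames" holding precomputed values in place of real image analysis.
frames = [
    {"green": 0.8, "motion": 0.1},
    {"green": 0.7, "motion": 0.0},
    {"green": 0.9, "motion": 0.2},
    {"green": 0.6, "motion": 0.1},
]
extractors = {
    "green": lambda f: f["green"],
    "motion": lambda f: f["motion"],
}
field = Entity("playing_field", "green", lambda v: v > 0.5)
fast_action = Entity("fast_action", "motion", lambda v: v > 0.5)

parser = Parser(extractors)
parser.parse(frames, [field, fast_action])
print(field.matched_frames, fast_action.inert, parser.calls)
```

In this toy run the `fast_action` entity fails on the first two frames and becomes inert, so the `motion` extractor is not called for the remaining frames, while the `green` extractor keeps running because `playing_field` is still active.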