Paper information - Visual Surveillance and Video Annotation and Description

The effectiveness of CCTV surveillance networks is determined in part by their ability to perceive possible threats. The traditional means of determining a level of threat has been to manually observe a situation through the network and take action as appropriate. The increasing scale of such surveillance networks has, however, made this approach untenable, leading us to look for a means by which the process may be automated. Here we investigate the language used by security experts, looking for patterns in the way they describe events observed through a CCTV camera. We suggest that natural-language descriptions of events may provide the basis for an index, which may prove an important component of future automated surveillance systems.

[1] Kunio Fukunaga, et al. Generating natural language description of human behavior from video images, 2000, Proceedings 15th International Conference on Pattern Recognition. ICPR-2000.

[2] Marcel Worring, et al. Multimodal Video Indexing: A Review of the State-of-the-art, 2001.