Efficient methods of content characterization for browsing, retrieving, or filtering vast amounts of digital video content have become a necessity. Still, there is a gap between the content characteristics that can be computed and the semantic interpretation of those characteristics. We want to establish connections between the motion activity characteristics of video segments and their semantic characterization. For this purpose, two simple descriptors of the motion activity of video content are used to infer high-level semantic features of video in certain contexts. The first descriptor, monotonous activity, is defined as the average block-based motion vector magnitude. The second descriptor, nonmonotonous activity, is an approximation to the average temporal derivative of the motion vectors. Simulation results for browsing and retrieval applications show that, by using the two measures together, object motions that occur close to the camera can be distinguished from distant ones. Using the two descriptors together also lets us differentiate between high activity due to camera motion and high activity due to dancing people. Hence, these simple descriptors, especially when used to complement each other, promise to provide important clues about the semantics of a video.
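The two descriptors can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything here is an assumption: the function names are hypothetical, each frame is taken to be a list of per-block motion vectors `(dx, dy)` in a fixed block order, and the temporal derivative is approximated by a simple frame-to-frame difference.

```python
import math

def monotonous_activity(frames):
    """Average motion-vector magnitude over all blocks and frames."""
    mags = [math.hypot(dx, dy) for frame in frames for (dx, dy) in frame]
    return sum(mags) / len(mags) if mags else 0.0

def nonmonotonous_activity(frames):
    """Average magnitude of frame-to-frame motion-vector differences,
    an assumed finite-difference stand-in for the temporal derivative."""
    diffs = [
        math.hypot(cx - px, cy - py)
        for prev, cur in zip(frames, frames[1:])
        for (px, py), (cx, cy) in zip(prev, cur)
    ]
    return sum(diffs) / len(diffs) if diffs else 0.0

# Synthetic data (not from the paper): 3 frames of 4 blocks each.
# Steady pan: every block moves by (3, 4) in every frame.
pan = [[(3.0, 4.0)] * 4 for _ in range(3)]
# Erratic motion: same magnitudes, but the direction flips every frame.
erratic = [[(3.0, 4.0)] * 4, [(-3.0, -4.0)] * 4, [(3.0, 4.0)] * 4]

print(monotonous_activity(pan), nonmonotonous_activity(pan))
print(monotonous_activity(erratic), nonmonotonous_activity(erratic))
```

In this toy data both clips have the same monotonous activity, while only the erratic clip has nonzero nonmonotonous activity. This is the kind of separation the abstract describes between steady camera motion and irregular motion such as dancing, though the real descriptors may be normalized or computed differently.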