Method for extracting camera operations in order to describe subscenes in video sequences

Numerous attempts have been made to detect scene changes in video sequences and to treat scenes as units, so that very large amounts of video data can be handled. Less attention, however, has been given to the structure of the scenes themselves. In this study, we define a "sub-scene" as a structural component of a scene in terms of camera operations. In other words, sub-scenes are subsets of scenes, each consisting of successive frames in which the image content moves in almost the same way. The advantage of subdividing scenes into sub-scenes is that it makes the description of objects and motion information more expressive. To describe sub-scenes, camera operations need not be estimated very accurately, but the estimation must be robust to noise. The technique proposed in this paper meets this requirement: it extracts camera operations by processing more than two frames at a time using a 2D spatio-temporal image. It is also faster than conventional frame-by-frame analysis. Experimental results indicate that the technique is both feasible and useful.
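The abstract does not spell out how camera operations are read off the 2D spatio-temporal image, so the following is only a minimal sketch of one plausible interpretation, not the authors' method. It assumes that one fixed horizontal scanline is taken from every frame to form an x-t slice, in which a horizontal pan appears as slanted stripes. The stripe slope (pan velocity in pixels per frame) is then estimated by least squares over all frames of a window at once, a gradient-based estimate swapped in here for whatever the paper actually uses. Consecutive windows with similar velocities are merged into candidate sub-scenes. All function names, the window length `window`, the tolerance `tol`, and the synthetic test clip are hypothetical, and only horizontal pans are handled; tilts or zooms would need other slices or motion models.

```python
import math


def spatio_temporal_slice(frames, row):
    """Stack scanline `row` of every frame into an x-t image: st[t][x]."""
    return [list(frame[row]) for frame in frames]


def estimate_pan_velocity(st):
    """Least-squares horizontal velocity (pixels/frame) over a whole x-t slice.

    Assumes each stripe keeps its brightness, I(x, t) = f(x - v*t), so that
    I_t = -v * I_x. Minimizing sum (I_t + v * I_x)^2 over every cell of the
    slice gives v = -sum(I_x * I_t) / sum(I_x ** 2).
    """
    num = den = 0.0
    for t in range(len(st) - 1):
        row0, row1 = st[t], st[t + 1]
        for x in range(len(row0) - 1):
            # Differences averaged over the 2x2 cell (t..t+1, x..x+1).
            ix = 0.5 * ((row0[x + 1] - row0[x]) + (row1[x + 1] - row1[x]))
            it = 0.5 * ((row1[x] - row0[x]) + (row1[x + 1] - row0[x + 1]))
            num -= ix * it  # accumulates -sum(I_x * I_t)
            den += ix * ix
    return 0.0 if den == 0.0 else num / den


def split_into_subscenes(frames, row, window=8, tol=0.25):
    """Hypothetical grouping: estimate one velocity per window of frames and
    merge consecutive windows whose velocity is within `tol` of the current
    segment's mean. Returns (first_frame, end_frame_exclusive, mean_velocity).
    """
    st = spatio_temporal_slice(frames, row)
    n = len(st)
    segments = []  # each entry: [start, end, list_of_window_velocities]
    for start in range(0, n, window):
        end = min(start + window, n)
        if end - start < 2:
            # A lone trailing frame has no temporal difference; attach it.
            if segments:
                segments[-1][1] = end
            break
        v = estimate_pan_velocity(st[start:end])
        if segments:
            mean_v = sum(segments[-1][2]) / len(segments[-1][2])
            if abs(v - mean_v) <= tol:
                segments[-1][1] = end
                segments[-1][2].append(v)
                continue
        segments.append([start, end, [v]])
    return [(s, e, sum(vs) / len(vs)) for s, e, vs in segments]


def synthetic_pan(velocities, height=3, width=64, k=0.2):
    """Hypothetical test clip: a sinusoidal texture that shifts by
    velocities[t] pixels between frame t and frame t + 1."""
    frames, offset = [], 0.0
    for v in velocities:
        frames.append([[math.sin(k * (x - offset)) for x in range(width)]
                       for _ in range(height)])
        offset += v
    return frames


if __name__ == "__main__":
    # 16 frames of content moving 1 px/frame toward +x, then 16 static frames.
    clip = synthetic_pan([1.0] * 16 + [0.0] * 16)
    print(split_into_subscenes(clip, row=1))
```

On this synthetic clip the grouping should report two sub-scenes, frames 0-15 with a velocity close to 1 and frames 16-31 with a velocity close to 0. Because each window's estimate pools the gradients of all its frames, a single noisy frame perturbs the sums only slightly, which is the kind of noise robustness (rather than accuracy) the abstract asks for; the trade-off in this sketch is that a motion change falling inside a window is blended into that window's estimate.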