An extended framework for adaptive playback-based video summarization

In our previous work, we described an adaptive fast-playback framework for video summarization in which we changed the playback rate using the motion activity feature so as to maintain a constant “pace.” This method provides an effective way of skimming through video, especially when the motion is not too complex and the background is mostly still, as in surveillance video. In this paper, we present an extended summarization framework that, in addition to motion activity, uses semantic cues such as face or skin-color appearance, speech and music detection, or other domain-dependent semantically significant events to control the playback rate. The semantic features we use are computationally inexpensive and can be computed in the compressed domain, yet they are robust, reliable, and widely applicable across different content types. The presented framework also allows for adaptive summaries based on user preference, for example, to include more dramatic versus action elements, or vice versa. The user can switch at any time between skimming and normal playback modes. The continuity of the video is preserved, and complete omission of segments that may be important to the user is avoided by using adaptive fast playback instead of skipping over long segments. The rule set and the input parameters can be further modified to fit a particular domain or application. Our framework can be used by itself, or as a subsequent presentation stage for a summary produced by any other summarization technique that generates a subset of the content.
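The constant-pace rule described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the clamping bounds, and the way a semantic weight damps the speed back toward normal playback are all assumptions; the abstract specifies only that speed varies inversely with motion activity and that semantic cues pull segments toward normal-rate playback.

```python
def playback_speed(motion_activity, semantic_weight=0.0,
                   target_pace=1.0, max_speed=8.0):
    """Hypothetical constant-pace rule for one video segment.

    motion_activity: per-segment motion activity (higher = more motion).
    semantic_weight: in [0, 1]; 1.0 forces normal playback for
        semantically significant segments (faces, speech, etc.),
        0.0 leaves the skimming speed unchanged.
    Returns a playback-speed multiplier (1.0 = normal speed).
    """
    if motion_activity <= 0:
        raw = max_speed  # static segment: skim as fast as allowed
    else:
        raw = target_pace / motion_activity  # speed ~ 1 / activity
    # Never play slower than normal, never exceed the skim cap.
    raw = min(max(raw, 1.0), max_speed)
    # Blend toward normal speed (1.0) in proportion to semantic weight.
    return raw + semantic_weight * (1.0 - raw)

# Low-activity segment with no semantic cue: skimmed at 5x.
print(playback_speed(0.2))                        # → 5.0
# Same segment flagged as semantically significant: normal playback.
print(playback_speed(0.2, semantic_weight=1.0))   # → 1.0
```

The blend at the end is one simple way to realize the paper's rule-set behavior: semantic cues do not skip or keep segments outright, they continuously trade skimming speed against normal playback, which preserves continuity.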
