A neural network approach to key frame extraction

We present a neural-network-based approach to key frame extraction in the compressed domain. The proposed method combines two MPEG-7 descriptors: the motion intensity descriptor and the spatial activity descriptor. Shot boundary detection and block motion estimation are performed before the descriptors are extracted. The motion intensity (“pace of action”) is obtained using a fuzzy system that classifies it into five categories ordered by increasing intensity. The spatial activity matrix describes the spatial distribution of activity (“active regions”) in a frame. A neural network then selects as key frames those frames that have high motion intensity and maximum spatial activity at the center of the frame. Results are compared against two well-known key frame extraction techniques to demonstrate the advantage and robustness of the proposed approach: selecting the first frame of each shot as the key frame, and selecting the middle frame of each shot as the key frame. The neural network approach performs much better than both baselines.
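The selection criterion described above (high motion intensity together with activity concentrated at the center of the frame) can be illustrated with a minimal sketch. This is not the authors' method: it swaps the fuzzy system for crisp thresholds and the neural network for a hand-written score, and every name and constant in it (`LEVEL_THRESHOLDS`, `intensity_level`, `center_activity`, `pick_key_frame`, the "central half" region) is an assumption made for illustration. Each frame is assumed to be given as a grid of per-block motion-vector magnitudes from block motion estimation.

```python
from typing import List, Sequence

Grid = List[List[float]]  # per-block motion-vector magnitudes for one frame

# Hypothetical thresholds separating five intensity levels
# (1 = very low ... 5 = very high); not taken from the paper.
LEVEL_THRESHOLDS = (1.0, 2.0, 4.0, 8.0)


def intensity_level(grid: Grid) -> int:
    """Map the mean block-motion magnitude to a crisp level from 1 to 5."""
    values = [v for row in grid for v in row]
    mean = sum(values) / len(values)
    return 1 + sum(mean >= t for t in LEVEL_THRESHOLDS)


def center_activity(grid: Grid) -> float:
    """Fraction of the total motion magnitude lying in the central region."""
    rows, cols = len(grid), len(grid[0])
    # Assumed central region: drop a quarter of the blocks on each side.
    r0, r1 = rows // 4, rows - rows // 4
    c0, c1 = cols // 4, cols - cols // 4
    total = sum(v for row in grid for v in row)
    if total == 0:
        return 0.0  # a static frame has no active region
    center = sum(grid[r][c] for r in range(r0, r1) for c in range(c0, c1))
    return center / total


def pick_key_frame(shot: Sequence[Grid]) -> int:
    """Return the index of the frame with the highest combined score."""
    # Stand-in for the neural network: intensity level times center fraction.
    scores = [intensity_level(g) * center_activity(g) for g in shot]
    return max(range(len(shot)), key=scores.__getitem__)


if __name__ == "__main__":
    static = [[0.0] * 4 for _ in range(4)]
    centered = [[0.0] * 4 for _ in range(4)]
    for r in (1, 2):
        for c in (1, 2):
            centered[r][c] = 16.0  # motion concentrated in the middle
    corners = [[0.0] * 4 for _ in range(4)]
    for r in (0, 3):
        for c in (0, 3):
            corners[r][c] = 16.0  # same total motion, but at the edges
    print(pick_key_frame([static, centered, corners]))
```

In this toy shot, the second and third frames have the same mean motion magnitude, so only the spatial distribution separates them; the sketch prefers the frame whose motion is concentrated at the center. A learned model, as in the paper, would replace the fixed product of the two features.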
