Automatic 3D Video Summarization: Key Frame Extraction from Self-Similarity

In this paper we present an automatic key-frame selection method for summarizing 3D video sequences. Key-frame selection is formulated as an optimization over the set of frames that gives the best representation of the sequence according to a rate-distortion trade-off. The distortion of the summary relative to the original sequence is measured by self-similarity computed from volume histograms. The method finds the globally optimal set of key-frames to represent the entire sequence without requiring pre-segmentation of the sequence into shots or temporal correspondence. Results demonstrate that, for 3D video sequences of people wearing a variety of clothing, the summarization automatically selects a set of key-frames that represents the dynamics. Comparative evaluation of rate-distortion characteristics against previous 3D video summarization demonstrates improved performance.
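As a rough illustration of the idea, the sketch below builds a self-similarity matrix from per-frame histograms and then picks key-frames by minimizing total distortion plus a per-key-frame penalty. It is not the paper's formulation: the L1 histogram distance, the summed distortion, the contiguous-segment model, the weight `lam`, and all function names are assumptions made for this example. Under those assumptions the dynamic program is a shortest-path search over segment boundaries, so the selected set is globally optimal for this particular cost.

```python
from typing import List, Sequence


def histogram_distance(h1: Sequence[float], h2: Sequence[float]) -> float:
    """L1 distance between two histograms (assumed stand-in for the
    volume-histogram dissimilarity; the abstract does not specify it)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def self_similarity(histograms: Sequence[Sequence[float]]) -> List[List[float]]:
    """Pairwise frame-to-frame dissimilarity matrix for the whole sequence."""
    n = len(histograms)
    return [[histogram_distance(histograms[i], histograms[j]) for j in range(n)]
            for i in range(n)]


def select_key_frames(S: List[List[float]], lam: float) -> List[int]:
    """Choose key-frames minimizing (total distortion + lam * number of key-frames).

    Assumption: each key-frame represents one contiguous run of frames, and the
    distortion of a run is the summed dissimilarity to its key-frame.
    """
    n = len(S)
    inf = float("inf")
    best = [0.0] + [inf] * n   # best[b]: minimum cost to summarize frames 0..b-1
    back = [None] * (n + 1)    # back[b]: (segment start, key-frame index)
    for b in range(1, n + 1):
        for a in range(b):
            # Cheapest representative for the segment of frames a..b-1.
            cost, k = min((sum(S[k][i] for i in range(a, b)), k)
                          for k in range(a, b))
            total = best[a] + cost + lam
            if total < best[b]:
                best[b] = total
                back[b] = (a, k)
    # Walk the back-pointers from the end of the sequence to recover key-frames.
    keys = []
    b = n
    while b > 0:
        a, k = back[b]
        keys.append(k)
        b = a
    return keys[::-1]


# Toy sequence: three frames of one pose followed by three of another.
hists = [[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]]
S = self_similarity(hists)
print(select_key_frames(S, lam=1.0))    # small penalty: one key-frame per pose
print(select_key_frames(S, lam=100.0))  # large penalty: a single key-frame
```

Raising `lam` trades distortion for a shorter summary, which is the rate-distortion trade-off in miniature; the naive segment-cost computation here is cubic in the number of frames and is meant only for short toy sequences.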
