Topology dictionary with Markov model for 3D video content-based skimming and description

This paper presents a novel approach to skim and describe 3D videos. 3D video is an imaging technology that consists of a stream of 3D models in motion captured by a synchronized set of video cameras. Each frame is composed of one or several 3D models, so acquiring long sequences at video rate requires massive storage devices. To reduce the storage cost while keeping relevant information, we propose to encode 3D video sequences using a topology-based shape descriptor dictionary. This dictionary is either generated from a set of extracted patterns or learned from training input sequences with semantic annotations. It relies on an unsupervised 3D shape-based clustering of the dataset by Reeb graphs, and features a Markov network to characterize topological changes. The approach allows content-based compression and skimming with accurate recovery of sequences and can handle complex topological changes. Redundancies are detected and skipped based on a probabilistic discrimination process. Semantic description of video sequences is then performed automatically. In addition, forthcoming frames are encoded using a multiresolution matching scheme, which also allows action recognition in 3D. Our experiments were performed on complex 3D video sequences. We demonstrate the robustness and accuracy of the 3D video skimming, with dramatically low-bitrate coding and a high compression ratio.
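
The idea of describing a sequence by dictionary entries, characterizing changes between entries with a Markov model, and skipping redundant frames can be illustrated with a minimal Python sketch. It is not the method of the paper: it assumes each frame has already been assigned a cluster label (in the paper this would come from Reeb graph clustering), it uses a plain first-order Markov chain in place of the Markov network, and it skips redundancy by simple run-length collapsing rather than the probabilistic discrimination process. The label names and the functions `transition_probabilities`, `skim`, and `expand` are hypothetical.

```python
from collections import Counter, defaultdict


def transition_probabilities(labels):
    """Estimate first-order transition probabilities between frame labels."""
    counts = defaultdict(Counter)
    for src, dst in zip(labels, labels[1:]):
        counts[src][dst] += 1
    return {
        src: {dst: n / sum(dsts.values()) for dst, n in dsts.items()}
        for src, dsts in counts.items()
    }


def skim(labels):
    """Collapse runs of identical labels into (label, run_length) pairs."""
    runs = []
    for label in labels:
        if runs and runs[-1][0] == label:
            runs[-1] = (label, runs[-1][1] + 1)
        else:
            runs.append((label, 1))
    return runs


def expand(runs):
    """Recover the per-frame label sequence from its skimmed form."""
    return [label for label, length in runs for _ in range(length)]


# Hypothetical per-frame cluster labels (assumed output of a clustering step).
labels = ["stand", "stand", "stand", "walk", "walk", "stand", "jump"]
probs = transition_probabilities(labels)
runs = skim(labels)
```

Here `runs` stores one entry per maximal run of identical labels, and `expand(runs)` restores the original label sequence, a toy analogue of skimming with recovery. `probs` summarizes how often one label follows another in the input.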
