Spatiotemporal semantic video segmentation

In this paper, we propose a framework that extends semantic labeling of images to video shot sequences and achieves efficient, semantic-aware spatiotemporal video segmentation. This task faces two major challenges: the temporal variations within a video sequence, which affect image segmentation and labeling, and the computational cost of region labeling. Guided by these challenges, we design a method in which spatiotemporal segmentation and object labeling are coupled to achieve semantic annotation of video shots. An internal graph structure that describes both the visual and the semantic properties of image and video regions is adopted. The process of spatiotemporal semantic segmentation is divided into two stages. First, the video shot is split into small blocks of frames, and spatiotemporal regions (volumes) are extracted and labeled individually within each block. Then, we iteratively merge consecutive blocks using a matching procedure that considers both semantic and visual properties. Results on real video sequences show the potential of our approach.
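The two-stage process described above can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify the graph structure or the matching procedure, so the `Volume` record, the mean-color feature, the `match_cost` function, the `semantic_penalty` and `threshold` values, and the greedy matching strategy are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass
from math import dist


@dataclass
class Volume:
    """Hypothetical spatiotemporal region labeled within one block."""
    label: str      # semantic label assigned inside the block
    color: tuple    # mean color, a stand-in for the visual properties
    frames: tuple   # (first_frame, last_frame), inclusive


def split_into_blocks(num_frames, block_size):
    """Stage 1: return inclusive (start, end) frame ranges covering the shot."""
    return [(start, min(start + block_size, num_frames) - 1)
            for start in range(0, num_frames, block_size)]


def match_cost(a, b, semantic_penalty=1.0):
    """Assumed cost: visual distance plus a penalty when labels disagree."""
    penalty = 0.0 if a.label == b.label else semantic_penalty
    return dist(a.color, b.color) + penalty


def merge_blocks(prev, nxt, threshold=0.5):
    """Stage 2 step: greedily extend volumes of `prev` into the next block."""
    merged, unused = [], list(nxt)
    for a in prev:
        # Only volumes that start right after `a` ends may continue it.
        candidates = [b for b in unused if b.frames[0] == a.frames[1] + 1]
        best = min(candidates, key=lambda b: match_cost(a, b), default=None)
        if best is not None and match_cost(a, best) <= threshold:
            unused.remove(best)
            # Simplification: keep the earlier volume's label and color.
            merged.append(Volume(a.label, a.color, (a.frames[0], best.frames[1])))
        else:
            merged.append(a)
    # Unmatched volumes of the next block start new volumes.
    return merged + unused


def merge_shot(blocks):
    """Iteratively merge consecutive blocks of labeled volumes."""
    result = list(blocks[0])
    for nxt in blocks[1:]:
        result = merge_blocks(result, nxt)
    return result
```

In this sketch, labeling stays local to each short block, which keeps the cost of region labeling bounded, while the merge step is the only place where volumes from different blocks are compared. A graph-based matching, as the abstract suggests, would replace the greedy loop in `merge_blocks`.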
