Video Segmentation Based on Spatial-Temporal Attention Model

To address the segmentation errors that video segmentation algorithms produce under complicated and dynamic backgrounds, a spatial-temporal feature is proposed that can be extracted through significant mapping. Video segmentation is modeled using a hierarchical conditional random field. In this algorithm, temporal relative-motion characteristics and spatial color characteristics are used to construct the significant mapping. In accordance with visual psychology theory, the moving objects and the static background are first separated roughly. A Gaussian mixture model is then used to establish the energy functions of the foreground and the background. Super-pixels are used to define the adjacent energy function, which captures the relevance among adjacent context. Finally, the hierarchical conditional random field model is used to solve these feature energy functions under constraints in order to obtain the final segmentation results. The experiments show that the algorithm is effective and stable even under complex and dynamic backgrounds.

Keywords: Video segmentation, spatial-temporal feature, significant mapping, hierarchical conditional random field
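The rough separation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' formulation: the abstract gives no equations, so the temporal cue (frame difference relative to the median difference), the spatial cue (distance from the mean color), the weight `alpha`, the threshold, and the function names `significance_map` and `rough_foreground` are all assumptions made for illustration.

```python
import numpy as np


def significance_map(prev_frame, frame, alpha=0.5):
    """Combine a temporal and a spatial cue into one map with values in [0, 1].

    Both cues are illustrative assumptions, not the paper's definitions.
    Frames are H x W x 3 arrays.
    """
    prev_f = prev_frame.astype(np.float64)
    cur_f = frame.astype(np.float64)

    # Temporal cue (assumption): per-pixel frame difference, measured relative
    # to the median difference so that a frame-wide change is discounted.
    diff = np.linalg.norm(cur_f - prev_f, axis=2)
    temporal = np.abs(diff - np.median(diff))

    # Spatial cue (assumption): distance of each pixel's color from the
    # mean color of the current frame.
    mean_color = cur_f.reshape(-1, 3).mean(axis=0)
    spatial = np.linalg.norm(cur_f - mean_color, axis=2)

    def normalize(x):
        # Rescale to [0, 1]; a constant map carries no information.
        span = x.max() - x.min()
        return (x - x.min()) / span if span > 0 else np.zeros_like(x)

    return alpha * normalize(temporal) + (1 - alpha) * normalize(spatial)


def rough_foreground(sig_map, threshold=0.5):
    """Rough moving-object mask: pixels whose significance exceeds the threshold."""
    return sig_map > threshold


if __name__ == "__main__":
    # Synthetic example: a bright 2 x 2 block appears on a black background.
    prev_frame = np.zeros((8, 8, 3), dtype=np.uint8)
    frame = prev_frame.copy()
    frame[3:5, 3:5] = 255
    mask = rough_foreground(significance_map(prev_frame, frame))
    print(mask.astype(int))
```

In the pipeline the abstract describes, a mask like this would only seed the later stages (the Gaussian mixture model energies, the super-pixel adjacency term, and the hierarchical conditional random field), which this sketch does not attempt to reproduce.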
