Variational layered dynamic textures

The layered dynamic texture (LDT) is a generative model that represents video as a collection of stochastic layers of different appearance and dynamics. Each layer is modeled as a temporal texture sampled from a different linear dynamical system, and regions of the video are assigned to layers by a Markov random field. Model parameters are learned from training video with the EM algorithm, but exact inference in the E-step is intractable. In this paper, we propose a variational approximation for the LDT that enables efficient learning of the model. We also propose a temporally-switching LDT (TS-LDT), which allows the layer shape to change over time, together with its EM algorithm and variational approximation. The ability of the LDT to segment video into layers of coherent appearance and dynamics is evaluated extensively on both synthetic and natural video. These experiments show that the model can group regions of globally homogeneous, but locally heterogeneous, stochastic dynamics, an ability currently unparalleled in the literature.
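The generative process described above can be illustrated with a minimal sketch. It assumes a simplified form that is not taken from the paper: each layer j has a hidden state evolving as x_{t+1} = A_j x_t + v_t with isotropic Gaussian noise, and pixel i observes y_{i,t} = C_i x_t^{(z_i)} + w_t, where z_i is its layer label. The layer assignment is fixed by hand here rather than sampled from a Markov random field, and all names and parameter values (`sample_lds_states`, `sample_layered_video`, `layers`, `assignment`, `C`) are hypothetical.

```python
import random

def sample_lds_states(A, q_std, n_frames, rng):
    """Sample x_1..x_T from x_{t+1} = A x_t + v_t, with v_t ~ N(0, q_std^2 I)."""
    n = len(A)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]  # random initial state
    states = [x]
    for _ in range(n_frames - 1):
        x = [sum(A[i][j] * x[j] for j in range(n)) + rng.gauss(0.0, q_std)
             for i in range(n)]
        states.append(x)
    return states

def sample_layered_video(layers, assignment, C, r_std, n_frames, seed=0):
    """Return video[t][i]: pixel i at frame t observes the state of its layer."""
    rng = random.Random(seed)
    # One independent hidden state sequence per layer.
    states = [sample_lds_states(A, q_std, n_frames, rng) for (A, q_std) in layers]
    video = []
    for t in range(n_frames):
        frame = []
        for i, z in enumerate(assignment):
            x = states[z][t]  # state of the layer that pixel i belongs to
            mean = sum(c * xj for c, xj in zip(C[i], x))
            frame.append(mean + rng.gauss(0.0, r_std))  # observation noise
        video.append(frame)
    return video

# Two hypothetical layers with 2-D hidden states: slow versus fast oscillation.
layers = [
    ([[0.99, -0.05], [0.05, 0.99]], 0.05),
    ([[0.60, -0.70], [0.70, 0.60]], 0.05),
]
assignment = [0, 0, 0, 1, 1, 1]  # fixed labels standing in for an MRF sample
C = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]] * 2  # one observation row per pixel
video = sample_layered_video(layers, assignment, C, r_std=0.01, n_frames=20)
```

Learning reverses this process: given only `video`, the hidden states and the layer labels must be inferred jointly, which is the coupling that makes the exact E-step intractable and motivates the variational approximation.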
