Variational layered dynamic textures

The layered dynamic texture (LDT) is a generative model that represents video as a collection of stochastic layers with different appearance and dynamics. Each layer is modeled as a temporal texture sampled from a different linear dynamical system, and regions of the video are assigned to layers by a Markov random field. Model parameters are learned from training video with the EM algorithm, but exact inference in the E-step is intractable. In this paper, we propose a variational approximation for the LDT that enables efficient learning of the model. We also propose a temporally-switching LDT (TS-LDT), which allows the layer shape to change over time, along with the associated EM algorithm and variational approximation. The ability of the LDT to segment video into layers of coherent appearance and dynamics is evaluated extensively on both synthetic and natural video. These experiments show that the model can group regions of globally homogeneous, but locally heterogeneous, stochastic dynamics, an ability currently unparalleled in the literature.
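
As a rough illustration of the generative process described above, the following sketch samples a toy video in which each layer has its own linear dynamical system and each pixel observes the hidden state of the layer it is assigned to. It is a simplified sketch under stated assumptions, not the model as specified in the paper: the function name `sample_ldt`, the parameters `state_noise` and `obs_noise`, and the example matrices are all hypothetical, and the layer assignment `labels` is fixed by hand rather than drawn from a Markov random field. Learning with EM and the variational E-step are not shown.

```python
import random


def sample_ldt(labels, num_frames, A, C, state_noise=0.1, obs_noise=0.05, seed=0):
    """Sample a toy video from a simplified layered-dynamic-texture-style model.

    labels: layer index for each pixel (assumed fixed here, not sampled from an MRF)
    A:      one n x n state-transition matrix per layer (nested lists)
    C:      one length-n observation row per pixel
    Returns a list of frames; each frame is a list of pixel values.
    """
    rng = random.Random(seed)
    n = len(A[0])
    # One hidden state vector per layer, drawn from a standard normal.
    states = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in A]
    video = []
    for _ in range(num_frames):
        # Advance every layer's dynamical system: x_t = A x_{t-1} + v_t.
        states = [
            [
                sum(A_j[r][c] * x[c] for c in range(n)) + rng.gauss(0.0, state_noise)
                for r in range(n)
            ]
            for A_j, x in zip(A, states)
        ]
        # Each pixel reads the state of its assigned layer: y_i = C_i x^(z_i) + w_i.
        frame = [
            sum(C[i][c] * states[z][c] for c in range(n)) + rng.gauss(0.0, obs_noise)
            for i, z in enumerate(labels)
        ]
        video.append(frame)
    return video


# Hypothetical example: 4 pixels, 2 layers, 2-dimensional hidden states.
labels = [0, 0, 1, 1]
A = [
    [[0.9, 0.0], [0.0, 0.8]],    # layer 0: slowly decaying dynamics
    [[0.5, -0.5], [0.5, 0.5]],   # layer 1: damped rotation
]
C = [[1.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.5, -0.5]]
video = sample_ldt(labels, num_frames=5, A=A, C=C)
```

Pixels that share a layer are driven by the same hidden state, so they co-vary over time even when their observation rows differ. This is the sense in which a layer is globally homogeneous in its dynamics while remaining locally heterogeneous in appearance.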
