Learning flexible sprites in video layers

We propose a technique for automatically learning layers of "flexible sprites" (probabilistic 2-dimensional appearance maps and masks of moving, occluding objects). The model explains each input image as a layered composition of flexible sprites. A variational expectation maximization algorithm is used to learn a mixture of sprites from a video sequence. For each input image, probabilistic inference is used to infer the sprite class, translation, mask values and pixel intensities (including obstructed pixels) in each layer. Exact inference is intractable, but we show how a variational inference technique can be used to process 320/spl times/240 images at 1 frame/second. The only inputs to the learning algorithm are the video sequence, the number of layers and the number of flexible sprites. We give results on several tasks, including summarizing a video sequence with sprites, point-and-click video stabilization, and point-and-click object removal.

[1]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper , 1977 .

[2]  Michael J. Black,et al.  Mixture models for optical flow computation , 1993, Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.

[3]  Edward H. Adelson,et al.  Representing moving images with layers , 1994, IEEE Trans. Image Process..

[4]  Geoffrey E. Hinton,et al.  A View of the Em Algorithm that Justifies Incremental, Sparse, and other Variants , 1998, Learning in Graphical Models.

[5]  Michael I. Jordan,et al.  An Introduction to Variational Methods for Graphical Models , 1999, Machine-mediated learning.

[6]  David J. Fleet,et al.  Probabilistic detection and tracking of motion discontinuities , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[7]  Richard Szeliski,et al.  An integrated Bayesian approach to layer extraction from image sequences , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[8]  Hai Tao,et al.  Dynamic layer representation with applications to tracking , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[9]  Richard Szeliski,et al.  An Integrated Bayesian Approach to Layer Extraction from Image Sequences , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[10]  Brendan J. Frey,et al.  Transformation-Invariant Clustering and Dimensionality Reduction Using EM , 2001 .