Generative Model for Layers of Appearance and Deformation

We are interested in learning generative models of objects that can be used in a wide range of tasks, such as video summarization, image segmentation, and frame interpolation. Learning object-based appearance/shape models and estimating motion fields (deformation fields) are highly interdependent problems. At one extreme, all motion can be represented by an excessively large set of appearance exemplars. However, a more efficient representation of a video sequence would save on frame description by describing each frame's motion relative to the previous frame instead. The extreme in this direction is also problematic, as there are usually causes of appearance variability other than motion. The flexible sprite model (Jojic and Frey 2001) illustrates the benefits of jointly modelling motion, shape, and appearance using very simple models. The advantage of such a model is that each part of the model tries to capture some of the variability in the data until all the variability is decomposed and explained through appearance, shape, or transformation changes. Yet the set of motions modelled is very limited, and the residual motion is simply captured in the variance maps of the sprites. In this paper, we develop a better balance between the transformation and appearance models by explicitly modelling arbitrarily large, non-uniform motion.

[1] Brendan J. Frey, et al. Separating appearance from deformation, 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[2] Harpreet S. Sawhney, et al. Layered representation of motion video using robust maximum-likelihood estimation of mixture models and MDL encoding, 1995, Proceedings of IEEE International Conference on Computer Vision.

[3] Geoffrey E. Hinton, et al. A View of the EM Algorithm that Justifies Incremental, Sparse, and other Variants, 1998, Learning in Graphical Models.

[4] Brendan J. Frey, et al. Epitomic analysis of appearance and shape, 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[5] Edward H. Adelson, et al. A unified mixture framework for motion segmentation: incorporating spatial coherence and estimating the number of models, 1996, Proceedings CVPR IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[6] Takeo Kanade, et al. An Iterative Image Registration Technique with an Application to Stereo Vision, 1981, IJCAI.

[7] Michael J. Black, et al. Mixture models for optical flow computation, 1993, Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.

[8] D. Rubin, et al. Maximum likelihood from incomplete data via the EM algorithm plus discussions on the paper, 1977.

[9] Brendan J. Frey, et al. Learning flexible sprites in video layers, 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[10] Brendan J. Frey, et al. Advances in Algorithms for Inference and Learning in Complex Probability Models, 2003.

[11] Brendan J. Frey, et al. Estimating mixture models of images and inferring spatial transformations using the EM algorithm, 1999, Proceedings. 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149).

[12] Michael I. Jordan, et al. An Introduction to Variational Methods for Graphical Models, 1999, Machine-mediated learning.

[13] Michael J. Black, et al. Estimating Optical Flow in Segmented Images Using Variable-Order Parametric Models With Local Deformations, 1996, IEEE Trans. Pattern Anal. Mach. Intell.

[14] David J. Fleet, et al. Probabilistic detection and tracking of motion discontinuities, 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[15] Brendan J. Frey, et al. Learning appearance and transparency manifolds of occluded objects in layers, 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings.

[16] Christopher K. I. Williams, et al. Learning About Multiple Objects in Images: Factorial Learning without Factorial Search, 2002, NIPS.