Learning Layered Pictorial Structures from Video

We propose a new unsupervised learning method for obtaining a layered pictorial structure (LPS) representation of an articulated object from video sequences. The approach is related to methods for learning sprite-based representations of images. It relies on a new generative model for segmenting a set of images, which accounts for the effects of motion blur and occlusion. An initial estimate of the model parameters is obtained by dividing the scene into rigidly moving components. The estimate of each part's matte is then refined using a variation of the α-expansion graph cut algorithm, which has the advantage of reaching a strong local minimum over labels. Results are demonstrated on animals, for which an articulated LPS representation is naturally suited.
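
To make the idea of a layered representation with mattes concrete, the following is a minimal sketch of back-to-front compositing, in which a later layer occludes an earlier one wherever its matte is set. It is an illustration only and not the generative model of the paper: the names `composite`, `torso`, and `limb`, the binary mattes, and the tiny grayscale images are all assumptions made for the example, and motion blur and part transformations are left out.

```python
from typing import List, Tuple

Image = List[List[float]]  # grayscale image, row-major (assumed format)
Matte = List[List[int]]    # 1 where the part is present, 0 elsewhere


def composite(background: Image, layers: List[Tuple[Image, Matte]]) -> Image:
    """Paint each (appearance, matte) layer over the background.

    Layers are given in back-to-front order, so a later layer
    occludes an earlier one at every pixel where its matte is 1.
    """
    out = [row[:] for row in background]  # copy so the input is not modified
    for appearance, matte in layers:
        for y, matte_row in enumerate(matte):
            for x, present in enumerate(matte_row):
                if present:
                    out[y][x] = appearance[y][x]
    return out


# Hypothetical 2x4 scene: a "torso" part partly occluded by a "limb" part.
background = [[0.0] * 4 for _ in range(2)]
torso = ([[0.5] * 4 for _ in range(2)], [[0, 1, 1, 0], [0, 1, 1, 0]])
limb = ([[0.9] * 4 for _ in range(2)], [[0, 0, 1, 1], [0, 0, 0, 0]])

frame = composite(background, [torso, limb])
```

Here the limb overwrites the torso at the single pixel where both mattes are set, which is the occlusion effect a layered model has to explain when it segments a frame into parts.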
