Sequential Learning of Layered Models from Video

A popular framework for the interpretation of image sequences is the layers or sprite model, see e.g. [15], [6] . Jojic and Frey [8] provide a generative probabilistic model framework for this task, but their algorithm is slow as it needs to search over discretized transformations (e.g. translations, or affines) for each layer simultaneously. Exact computation with this model scales exponentially with the number of objects, so Jojic and Frey used an approximate variational algorithm to speed up inference. Williams and Titsias [16] proposed an alternative sequential algorithm for the extraction of objects one at a time using a robust statistical method, thus avoiding the combinatorial explosion.

[1]  Christopher K. I. Williams,et al.  Fast Unsupervised Greedy Learning of Multiple Objects and Parts from Video , 2004, 2004 Conference on Computer Vision and Pattern Recognition Workshop.

[2]  Andrew W. Fitzgibbon,et al.  On Affine Invariant Clustering and Automatic Cast Listing in Movies , 2002, ECCV.

[3]  Michalis Titsias,et al.  Unsupervised learning of multiple objects in images , 2005 .

[4]  Andrew Zisserman,et al.  Learning Layered Pictorial Structures from Video , 2004, ICVGIP.

[5]  Brendan J. Frey,et al.  Learning flexible sprites in video layers , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[6]  Christopher K. I. Williams,et al.  Fast Learning of Sprites using Invariant Features , 2005, BMVC.

[7]  Serge J. Belongie,et al.  What went where , 2003, CVPR 2003.

[8]  Christopher K. I. Williams,et al.  Greedy Learning of Multiple Objects in Images Using Robust Statistics and Factorial Learning , 2004, Neural Computation.

[9]  P. Torr Geometric motion segmentation and model selection , 1998, Philosophical Transactions of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences.

[10]  Michael J. Black,et al.  EigenTracking: Robust Matching and Tracking of Articulated Objects Using a View-Based Representation , 1996, International Journal of Computer Vision.

[11]  Michal Irani,et al.  Computing occluding and transparent motions , 1994, International Journal of Computer Vision.

[12]  David J. Fleet,et al.  A Layered Motion Representation with Occlusion and Compact Spatial Support , 2002, ECCV.

[13]  Mads Nielsen,et al.  Computer Vision — ECCV 2002 , 2002, Lecture Notes in Computer Science.

[14]  Edward H. Adelson,et al.  Representing moving images with layers , 1994, IEEE Trans. Image Process..

[15]  Hai Tao,et al.  Dynamic layer representation with applications to tracking , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[16]  Harpreet S. Sawhney,et al.  Compact Representations of Videos Through Dominant and Multiple Motion Estimation , 1996, IEEE Trans. Pattern Anal. Mach. Intell..

[17]  Alex Pentland,et al.  Cooperative Robust Estimation Using Layers of Support , 1995, IEEE Trans. Pattern Anal. Mach. Intell..

[18]  B. Frey,et al.  Transformation-Invariant Clustering Using the EM Algorithm , 2003, IEEE Trans. Pattern Anal. Mach. Intell..