Bayesian video matting using learnt image priors

Video matting, or layer extraction, is a classic inverse problem in computer vision: the extraction of foreground objects, and the alpha mattes that describe their opacity, from a set of images. Modern approaches that work with natural backgrounds often require user-labelled "trimaps" that segment each image into foreground, background and unknown regions, and for long sequences producing accurate trimaps can be time consuming. A second class of approach relies instead on automatic background extraction, but existing techniques do not exploit spatiotemporal consistency and cannot incorporate operator hints such as trimaps. This paper presents a method, inspired by natural image statistics, that cleanly unifies these approaches. A prior is learnt that models the relationship between the spatiotemporal gradients in the image sequence and those in the alpha mattes; combined with a learnt foreground colour model and a prior on the alpha distribution, it regularizes the solution and greatly improves the automatic performance of such systems. The system is applied to several real image sequences that demonstrate the advantage the unified approach affords.
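To make the roles of the three terms concrete, the sketch below writes down a toy per-frame energy of the kind the abstract describes: a compositing data term, a robust penalty coupling matte gradients to image gradients, and a prior pushing alpha towards 0 or 1. It is a minimal illustration only: the function and parameter names (matting_energy, sigma_c, sigma_g), the log1p gradient coupling and the alpha*(1-alpha) prior are assumptions chosen for clarity, not the paper's actual learnt models, and the foreground and background colours are fixed rather than estimated as in the full Bayesian formulation.

import numpy as np
from scipy.optimize import minimize

def matting_energy(alpha_flat, C, F, B, grad_I, sigma_c=0.05, sigma_g=0.1):
    # Negative log-posterior (up to constants) for one frame.
    h, w, _ = C.shape
    alpha = alpha_flat.reshape(h, w)

    # Data term: the composite alpha*F + (1 - alpha)*B should explain C.
    comp = alpha[..., None] * F + (1.0 - alpha[..., None]) * B
    data = np.sum((C - comp) ** 2) / (2.0 * sigma_c ** 2)

    # Gradient term: a heavy-tailed penalty coupling matte gradients to
    # image gradients -- a hand-picked stand-in for the learnt joint prior.
    gy, gx = np.gradient(alpha)
    grad_a = np.hypot(gx, gy)
    grad_term = np.sum(np.log1p(((grad_a - grad_I) / sigma_g) ** 2))

    # Alpha prior: favour mattes that are close to 0 or 1.
    alpha_term = np.sum(alpha * (1.0 - alpha))

    return data + grad_term + alpha_term

# Toy frame: known foreground/background colours and a square matte.
rng = np.random.default_rng(0)
h = w = 16
F = np.ones((h, w, 3)) * np.array([0.8, 0.2, 0.2])   # reddish foreground
B = np.ones((h, w, 3)) * np.array([0.1, 0.1, 0.6])   # bluish background
true_alpha = np.zeros((h, w))
true_alpha[4:12, 4:12] = 1.0
C = true_alpha[..., None] * F + (1.0 - true_alpha[..., None]) * B
C = C + 0.01 * rng.standard_normal(C.shape)

gy, gx = np.gradient(C.mean(axis=2))
grad_I = np.hypot(gx, gy)

res = minimize(matting_energy, x0=np.full(h * w, 0.5),
               args=(C, F, B, grad_I), method="L-BFGS-B",
               bounds=[(0.0, 1.0)] * (h * w))
alpha_hat = res.x.reshape(h, w)
print("mean absolute alpha error:", np.abs(alpha_hat - true_alpha).mean())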
