Video Epitomes

Recently, “epitomes” were introduced as patch-based probability models that are learned by compiling together a large number of examples of patches from input images. In this paper, we describe how epitomes can be used to model video data and we describe significant computational speedups that can be incorporated into the epitome inference and learning algorithm. In the case of videos, epitomes are estimated so as to model most of the small space-time cubes from the input data. Then, the epitome can be used for various modeling and reconstruction tasks, of which we show results for video super-resolution, video interpolation, and object removal. Besides computational efficiency, an interesting advantage of the epitome as a representation is that it can be reliably estimated even from videos with large amounts of missing data. We illustrate this ability on the task of reconstructing the dropped frames in video broadcast using only the degraded video and also in denoising a severely corrupted video.

[1]  Michael J. Black,et al.  Mixture models for optical flow computation , 1993, Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.

[2]  William T. Freeman,et al.  Example-Based Super-Resolution , 2002, IEEE Computer Graphics and Applications.

[3]  Eli Shechtman,et al.  Space-time video completion , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[4]  Brendan J. Frey,et al.  Epitomic analysis of appearance and shape , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[5]  Patrick Pérez,et al.  Object removal by exemplar-based inpainting , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[6]  Geoffrey E. Hinton,et al.  A View of the Em Algorithm that Justifies Incremental, Sparse, and other Variants , 1998, Learning in Graphical Models.

[7]  Brendan J. Frey,et al.  A comparison of algorithms for inference and learning in probabilistic graphical models , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8]  Andrew Blake,et al.  Super-resolution Enhancement of Video , 2003, AISTATS.

[9]  Alexei A. Efros,et al.  Image quilting for texture synthesis and transfer , 2001, SIGGRAPH.

[10]  Song-Chun Zhu,et al.  What are Textons? , 2005, International Journal of Computer Vision.

[11]  John Wang,et al.  Applying mid-level vision techniques for video data compression and manipulation , 1994, Electronic Imaging.

[12]  Jean-Michel Morel,et al.  A non-local algorithm for image denoising , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[13]  Brendan J. Frey,et al.  Unsupervised image translation , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.