High-zoom video hallucination by exploiting spatio-temporal regularities

In this paper, we consider the problem of super-resolving a human face video by a very high (×16) zoom factor. Inspired by the literature on hallucination and example-based learning, we formulate this task using a graphical model that encodes (1) spatio-temporal consistencies and (2) the image formation and degradation processes. A video database of facial expressions is used to learn a domain-specific prior for high-resolution videos. The problem is posed as one of probabilistic inference, in which we aim to find the high-resolution video that satisfies the constraints expressed through the graphical model. Traditional approaches to this problem using video data first estimate the relative motion between frames and then compensate for it, effectively yielding multiple measurements of the scene. Our use of time is more direct: we define data structures that span multiple consecutive frames, enriching our feature vectors with a temporal signature. We then exploit these signatures to find solutions that are consistent over time. In our experiments, an 8×6-pixel face video, subject to translational jitter and additive noise, is magnified to a 128×96-pixel video. Our results show that exploiting both space and time yields drastic reductions in both video flicker artifacts and mean-squared error.
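
As a rough illustration of two ideas in the abstract, the sketch below simulates a degradation process (translational jitter, ×16 downsampling, additive noise) and builds a feature vector that spans several consecutive frames. It is only a minimal sketch under stated assumptions: the block-average blur, the circular shift used for jitter, the Gaussian noise, and the names `degrade` and `spatiotemporal_feature` are all hypothetical choices made for illustration, since the abstract does not specify the paper's actual formation model or data structures.

```python
import random

def degrade(frame, factor, shift=(0, 0), noise_std=0.0, rng=None):
    """Shift a frame circularly (assumed jitter model), block-average it by
    `factor` (assumed blur/downsampling), and add Gaussian noise."""
    rng = rng or random.Random(0)
    h, w = len(frame), len(frame[0])
    dy, dx = shift
    low = []
    for by in range(h // factor):
        row = []
        for bx in range(w // factor):
            total = 0.0
            for y in range(factor):
                for x in range(factor):
                    total += frame[(by * factor + y + dy) % h][(bx * factor + x + dx) % w]
            row.append(total / factor ** 2 + rng.gauss(0.0, noise_std))
        low.append(row)
    return low

def spatiotemporal_feature(video, t, y, x, size, span):
    """Concatenate the size-by-size patch at (y, x) from frames t .. t+span-1,
    so the feature carries a temporal signature as well as a spatial one."""
    feat = []
    for frame in video[t:t + span]:
        for r in frame[y:y + size]:
            feat.extend(r[x:x + size])
    return feat

# Synthetic stand-in for a high-resolution video: 3 frames of 128 x 96 samples.
high_video = [[[float((r + c + t) % 7) for c in range(96)] for r in range(128)]
              for t in range(3)]

# Degrade each frame by a factor of 16 with a different jitter per frame.
rng = random.Random(42)
low_video = [degrade(f, 16, shift=(t, 0), noise_std=0.01, rng=rng)
             for t, f in enumerate(high_video)]

# A 2 x 2 patch tracked over 3 consecutive low-resolution frames.
feat = spatiotemporal_feature(low_video, t=0, y=0, x=0, size=2, span=3)
```

Each degraded frame here has 8 × 6 samples, matching the input size quoted in the abstract, and `feat` has 2 × 2 × 3 = 12 entries. In an example-based setting, vectors like this would be matched against a training database; that matching step is not shown.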
