Stochastic rigidity: image registration for nowhere-static scenes

We consider the registration of sequences of images where the observed scene is entirely non-rigid; for example, a camera flying over water, a panning shot of a field of sunflowers in the wind, or footage of a crowd applauding at a sports event. In these cases, it is not possible to impose the constraint that world points have similar colour in successive views, so existing registration techniques [1, 5, 9, 11] cannot be applied. Indeed, the relationship between a point's colours in successive frames is essentially a random process. However, by treating the sequence of images as a set of samples from a multidimensional stochastic time-series, we can learn a stochastic model (e.g. an AR model [16, 23]) of the random process which generated the sequence of images. With a static camera, this stochastic model can be used to extend the sequence arbitrarily in time: driving the model with random noise results in an infinitely varying sequence of images which always looks like the short input sequence. In this way, we can create "videotextures" [21, 24] which can play forever without repetition. With a moving camera, the image generation process comprises two components: a stochastic component generated by the videotexture, and a parametric component due to the camera motion. For example, a camera rotation induces a relationship between successive images which is modelled by a 4-point perspective transformation, or homography. Human observers can easily separate the camera motion from the stochastic element. The key observation for an automatic implementation is that, without image registration, the time-series analysis must work harder to model the combined stochastic and parametric image generation. Specifically, the learned model will require more components, or more coefficients, to achieve the same expressive power as for the static scene. With the correct registration, the model will be more compact.
Therefore, by searching for the registration parameters which result in the most parsimonious stochastic model, we can register sequences where there is only stochastic rigidity. The paper describes an implementation of this scheme and shows results on a number of example sequences.
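
The search for the most parsimonious model can be illustrated with a small toy sketch. It is not the paper's implementation, and everything specific in it is an assumption made for illustration: the camera motion is a cyclic horizontal drift of `v` pixels per frame rather than a homography, model size is approximated by the number of principal components needed to retain 95% of the variance (`pca_size`), and the time-series model is a first-order AR model fitted by least squares (`fit_ar1`). All function and variable names are hypothetical.

```python
import numpy as np

def pca_size(frames, energy=0.95):
    """Number of principal components needed to keep `energy` of the variance."""
    X = frames.reshape(len(frames), -1)
    X = X - X.mean(axis=0)
    s = np.linalg.svd(X, compute_uv=False)
    cum = np.cumsum(s**2) / np.sum(s**2)
    return int(np.searchsorted(cum, energy)) + 1

def undo_drift(frames, v):
    """Shift frame t back by v*t pixels along x (cyclic shift; toy camera model)."""
    return np.stack([np.roll(f, -v * t, axis=1) for t, f in enumerate(frames)])

def fit_ar1(Y):
    """Least-squares AR(1) fit y_t ~ A y_{t-1}; returns A and the residuals."""
    B, *_ = np.linalg.lstsq(Y[:-1], Y[1:], rcond=None)  # Y[:-1] @ B ~ Y[1:]
    return B.T, Y[1:] - Y[:-1] @ B

# Synthetic "nowhere-static" scene: a mean image plus K basis images whose
# weights follow a random AR(1) process (assumed purely for this demo).
rng = np.random.default_rng(0)
T, H, W, K = 30, 32, 32, 3
mean_img = rng.standard_normal((H, W))
basis = rng.standard_normal((K, H, W))
coeffs = np.zeros((T, K))
coeffs[0] = rng.standard_normal(K)
for t in range(1, T):
    coeffs[t] = 0.9 * coeffs[t - 1] + 0.5 * rng.standard_normal(K)
static = mean_img + np.tensordot(coeffs, basis, axes=1)  # shape (T, H, W)

# Simulated moving camera: frame t is displaced by true_v * t pixels.
true_v = 1
observed = np.stack([np.roll(f, true_v * t, axis=1) for t, f in enumerate(static)])

# Registration by parsimony: keep the candidate drift giving the smallest model.
scores = {v: pca_size(undo_drift(observed, v)) for v in (0, 1, 2)}
best_v = min(scores, key=scores.get)

# With the sequence registered, fit the AR model in the reduced space and
# drive it with random noise to extend the sequence.
registered = undo_drift(observed, best_v)
X = registered.reshape(T, -1)
mu = X.mean(axis=0)
U, s, Vt = np.linalg.svd(X - mu, full_matrices=False)
k = scores[best_v]
Y = U[:, :k] * s[:k]                 # per-frame coefficients, shape (T, k)
A, resid = fit_ar1(Y)
noise_std = resid.std(axis=0)        # independent noise per component (simplification)
y, new = Y[-1], []
for _ in range(10):
    y = A @ y + noise_std * rng.standard_normal(k)
    new.append((mu + y @ Vt[:k]).reshape(H, W))
new_frames = np.stack(new)
print(scores, best_v)
```

In this toy setting, the misregistered candidates need many more components than the correctly registered one, which is the effect the search exploits. A real sequence would call for a continuous optimisation over homography parameters and a principled model-size criterion, neither of which is attempted here.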

[1] P. Gill, et al. Algorithms for the Solution of the Nonlinear Least-Squares Problem, 1978.

[2] F. Glazer, et al. Scene Matching by Hierarchical Correlation, 1983.

[3] John G. Proakis, et al. Probability, random variables and stochastic processes, 1985, IEEE Trans. Acoust. Speech Signal Process.

[4] Michael J. Black, et al. A framework for the robust estimation of optical flow, 1993, 1993 (4th) International Conference on Computer Vision.

[5] S. P. Mudur, et al. Three-dimensional computer vision: a geometric viewpoint, 1993.

[6] Michal Irani, et al. Representation of scenes from collections of images, 1995, Proceedings IEEE Workshop on Representation of Visual Scenes (In Conjunction with ICCV'95).

[7] Anil C. Kokaram, et al. A System for Reconstruction of Missing Data in Image Sequences Using Sampled 3D AR Models and MRF Motion Priors, 1996, ECCV.

[8] Rosalind W. Picard. A Society of Models for Video and Image Libraries, 1996, IBM Syst. J.

[9] Martin Szummer, et al. Temporal texture modeling, 1996, Proceedings of 3rd IEEE International Conference on Image Processing.

[10] Philip H. S. Torr. An assessment of information criteria for motion model selection, 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[11] Christoph Bregler, et al. Learning and recognizing human dynamics in video sequences, 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[12] Bernhard Schölkopf, et al. Nonlinear Component Analysis as a Kernel Eigenvalue Problem, 1998, Neural Computation.

[13] Andrew Zisserman, et al. Automated mosaicing with super-resolution zoom, 1998, Proceedings. 1998 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No.98CB36231).

[14] Matthew Brand, et al. Shadow puppetry, 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[15] Andrew Blake, et al. Classification of human body motion, 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[16] Brendan J. Frey, et al. Estimating mixture models of images and inferring spatial transformations using the EM algorithm, 1999, Proceedings. 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149).

[17] Roberto Cipolla, et al. A Statistical Consistency Check for the Space Carving Algorithm, 2000, BMVC.

[18] David S. Stoffer, et al. Time series analysis and its applications, 2000.

[19] Richard Szeliski, et al. Video textures, 2000, SIGGRAPH.

[20] Brendan J. Frey, et al. Transformed hidden Markov models: estimating mixture models of images and inferring spatial transformations in video sequences, 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[21] Bernhard P. Wrobel, et al. Multiple View Geometry in Computer Vision, 2001.

[22] Arnold Neumaier, et al. Algorithm: Arfit | a Matlab Package for Estimation and Spectral Decomposition of Multivariate Autoregressive Models, 2007.