In this paper we investigate extending the gradient-based inverse compositional image alignment method of Baker and Matthews [1] by formulating the alignment process probabilistically, using Bayes' rule to obtain a posterior estimate of the alignment. The probabilistic formulation makes use of prior statistical information on the parameters of the aligning function, and we investigate the use of arbitrary parameterisations of this function to match the physical system generating the warp. We compare the probabilistic method to the standard inverse compositional algorithm by using affine alignment to track locally planar image regions through image sequences in real time. We show that the stabilising effect of probabilistic alignment gives more reliable results with little effect on alignment speed. For this application we also find that the choice of warp parameterisation is significant in its own right: we obtain much better results from the standard inverse compositional algorithm with a more physically motivated affine parameterisation.

1. BACKGROUND TO IMAGE ALIGNMENT

Image alignment, as described here, is the process of bringing two image regions into alignment by finding the parameters of a known function relating position in one region to position in the other, where the parameters are initially only approximately known. Alignment is useful for any application where pixels in a local area are expected to move in a way which can be modelled by a function of position: for example, tracking rigid objects of known surface shape moving in 3D, tracking the images from a rotating camera, or tracking non-rigid objects which deform in a way which can realistically be modelled.

The image alignment problem has been studied for a number of years. Early work using a gradient method was done by Lucas and Kanade [2], and many authors have since looked at modifications to this [3][4][5][1][6].
There is also a range of methods which perform image alignment in very different ways for different applications, for example optic flow [7], look-up techniques [8], or feature matching [9].

1.1. Inverse compositional image alignment

The work described here is based on the inverse compositional image alignment algorithm of Baker and Matthews [1], which is a more efficient formulation of the Lucas and Kanade method. Using Baker and Matthews' notation, the method aims to align an image $I$ with a template image $T$, as illustrated in Figure 1. $I(\mathbf{x})$ and $T(\mathbf{x})$ represent the intensity in images $I$ and $T$ at image position $\mathbf{x}$. The alignment function $\mathbf{W}(\mathbf{x}; \mathbf{p})$ maps an image position $\mathbf{x}$ to a new position, and is a function of the image position $\mathbf{x}$ and a set of warp parameters $\mathbf{p}$. The method aims to minimise the following cost function with respect to $\Delta\mathbf{p}$:

$$\sum_{\mathbf{x}} \left[ T(\mathbf{W}(\mathbf{x}; \Delta\mathbf{p})) - I(\mathbf{W}(\mathbf{x}; \mathbf{p})) \right]^2 \quad (1)$$

where $\mathbf{p}$ corresponds to the current estimate of the set of warp parameters needed to bring the two images into alignment, and variation of $\Delta\mathbf{p}$ is used to obtain the best alignment from the image data. The summation is done over a set of pixels in the region of interest. This might be all the pixels in an area, or the subset of these for which the image intensity gradient is significant. The warp estimate is then updated as follows:

$$\mathbf{W}(\mathbf{x}; \mathbf{p}) \leftarrow \mathbf{W}(\mathbf{x}; \mathbf{p}) \circ \mathbf{W}(\mathbf{x}; \Delta\mathbf{p})^{-1} \quad (2)$$

that is, the new warp is the total effect of applying the old warp, followed by the inverse of the warp defined by $\Delta\mathbf{p}$. The minimisation is done by performing a first-order Taylor expansion of Equation 1 at the point $\Delta\mathbf{p} = \mathbf{0}$ to give:

$$\sum_{\mathbf{x}} \left[ T(\mathbf{x}) + \nabla T \frac{\partial \mathbf{W}}{\partial \mathbf{p}} \Delta\mathbf{p} - I(\mathbf{W}(\mathbf{x}; \mathbf{p})) \right]^2 \quad (3)$$

Fig. 1. Inverse Compositional image alignment. Alignment minimises the difference between the template $T(\mathbf{x})$ and the warped image $I(\mathbf{W}(\mathbf{x}; \mathbf{p}))$ by adjusting $\Delta\mathbf{p}$. The warp $\mathbf{W}(\mathbf{x}; \mathbf{p})$ is the current estimate of the warp required for alignment. The warp $\mathbf{W}(\mathbf{x}; \Delta\mathbf{p})$ is a small adjustment warp calculated in the alignment algorithm.
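The compositional update of Equation 2 can be made concrete for an affine warp, where each warp is a 3×3 matrix in homogeneous coordinates and composition and inversion become matrix operations. The sketch below is illustrative only (it is not the paper's implementation, and the parameter ordering is an assumption):

```python
import numpy as np

def affine_matrix(p):
    """Build a 3x3 homogeneous matrix from six affine parameters.

    Assumed ordering (for illustration): p = [p1..p6], giving
        x' = (1 + p1) x + p3 y + p5
        y' = p2 x + (1 + p4) y + p6
    so p = 0 is the identity warp, as the inverse compositional
    derivation requires.
    """
    p1, p2, p3, p4, p5, p6 = p
    return np.array([[1.0 + p1, p3, p5],
                     [p2, 1.0 + p4, p6],
                     [0.0, 0.0, 1.0]])

def compose_inverse_update(p, dp):
    """Equation 2: W(x; p) <- W(x; p) o W(x; dp)^(-1)."""
    M = affine_matrix(p) @ np.linalg.inv(affine_matrix(dp))
    # Read the six parameters back out of the composed matrix.
    return np.array([M[0, 0] - 1.0, M[1, 0], M[0, 1],
                     M[1, 1] - 1.0, M[0, 2], M[1, 2]])
```

Note that applying the updated warp followed by the adjustment warp reproduces the old warp, which is exactly the sense in which the update "undoes" $\mathbf{W}(\mathbf{x}; \Delta\mathbf{p})$.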
Because this adjustment is calculated from gradient properties of $T$, which are constant for all images $I$, the algorithm is computationally efficient. The Jacobian $\partial\mathbf{W}/\partial\mathbf{p}$ is evaluated at $\Delta\mathbf{p} = \mathbf{0}$ and the current value of $\mathbf{x}$, and it is assumed that the zero parameter vector maps points to themselves, i.e. $\mathbf{W}(\mathbf{x}; \mathbf{0}) = \mathbf{x}$. Differentiating and setting the result to zero gives (see [1] for more details):

$$\Delta\mathbf{p} = H^{-1} \sum_{\mathbf{x}} \left[ \nabla T \frac{\partial\mathbf{W}}{\partial\mathbf{p}} \right]^\top \left[ I(\mathbf{W}(\mathbf{x}; \mathbf{p})) - T(\mathbf{x}) \right] \quad (4)$$

where

$$H = \sum_{\mathbf{x}} \left[ \nabla T \frac{\partial\mathbf{W}}{\partial\mathbf{p}} \right]^\top \left[ \nabla T \frac{\partial\mathbf{W}}{\partial\mathbf{p}} \right]$$

1.2. The need for probabilistic image alignment

The inverse compositional algorithm, and other non-probabilistic image alignment algorithms, often become unstable when the number of parameters in the warp rises above about three or four. A number of authors note instability when using an affine warp [3][5]. We hope to improve on the stability of the method by solving the alignment problem probabilistically, taking into account uncertainty in the image difference measurements and prior information about the warp parameters.

The standard inverse compositional algorithm uses a prior estimate of the warp (defined by $\mathbf{p}$) as the starting point for alignment. We aim also to make use of a measure of the uncertainty in this prior estimate. This tells us which components should change in preference to others during alignment, and gives us a measure of the overall change which is reasonable in any alignment step.

Since prior information will often come from a model of a physical system, with its own set of parameters, we wish to use the same parameters to describe the warp used in alignment. This will ensure that the warping function is correct, and that uncertainty information can be passed directly between the alignment process and the physical model. In general this will result in an arbitrarily complex function to describe the warp, and the consequences of this are discussed in Section 2.

We formulate image alignment as the calculation of the posterior distribution of $\Delta\mathbf{p}$, calculated through Bayes' rule.
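Equation 4 is a linear solve against a Hessian built entirely from the template, which is what makes the inverse compositional formulation efficient: the steepest-descent images and $H$ are precomputed once, and each iteration costs only an image warp and a small linear solve. A minimal NumPy sketch for a pure translation warp (an assumption made here for brevity, since the warp Jacobian is then the 2×2 identity at every pixel):

```python
import numpy as np

def precompute(T):
    """Precompute steepest-descent images and Hessian from the template.

    For a translation warp W(x; p) = x + p the warp Jacobian is the
    2x2 identity, so the steepest-descent image at each pixel is just
    the template gradient [dT/dx, dT/dy].
    """
    gy, gx = np.gradient(T)                    # gradients along rows, cols
    sd = np.stack([gx.ravel(), gy.ravel()], 1) # N x 2 steepest-descent images
    H = sd.T @ sd                              # Hessian: sum over pixels of sd^T sd
    return sd, H

def delta_p(sd, H, I_warped, T):
    """Equation 4: dp = H^-1 sum_x sd^T [ I(W(x;p)) - T(x) ]."""
    err = (I_warped - T).ravel()
    return np.linalg.solve(H, sd.T @ err)
```

In a full tracker, `delta_p` would be called once per iteration after warping $I$ by the current estimate, and the result composed into the warp via Equation 2.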
This allows prior information and sources of uncertainty in the inverse compositional measurement ($I(\mathbf{W}(\mathbf{x}; \mathbf{p})) - T(\mathbf{x})$) to be taken into account. As well as estimating the warp, this method also gives a confidence measure in it, which makes the result more useful to other processes.

2. PARAMETERISATION OF THE IMAGE WARP

The transform relating views of an object can often be described by a simple and compact equation. For example, a 2D homography describes the transform relating views of a planar surface moving in 3D, or views from a rotating camera, or many common non-rigid 2D transformations such as shearing, stretching, or scaling. For a particular application we assume that a possibly over-general, but algebraically simple, expression like this is known. This expression, $\mathbf{W}_g$, is a function of image position $\mathbf{x}$ and a set of parameters $\mathbf{p}_g$:

$$\mathbf{x}' = \mathbf{W}_g(\mathbf{x}; \mathbf{p}_g)$$

Now suppose we would actually prefer to describe the transformation using a different parameterisation, $\mathbf{p}$, which better reflects the physical system being studied:

$$\mathbf{x}' = \mathbf{W}(\mathbf{x}; \mathbf{p})$$

For example, we might be tracking a planar surface on a mechanism moving in a restricted way in 3D, or tracking images from a rotating camera. The transformation in these cases is a homography, but a more specialised parameterisation, whose parameters also have physical meaning, is possible. In general the warp $\mathbf{W}$ can be quite complex, which could make the inverse compositional image alignment method computationally expensive if it is used directly. If, however, we can find the simplest algebraic expression which is general enough to describe the system ($\mathbf{W}_g$), and write its parameters as a function of the parameters of $\mathbf{W}$, the effect on the inverse compositional algorithm should be minimal. The function relating the two parameter vectors is:

$$\mathbf{p}_g = f(\mathbf{p})$$

where $\mathbf{p}$ is the parameter vector of $\mathbf{W}$ and $\mathbf{p}_g$ is the parameter vector of $\mathbf{W}_g$.
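As a concrete illustration of such a mapping $f$, the rotating-camera case constrains the eight-parameter homography to a conjugate rotation $H = K\,R(\mathbf{p})\,K^{-1}$ with only three physical parameters. The sketch below assumes an axis-angle parameterisation of the rotation and made-up calibration values; the paper does not specify its exact parameter convention, so this is an illustrative assumption:

```python
import numpy as np

def f(p, K):
    """Map three physical rotation parameters p (axis-angle, an
    assumption for illustration) to the eight parameters of a
    conjugate-rotation homography H = K R(p) K^-1, with H
    normalised so that H[2, 2] = 1."""
    theta = np.linalg.norm(p)
    if theta < 1e-12:
        R = np.eye(3)
    else:
        k = np.asarray(p, dtype=float) / theta
        cross = np.array([[0.0, -k[2], k[1]],
                          [k[2], 0.0, -k[0]],
                          [-k[1], k[0], 0.0]])
        # Rodrigues' rotation formula
        R = np.eye(3) + np.sin(theta) * cross + (1.0 - np.cos(theta)) * (cross @ cross)
    H = K @ R @ np.linalg.inv(K)
    H /= H[2, 2]
    return H.ravel()[:8]   # p_g: the eight homography parameters

# Hypothetical calibration matrix for a fixed-focal-length camera:
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
```

With $f$ in this form, the general homography warp $\mathbf{W}_g$ and its Jacobian remain unchanged; only the small mapping $f(\mathbf{p})$ and its Jacobian $\partial f/\partial\mathbf{p}$ (obtainable numerically or analytically) need to be evaluated, once per iteration.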
The warp function then becomes:

$$\mathbf{W}(\mathbf{x}; \mathbf{p}) = \mathbf{W}_g(\mathbf{x}; f(\mathbf{p}))$$

and the Jacobian of the warp transform becomes:

$$\frac{\partial \mathbf{W}}{\partial \mathbf{p}} = \frac{\partial \mathbf{W}_g}{\partial \mathbf{p}_g} \frac{\partial f}{\partial \mathbf{p}}$$

The inverse compositional algorithm needs to evaluate these values for every pixel being used to calculate the alignment, but the calculation has been broken down into the evaluation of $f(\mathbf{p})$ and $\partial f / \partial\mathbf{p}$, which are potentially slow but need only be done once per iteration, and the evaluation of $\mathbf{W}_g$ and $\partial\mathbf{W}_g / \partial\mathbf{p}_g$, which are relatively simple operations, at every pixel. Therefore the warping function can be reparameterised at negligible cost to the inverse compositional algorithm. One possible complication is that the inverse compositional image alignment algorithm linearises the warp function around the zero-warp position, so parameterisations which are excessively nonlinear in this region might be expected to work less well.

We tested the usefulness of parameterisation on its own by using an inverse compositional tracker to align the images coming from a rotating camera, and comparing the result to measurements coming from a gyroscopic sensor. For this application the parameterisation constrains the general 2D homography ($\mathbf{W}_g$), with eight degrees of freedom, to the conjugate rotation corresponding to our calibrated, fixed-focal-length camera, which has only three degrees of freedom. The resulting rotation tracker was able to track the motion of a rotating camera very reliably in real time [10].

3. PROBABILISTIC INVERSE COMPOSITIONAL IMAGE ALIGNMENT

The inverse compositional algorithm calculates a least-squares estimate of a state $\Delta\mathbf{p}$ from a series of measurements (one for each pixel) which are a function of the state:

$$I(\mathbf{W}(\mathbf{x}; \mathbf{p})) - T(\mathbf{x}) = \nabla T \frac{\partial\mathbf{W}}{\partial\mathbf{p}} \Delta\mathbf{p} \quad (5)$$

We wish to calculate a better estimate of $\Delta\mathbf{p}$ by specifying uncertainty in the measurement equation, and prior information on $\Delta\mathbf{p}$, and including them in the calculation. This estimate is the posterior distribution of $\Delta\mathbf{p}$, given by Bayes' rule:

$$p(\Delta\mathbf{p} \mid \mathbf{e}) = \frac{p(\mathbf{e} \mid \Delta\mathbf{p})\; p(\Delta\mathbf{p})}{p(\mathbf{e})} \quad (6)$$

where $\mathbf{e}$ denotes the vector of per-pixel image difference measurements.

3.1.
A likelihood function for $\Delta\mathbf{p}$

A more accurate version of Equation 5 includes terms representing uncertainty in the measurement:

$$I(\mathbf{W}(\mathbf{x}; \mathbf{p})) - T(\mathbf{x}) = \nabla T \frac{\partial\mathbf{W}}{\partial\mathbf{p}} \Delta\mathbf{p} + \alpha + \nabla T\, \beta + \nabla T \frac{\partial\mathbf{W}}{\partial\mathbf{p}}\, \gamma \quad (7)$$

where the terms $\alpha$, $\beta$, and $\gamma$ are zero-mean Gaussian random variables of dimensionality 1, 2, and $n$ (the size of the warp parameter vector) respectively. Between them they are intended to represent uncertainties which are identical at each pixel (for example, camera noise), uncertainty which increases when contra
REFERENCES

[1] David J. Fleet et al., "Performance of optical flow techniques", International Journal of Computer Vision, 1994.
[2] Stefano Soatto et al., "Real-time feature tracking and outlier rejection with changes in illumination", Proceedings of the Eighth IEEE International Conference on Computer Vision (ICCV 2001), 2001.
[3] Andrew Zisserman et al., Multiple View Geometry, 1999.
[4] Michel Dhome et al., "Real Time Robust Template Matching", BMVC, 2002.
[5] Simon Baker et al., "Lucas-Kanade 20 Years On: A Unifying Framework", International Journal of Computer Vision, 2004.
[6] Carlo Tomasi et al., "Good features to track", Proceedings of the 1994 IEEE Conference on Computer Vision and Pattern Recognition, 1994.
[7] Emanuele Trucco et al., "Making good features track better", Proceedings of the 1998 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 1998.
[8] Takeo Kanade et al., "An Iterative Image Registration Technique with an Application to Stereo Vision", IJCAI, 1981.