We show how to determine the position of a camera relative to a model of the environment using features extracted from a single image. The method is based on the view variation of cross ratios and is independent of camera calibration.

1 Model Indexing From Images

Recognition and pose estimation of objects in images are in general based on extracting features such as points and lines from the image and finding corresponding features in model objects. The combinatorial complexity of matching image descriptors to models can be considerable. One way to reduce it is to use index tables, in which image descriptors serve as indexes into a table containing the corresponding model descriptors. These tables can be constructed off-line, which means that space is traded for time [1] [2] [3] [4].

Another factor affecting the complexity of recognition is that the image of an object depends on the viewpoint of the camera. This has initiated work on finding feature descriptors that are invariant to viewpoint [1] [2]. For configurations of points and lines, invariants can in general be found only when the configuration is contained in a planar surface. For general configurations of points and lines, any descriptor computed from image data will vary with viewpoint [3] [5]. Since camera viewpoint in 3D is described by 6 parameters, the variation of image descriptors will in general be 6-dimensional. It is, however, possible to choose descriptors whose variation has far fewer than 6 dimensions. When imaging is approximated by an affine mapping, it has been shown [3] [4] that 4 image descriptors can be computed from 4 points that exhibit only 2-dimensional variation with viewpoint. For a specific configuration of 4 points, the 4-D image descriptor will be contained in a 2-D surface.

In this work we will consider the more general case of imaging by perspective mapping.
Since one of the main applications we have in mind is moving platform navigation, perspective effects can be substantial. In projective and perspective transformations, the cross ratio is a fundamental invariant of point and line sets. We will show that for 6 points in an image, 4 cross ratios can be computed. These 4 cross ratios exhibit 3-dimensional variation with viewpoint. This is a very simple result that follows from the fact that the cross ratios are invariant to rotations of the camera around the point of projection. The variation is therefore due entirely to camera translation. In the limit of large relative viewing distances the image mapping will become parallel and the dependence of image data on camera position

*Address: NADA, KTH, S-100 44 Stockholm, Sweden. Email: stefanc@bion.kth.se

BMVC 1993 doi:10.5244/C.7.12
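The rotation invariance claimed in Section 1 can be checked numerically with a minimal sketch. It assumes a pinhole camera with calibration matrix K, rotation R and centre c, and it uses cross ratios of pencils of lines through an image point, written as ratios of 3x3 determinants of homogeneous coordinates. The particular choice of four cross ratios, the helper names (pencil_cross_ratio, four_cross_ratios, project, rotation) and the six sample points are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def pencil_cross_ratio(p0, p1, p2, p3, p4):
    """Cross ratio of the four lines joining p0 to p1..p4.

    All points are homogeneous 3-vectors. Every point occurs equally often
    in numerator and denominator, so the value is unaffected by the scale
    of each point and by any image homography.
    """
    def d(a, b):
        return np.linalg.det(np.stack([p0, a, b]))
    return (d(p1, p3) * d(p2, p4)) / (d(p1, p4) * d(p2, p3))

def four_cross_ratios(p):
    """One possible set of 4 cross ratios from 6 image points (assumed choice)."""
    return np.array([
        pencil_cross_ratio(p[0], p[1], p[2], p[3], p[4]),
        pencil_cross_ratio(p[0], p[1], p[2], p[3], p[5]),
        pencil_cross_ratio(p[1], p[0], p[2], p[3], p[4]),
        pencil_cross_ratio(p[1], p[0], p[2], p[3], p[5]),
    ])

def project(X, K, R, c):
    """Perspective projection of 3-D points X (n, 3) for a camera centred at c."""
    return (K @ R @ (X - c).T).T

def rotation(axis, angle):
    """Rotation matrix from axis and angle (Rodrigues formula)."""
    axis = np.asarray(axis, float) / np.linalg.norm(axis)
    S = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    return np.eye(3) + np.sin(angle) * S + (1.0 - np.cos(angle)) * (S @ S)

# Six non-coplanar scene points (arbitrary illustrative values).
X = np.array([[0, 0, 5], [1, 0, 6], [0, 1, 7],
              [1, 1, 5], [-1, 0.5, 8], [0.5, -1, 6]], float)

c = np.zeros(3)
K1, R1 = np.eye(3), np.eye(3)
K2 = np.array([[800.0, 0, 320], [0, 780.0, 240], [0, 0, 1]])
R2 = rotation([1, 2, 3], 0.4)

cr_ref = four_cross_ratios(project(X, K1, R1, c))    # reference view
cr_rot = four_cross_ratios(project(X, K2, R2, c))    # rotated, recalibrated
cr_trans = four_cross_ratios(project(X, K1, R1, np.array([0.5, -0.2, 0.3])))

print("same centre, new R and K:", np.allclose(cr_ref, cr_rot))
print("translated centre:       ", np.allclose(cr_ref, cr_trans))
```

In this sketch, changing R and K while keeping the centre fixed acts on the image as a homography, which leaves the determinant ratios unchanged. Moving the centre changes them, consistent with the statement that the variation is due entirely to camera translation.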
[1] J.B. Burns, et al. View Variation of Point-Set and Line-Segment Features. IEEE Trans. Pattern Anal. Mach. Intell., 1993.
[2] D. Jacobs. Space Efficient 3D Model Indexing. 1992.
[3] George C. Stockman, et al. Object recognition and localization via pose clustering. Comput. Vis. Graph. Image Process., 1987.
[4] Yehezkel Lamdan, et al. Object recognition by affine invariant matching. Proceedings CVPR '88: The Computer Society Conference on Computer Vision and Pattern Recognition, 1988.
[5] David A. Forsyth, et al. Efficient model library access by projectively invariant indexing functions. Proceedings 1992 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 1992.