Distance Metric Between 3D Models and 2D Images for Recognition and Classification

Similarity measurements between 3D objects and 2D images are useful for object recognition and classification. The authors distinguish between two types of similarity metrics: metrics computed in image space (image metrics) and metrics computed in transformation space (transformation metrics). Existing methods typically use image metrics, that is, metrics that measure the difference, in the image, between the observed image and the nearest view of the object. An example of such a measure is the Euclidean distance between feature points in the image and their corresponding points in the nearest view. (This measure can be computed by solving the exterior orientation calibration problem.) In this paper the authors introduce a different type of metric: transformation metrics. These metrics penalize the deformations that must be applied to the object to produce the observed image. In particular, the authors define a transformation metric that optimally penalizes "affine deformations" under weak perspective. A closed-form solution for this metric is derived, together with the nearest view according to it. The metric is shown to be equivalent to the Euclidean image metric, in the sense that the two bound each other from both above and below. It therefore provides an easy-to-use closed-form approximation to the commonly used least-squares distance between models and images. The authors demonstrate an image understanding application in which the true dimensions of a photographed battery charger are estimated by minimizing the transformation metric.
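The distinction between the two kinds of metric can be illustrated with a minimal sketch. It is not the closed-form metric derived in the paper; it only assumes known 3D-to-2D point correspondences and an unconstrained affine camera. The function names `affine_fit` and `weak_perspective_deviation` are hypothetical. The first returns a least-squares residual measured in the image, in the spirit of an image metric. The second returns a simple proxy for affine deformation: a weak-perspective map (rotation, orthographic projection, uniform scale) has two equal singular values, so the gap between them is zero exactly when no extra stretch or shear was needed.

```python
import numpy as np

def affine_fit(model_pts, image_pts):
    """Fit a 2x3 affine map from centered 3D model points to centered 2D
    image points by least squares; return the map and the image residual."""
    P = np.asarray(model_pts, dtype=float)   # shape (n, 3)
    q = np.asarray(image_pts, dtype=float)   # shape (n, 2)
    Pc = P - P.mean(axis=0)                  # centering removes translation
    qc = q - q.mean(axis=0)
    X, *_ = np.linalg.lstsq(Pc, qc, rcond=None)  # X has shape (3, 2)
    residual = np.linalg.norm(Pc @ X - qc)   # Euclidean distance in the image
    return X.T, residual

def weak_perspective_deviation(A):
    """Illustrative proxy (an assumption, not the paper's metric): the gap
    between the two singular values of the 2x3 map A. It vanishes when A is
    a uniformly scaled pair of orthonormal rows."""
    s = np.linalg.svd(A, compute_uv=False)   # singular values, descending
    return s[0] - s[1]
```

With at least four non-coplanar model points the fit is unique. An image produced by a rigid rotation and uniform scaling yields a residual and a deviation that are both zero up to rounding, while a sheared or unevenly stretched view yields a positive deviation even when the image residual is zero.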
