How to measure the pose robustness of object views

The viewing hemisphere of a three-dimensional object can be partitioned into areas of similar views, which provide pose robustness. We compare two procedures for measuring the robustness of views to pose variation: tracking of object features, i.e. Gabor wavelet responses, which exploits the continuity of successive views, and matching of features in different views, which are assumed to be independent. Both procedures proved appropriate for detecting canonical views. We found no difference in the size of the view bubbles, but tracking provides more precise correspondences than matching. Tracking is more appropriate for recognizing changes of features, whereas matching is more suitable if features of the same appearance are to be found.

1. Subject of investigation

Many models have been proposed for three-dimensional object perception. Besides volume-based object representations, which seem to be very economical but often require user interaction to acquire, as described, for example, in Ref. [1], many computational models combine two-dimensional views into the equivalent of a three-dimensional object representation. Examples are the manifold approach [2–4] and the recognition of three-dimensional objects utilizing support vector machines [5,6].

Among the different models for three-dimensional object perception, the notion of a canonical view is a prominent topic. It can be regarded as a view that is easier to recognize than other views of the same object. A hard definition does not exist; even its properties are controversial. Palmer et al. [7] describe canonical views as the ones that "humans find easiest to recognize and regard as most typical". Open questions concerning canonical views are the number of views necessary for different visual tasks and their statistical distribution on the viewing sphere. Malik and Whangbo [8], for instance, have demonstrated that a uniform distribution is inappropriate.
Weinshall and Werman [9] have shown that the likelihood of observing a certain view of an object correlates with the view's robustness against pose variation, i.e. how little the image changes when the viewpoint is slightly changed. The most likely views are often the 'flattest' views of an object.

For pose-invariant object recognition and pose estimation of objects, it is necessary to utilize an appropriate object representation. An obvious but naive representation might consist of densely spaced views of an object's viewing sphere. Our aim is to reduce such a 'full' representation to only a few representative views and the relations between them. Such a sparse representation belongs to …
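
To make the idea of robustness against pose variation concrete, the following is a minimal sketch and not the procedure used in this paper. It assumes that each view in a densely sampled one-dimensional sequence of poses is already described by a feature vector (a stand-in for Gabor wavelet responses), that similarity is a normalized dot product, and that a view's robustness is the number of contiguous neighbouring views whose similarity to it stays above a threshold. The function names, the threshold value, and the toy data are all hypothetical.

```python
import numpy as np

def cosine_similarity(a, b):
    """Normalized dot product of two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def bubble_extent(features, center, threshold):
    """Contiguous index range (lo, hi) around `center` in which every view
    is at least `threshold`-similar to the center view."""
    n = len(features)
    lo = center
    while lo - 1 >= 0 and cosine_similarity(features[center], features[lo - 1]) >= threshold:
        lo -= 1
    hi = center
    while hi + 1 < n and cosine_similarity(features[center], features[hi + 1]) >= threshold:
        hi += 1
    return lo, hi

def pose_robustness(features, threshold=0.9):
    """Width of the similar-view range (in number of views) for every view."""
    widths = []
    for c in range(len(features)):
        lo, hi = bubble_extent(features, c, threshold)
        widths.append(hi - lo + 1)
    return widths

# Toy data (assumption): 2-D unit vectors standing in for per-view features,
# so the similarity of two views is the cosine of their angular difference.
angles = np.deg2rad([0, 5, 10, 15, 60, 65, 120])
features = [np.array([np.cos(t), np.sin(t)]) for t in angles]
widths = pose_robustness(features, threshold=0.95)
print(widths)
```

In this toy sequence, views that lie in a densely sampled, slowly changing stretch receive wider ranges than isolated views. Under the assumptions above, views with the widest ranges would be the candidates for representative views in a sparse representation.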

[1] Daphna Weinshall, et al. A self-organizing multiple-view representation of 3D objects, 2004, Biological Cybernetics.

[2] Hanspeter A. Mallot, et al. Phase-based binocular vergence control and depth reconstruction using active vision, 1994.

[3] H. Sebastian Seung, et al. The Manifold Ways of Perception, 2000, Science.

[4] David J. Fleet, et al. Computation of component image velocity from local phase information, 1990, International Journal of Computer Vision.

[5] Allen M. Waxman, et al. Adaptive 3-D Object Recognition from Multiple Views, 1992, IEEE Trans. Pattern Anal. Mach. Intell.

[6] Joachim M. Buhmann, et al. Distortion Invariant Object Recognition in the Dynamic Link Architecture, 1993, IEEE Trans. Computers.

[7] Gérard G. Medioni, et al. Interactive 3D model extraction from a single image, 2001, Image Vis. Comput.

[8] Bernhard Schölkopf, et al. Support vector learning, 1997.

[9] Rolf P. Würtz, et al. Object Recognition Robust Under Translations, Deformations, and Changes in Background, 1997, IEEE Trans. Pattern Anal. Mach. Intell.

[10] Jochen Triesch, et al. GripSee: A Gesture-Controlled Robot for Object Perception and Manipulation, 1999, Auton. Robots.

[11] Laurenz Wiskott, et al. Labeled graphs and dynamic link matching for face recognition and scene analysis, 1995.

[12] Barbara Zitová, et al. A Comparative Evaluation of Matching and Tracking Object Features for the Purpose of Estimating Similar-View-Areas of 3-Dimensional Objects, 1999.

[13] Christoph von der Malsburg, et al. Tracking and learning graphs and pose on image sequences of faces, 1996, Proceedings of the Second International Conference on Automatic Face and Gesture Recognition.

[14] P. Kellman. Perception of three-dimensional form by human infants, 1984, Perception & psychophysics.

[15] Norbert Krüger, et al. Face recognition by elastic bunch graph matching, 1997, Proceedings of International Conference on Image Processing.

[16] A. J. Mistlin, et al. Visual neurones responsive to faces, 1987, Trends in Neurosciences.

[17] J. Tenenbaum, et al. A global geometric framework for nonlinear dimensionality reduction, 2000, Science.

[18] Michael Werman, et al. On View Likelihood and Stability, 1997, IEEE Trans. Pattern Anal. Mach. Intell.

[19] Bernhard Schölkopf, et al. Comparison of View-Based Object Recognition Algorithms Using Realistic 3D Models, 1996, ICANN.

[20] Raashid Malik, et al. Angle Densities and Recognition of 3D Objects, 1997, IEEE Trans. Pattern Anal. Mach. Intell.

[21] Hartmut Neven, et al. The Bochum/USC Face Recognition System And How it Fared in the FERET Phase III Test, 1998.

[22] Jan C. Vorbrüggen. Zwei Modelle zur datengetriebenen Segmentierung visueller Daten, 1995.