Learning the Topology of Object Views

A visual representation of an object must meet at least three basic requirements. First, it must allow identification of the object in the presence of slight but unpredictable changes in its visual appearance. Second, it must account for larger changes in appearance due to variations in the object's fundamental degrees of freedom, such as changes in pose. Third, any object representation must be derivable from visual input alone, i.e., it must be learnable. Here we construct such a representation by deriving transformations between the different views of a given object, so that they can be parameterized in terms of the object's physical degrees of freedom. Our method automatically derives the appearance representations of an object, together with their linear deformation model, from example images. These are subsequently used to provide linear charts of the entire appearance manifold of a three-dimensional object. In contrast to approaches aiming at mere dimensionality reduction, the local linear charts of the object's appearance manifold are estimated on a strictly local basis, avoiding any reference to a metric embedding space for all views. In this way, a genuine understanding of the object's appearance in terms of its physical degrees of freedom is learned from single views alone.
