3-D Object Recognition Using 2-D Views

We consider the problem of recognizing 3-D objects from 2-D images using geometric models, under varying viewing angles and positions. Our goal is to recognize and localize instances of specific objects in a scene (i.e., model-based recognition). This is in contrast to category-based object recognition methods, where the goal is to search for instances of objects that belong to a certain visual category (e.g., faces or cars). The key contribution of our work is improving 3-D object recognition by integrating algebraic functions of views (AFoVs), a powerful framework for predicting the geometric appearance of an object due to viewpoint changes, with indexing and learning. During training, we use AFoVs to compute the space of views that groups of object features can produce under the assumption of 3-D linear transformations, by combining a small number of reference views that contain the object features. Unrealistic views (e.g., due to the assumption of 3-D linear transformations) are eliminated by imposing a pair of rigidity constraints based on knowledge of the transformation between the reference views of the object. To represent the space of views that an object can produce compactly while allowing efficient hypothesis generation during recognition, we propose combining indexing with learning in two stages. In the first stage, we sample the space of views of an object sparsely and represent information about the samples using indexing. In the second stage, we build probabilistic models of shape appearance by sampling the space of views of the object densely and learning the manifold formed by the samples. Learning employs the expectation-maximization (EM) algorithm and takes place in a "universal," lower-dimensional space computed through random projection (RP). During recognition, we extract groups of point features from the scene and use indexing to retrieve the most feasible model groups that might have produced them (i.e., hypothesis generation).
The likelihood of each hypothesis is then computed using the probabilistic models of shape appearance. Only hypotheses ranked high enough are considered for further verification, with the most likely hypotheses verified first. The proposed approach has been evaluated using both artificial and real data, illustrating promising performance. We also present preliminary results illustrating extensions of the AFoVs framework to predict the intensity appearance of an object. In this context, we have built a hybrid recognition framework that exploits geometric knowledge to hypothesize the location of an object in the scene and both geometric and intensity information to verify the hypotheses.
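
As a rough illustration of the idea behind AFoVs, the sketch below assumes orthographic projection and two reference views, in which case the image coordinates of a point in a novel view can be written as a linear combination of its coordinates in the reference views. This is a minimal sketch under those assumptions, not the implementation evaluated in this work: the function names (`orthographic_view`, `afov_basis`, `fit_afov_coefficients`, `predict_view`), the synthetic points, and the choice of basis (x1, y1, x2, 1) are all illustrative, and the rigidity constraints, indexing, and learning stages are omitted.

```python
import numpy as np


def orthographic_view(points3d, linear, shift):
    """Apply a 3-D linear map, drop depth, and add a 2-D image shift."""
    return (points3d @ linear.T)[:, :2] + shift


def afov_basis(ref1, ref2):
    """Stack the basis (x1, y1, x2, 1) built from two reference views."""
    ones = np.ones(len(ref1))
    return np.column_stack([ref1[:, 0], ref1[:, 1], ref2[:, 0], ones])


def fit_afov_coefficients(ref1, ref2, novel):
    """Least-squares fit of a (4, 2) coefficient matrix mapping the basis
    to the x and y coordinates of the novel view."""
    coeffs, *_ = np.linalg.lstsq(afov_basis(ref1, ref2), novel, rcond=None)
    return coeffs


def predict_view(ref1, ref2, coeffs):
    """Predict novel-view coordinates from the two reference views."""
    return afov_basis(ref1, ref2) @ coeffs


# Synthetic object: 12 random 3-D points (hypothetical data).
rng = np.random.default_rng(0)
points = rng.normal(size=(12, 3))

# Reference view 1: frontal. Reference view 2: rotated about the y axis.
angle = 0.4
c, s = np.cos(angle), np.sin(angle)
rot_y = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
ref1 = orthographic_view(points, np.eye(3), np.zeros(2))
ref2 = orthographic_view(points, rot_y, np.zeros(2))

# Novel view: an arbitrary 3-D linear transformation plus an image shift.
novel = orthographic_view(points, rng.normal(size=(3, 3)), np.array([0.5, -1.0]))

# Fit the coefficients on six correspondences, then predict the other six.
coeffs = fit_afov_coefficients(ref1[:6], ref2[:6], novel[:6])
predicted = predict_view(ref1[6:], ref2[6:], coeffs)
held_out_error = np.max(np.abs(predicted - novel[6:]))
print("max held-out prediction error:", held_out_error)
```

Because the fit places no restriction on the coefficients, it also admits views that no rigid object could produce, which is why a training procedure that samples coefficient values needs additional constraints to discard unrealistic views.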
