Models for multi-view object class detection

Learning how to detect objects from many classes in a wide variety of viewpoints is a key goal of computer vision. Existing approaches, however, require excessive amounts of training data. Implementors need to collect numerous training images not only to cover changes in the same object's shape due to the viewpoint variation, but also to accommodate the variability in appearance among instances of the same class. We introduce the Potemkin model, which exploits the relationship between 3D objects and their 2D projections for efficient and effective learning. The Potemkin model can be constructed from a few views of an object of the target class. We use the Potemkin model to transform images of objects from one view to several other views, effectively multiplying their value for class detection. This approach can be coupled with any 2D image-based detection system. We show that automatically transformed images dramatically decrease the data requirements for multi-view object class detection. The Potemkin model also allows detection systems to reconstruct the 3D shapes of detected objects automatically from a single 2D image. This reconstruction generates realistic views of 3D models, and also provides accurate 3D information for entire objects. We demonstrate its usefulness in three applications: robot manipulation, object detection using 2.5D data, and generating 3D ‘pop-up’ models from photos. (Copies available exclusively from MIT Libraries, Rm. 14-0551, Cambridge, MA 02139-4307. Ph. 617-253-5668; Fax 617-253-1690.)
