Learning to generate novel views of objects for class recognition

Multi-view object class recognition can be achieved using existing approaches for single-view object class recognition, by treating different views as entirely independent classes. This strategy requires a large amount of training data for many viewpoints, which can be costly to obtain. We describe a method for constructing a weak three-dimensional model from as few as two views of an object of the target class, and using that model to transform images of objects from one view to several other views, effectively multiplying their value for class recognition. Our approach can be coupled with any 2D image-based recognition system. We show that automatically transformed images dramatically decrease the data requirements for multi-view object class recognition.

[1]  Matthew A. Brown,et al.  Unsupervised 3D object recognition and reconstruction in unordered datasets , 2005, Fifth International Conference on 3-D Digital Imaging and Modeling (3DIM'05).

[2]  Trevor Darrell,et al.  The pyramid match kernel: discriminative classification with sets of image features , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[3]  Cordelia Schmid,et al.  Flexible Object Models for Category-Level 3D Object Recognition , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[5]  Leslie Pack Kaelbling,et al.  Virtual Training for Multi-View Object Class Recognition , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[6]  I. Biederman,et al.  Priming contour-deleted images: Evidence for intermediate representations in visual object recognition , 1991, Cognitive Psychology.

[7]  Andrew Zisserman,et al.  Identifying individuals in video by combining 'generative' and discriminative head models , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[8]  Bernhard P. Wrobel,et al.  Multiple View Geometry in Computer Vision , 2001 .

[9]  Lawrence G. Roberts,et al.  Machine Perception of Three-Dimensional Solids , 1963, Outstanding Dissertations in the Computer Sciences.

[10]  Luc Van Gool,et al.  Simultaneous Object Recognition and Segmentation from Single or Multiple Model Views , 2006, International Journal of Computer Vision.

[11]  Jianguo Zhang,et al.  The PASCAL Visual Object Classes Challenge , 2006 .

[12]  Gabriela Csurka,et al.  Visual categorization with bags of keypoints , 2002, eccv 2004.

[13]  Pietro Perona,et al.  Object class recognition by unsupervised scale-invariant learning , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[14]  Cordelia Schmid,et al.  Viewpoint-independent object class detection using 3D Feature Maps , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[15]  Shimon Ullman,et al.  View-Invariant Recognition Using Corresponding Object Fragments , 2004, ECCV.

[16]  Cordelia Schmid,et al.  The 2005 PASCAL Visual Object Classes Challenge , 2005, MLCW.

[17]  Daniel P. Huttenlocher,et al.  Spatial priors for part-based recognition using statistical models , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[18]  Refractor Vision , 2000, The Lancet.

[19]  Luc Van Gool,et al.  The 2005 PASCAL Visual Object Classes Challenge , 2005, MLCW.

[20]  David G. Lowe,et al.  Object recognition from local scale-invariant features , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[21]  I. Biederman Recognition-by-components: a theory of human image understanding. , 1987, Psychological review.

[22]  Pietro Perona,et al.  Viewpoint-invariant learning and detection of human heads , 2000, Proceedings Fourth IEEE International Conference on Automatic Face and Gesture Recognition (Cat. No. PR00580).

[23]  Jitendra Malik,et al.  Shape matching and object recognition using low distortion correspondences , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[24]  Derek Hoiem,et al.  3D LayoutCRF for Multi-View Object Class Recognition and Segmentation , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[25]  J. Lien,et al.  MULTI-VIEW FACE DETECTION AND POSE ESTIMATION , 2005 .

[26]  Luc Van Gool,et al.  Towards Multi-View Object Class Detection , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[27]  Jerry Nedelman,et al.  Book review: “Bayesian Data Analysis,” Second Edition by A. Gelman, J.B. Carlin, H.S. Stern, and D.B. Rubin Chapman & Hall/CRC, 2004 , 2005, Comput. Stat..

[28]  Luc Van Gool,et al.  Dynamic 3D Scene Analysis from a Moving Vehicle , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[29]  Jitendra Malik,et al.  Shape contexts enable efficient retrieval of similar shapes , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[30]  Bao-Liang Lu,et al.  Fast recognition of multi-view faces with feature selection , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[31]  Daniel P. Huttenlocher,et al.  Weakly Supervised Learning of Part-Based Spatial Models for Visual Object Recognition , 2006, ECCV.

[32]  B. Schiele,et al.  Combined Object Categorization and Segmentation With an Implicit Shape Model , 2004 .

[33]  Paul A. Viola,et al.  Rapid object detection using a boosted cascade of simple features , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[34]  Stan Z. Li,et al.  FloatBoost learning and statistical face detection , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[35]  Antonio Torralba,et al.  Sharing Visual Features for Multiclass and Multiview Object Detection , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[36]  Cordelia Schmid,et al.  3D Object Modeling and Recognition Using Local Affine-Invariant Image Descriptors and Multi-View Spatial Constraints , 2006, International Journal of Computer Vision.

[37]  Antonio Torralba,et al.  LabelMe: A Database and Web-Based Tool for Image Annotation , 2008, International Journal of Computer Vision.

[38]  Pietro Perona,et al.  Unsupervised Learning of Models for Recognition , 2000, ECCV.

[39]  D. Marr,et al.  Representation and recognition of the spatial organization of three-dimensional shapes , 1978, Proceedings of the Royal Society of London. Series B. Biological Sciences.

[40]  A. Pentland Recognition by Parts , 1987 .

[41]  David B. Dunson,et al.  Bayesian Data Analysis , 2010 .

[42]  Takeo Kanade,et al.  A statistical method for 3D object detection applied to faces and cars , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[43]  Daniel P. Huttenlocher,et al.  Composite Models of Objects and Scenes for Category Recognition , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[44]  Pietro Perona,et al.  A sparse object category model for efficient learning and exhaustive recognition , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[45]  R. Hartley,et al.  PowerFactorization : 3D reconstruction with missing or uncertain data , 2003 .

[46]  Silvio Savarese,et al.  3D generic object categorization, localization and pose estimation , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[47]  Andrew Zisserman,et al.  Multiple view geometry in computer visiond , 2001 .

[48]  Shaogang Gong,et al.  Multi-view face detection and pose estimation using a composite support vector machine across the view sphere , 1999, Proceedings International Workshop on Recognition, Analysis, and Tracking of Faces and Gestures in Real-Time Systems. In Conjunction with ICCV'99 (Cat. No.PR00378).

[49]  Mubarak Shah,et al.  3D Model based Object Class Detection in An Arbitrary View , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[50]  Prasanna G. Mulgaonkar,et al.  Matching 'sticks, plates and blobs' objects using geometric and relational constraints , 1984, Image and Vision Computing.

[51]  Andrew Zisserman,et al.  Extending Pictorial Structures for Object Recognition , 2004, BMVC.

[52]  Bernt Schiele,et al.  Pedestrian detection in crowded scenes , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).