Multi-view Object Categorization and Pose Estimation

Object and scene categorization has been a central topic of computer vision research in recent years. The problem is highly challenging. A single object may show tremendous variability in appearance and structure under different photometric and geometric conditions. In addition, members of the same class may differ from one another because of intra-class variability. Recently, researchers have proposed new models with two goals: i) finding a suitable representation that can efficiently capture the intrinsic three-dimensional and multi-view nature of object categories; ii) taking advantage of this representation to help the recognition and categorization task. In this Chapter we review recent approaches aimed at tackling this problem and focus on the work by Savarese & Fei-Fei [54, 55]. In [54, 55], multi-view object models are obtained by linking together diagnostic parts of the objects from different viewpoints. Instead of recovering a full 3D geometry, parts are connected through their mutual homographic transformations. The resulting model is a compact summary of both the appearance and the geometry of the object class. We show that such a model can be learnt with minimal supervision compared to competing techniques. The model can be used to detect objects under arbitrary and/or unseen poses by means of a two-step algorithm. This algorithm, inspired by work on single-object view synthesis (e.g., Seitz & Dyer [57]), can synthesize object appearance and shape properties at recognition time and, in turn, estimate the object pose that best matches the observations. We conclude this Chapter by presenting detection, recognition, and pose estimation experiments on the two datasets in [54, 55] as well as on the PASCAL Visual Object Classes (VOC) dataset [15]. Experiments indicate that the representation and algorithms presented in [54, 55] can be successfully employed in a number of generic object recognition tasks.
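
As a rough illustration of what it means to connect parts through a homographic transformation, the sketch below maps the corners of a planar part from one view into another using a 3x3 homography. It is not the implementation from [54, 55]: the function names, the matrix H_front_to_side, and the part coordinates are all hypothetical values chosen only to show the mechanics.

```python
def apply_homography(H, point):
    """Map a 2D point through a 3x3 homography H (nested row-major lists)."""
    x, y = point
    # Multiply H by the homogeneous point (x, y, 1).
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if abs(w) < 1e-12:
        raise ValueError("point maps to infinity under this homography")
    # Divide by the homogeneous coordinate to return to image coordinates.
    return (xh / w, yh / w)


def warp_part(H, corners):
    """Map every corner of a planar part into the other view."""
    return [apply_homography(H, c) for c in corners]


# Hypothetical homography linking a part seen frontally to the same part
# seen from the side: horizontal foreshortening by 0.5 plus a shift of 10.
H_front_to_side = [
    [0.5, 0.0, 10.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

# Hypothetical part outline (a 4 x 2 rectangle) in the frontal view.
part_front = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]
part_side = warp_part(H_front_to_side, part_front)
print(part_side)
```

In this toy case the rectangle is compressed horizontally and shifted, which is the kind of view-to-view relation a model can store per pair of parts instead of a full 3D reconstruction.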

[1]  Cordelia Schmid,et al.  Viewpoint-independent object class detection using 3D Feature Maps , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[2]  Cordelia Schmid,et al.  Scale & Affine Invariant Interest Point Detectors , 2004, International Journal of Computer Vision.

[3]  Shimon Ullman,et al.  View-Invariant Recognition Using Corresponding Object Fragments , 2004, ECCV.

[4]  Tomas Lozano-Perez,et al.  Recognition and Localization of Overlapping Parts in Two and Three Dimensions , 1985 .

[5]  Jiří Matas,et al.  Computer Vision - ECCV 2004 , 2004, Lecture Notes in Computer Science.

[6]  Bernd Neumann,et al.  Computer Vision — ECCV’98 , 1998, Lecture Notes in Computer Science.

[7]  Linda G. Shapiro,et al.  A new signature-based method for efficient 3-D object recognition , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[8]  Silvio Savarese,et al.  View Synthesis for Recognizing Unseen Poses of Object Classes , 2008, ECCV.

[9]  Ali Farhadi,et al.  A latent model of discriminative aspect , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[10]  Luc Van Gool,et al.  Towards Multi-View Object Class Detection , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[11]  G. Sandini,et al.  Computer Vision — ECCV'92 , 1992, Lecture Notes in Computer Science.

[12]  Hiroshi Murase,et al.  Learning by a generation approach to appearance-based object recognition , 1996, Proceedings of 13th International Conference on Pattern Recognition.

[13]  Cordelia Schmid,et al.  3D Object Modeling and Recognition Using Local Affine-Invariant Image Descriptors and Multi-View Spatial Constraints , 2006, International Journal of Computer Vision.

[14]  Jiri Matas,et al.  Robust wide-baseline stereo from maximally stable extremal regions , 2004, Image Vis. Comput..

[15]  Stefano Soatto,et al.  Class segmentation and object localization with superpixel neighborhoods , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[16]  Derek Hoiem,et al.  3D LayoutCRF for Multi-View Object Class Recognition and Segmentation , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[17]  Paul A. Viola,et al.  Rapid object detection using a boosted cascade of simple features , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[18]  Azriel Rosenfeld,et al.  3-D Shape Recovery Using Distributed Aspect Matching , 1992, IEEE Trans. Pattern Anal. Mach. Intell..

[19]  Silvio Savarese,et al.  A multi-view probabilistic model for 3D object classes , 2009, CVPR.

[20]  Cordelia Schmid,et al.  An Affine Invariant Interest Point Detector , 2002, ECCV.

[21]  Matthew A. Brown,et al.  Unsupervised 3D object recognition and reconstruction in unordered datasets , 2005, Fifth International Conference on 3-D Digital Imaging and Modeling (3DIM'05).

[22]  Cordelia Schmid,et al.  Semi-Local Affine Parts for Object Recognition , 2004, BMVC.

[23]  Ronen Basri,et al.  Constructing implicit 3D shape models for pose estimation , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[24]  Stan Z. Li,et al.  FloatBoost learning and statistical face detection , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[25]  Mads Nielsen,et al.  Computer Vision — ECCV 2002 , 2002, Lecture Notes in Computer Science.

[26]  Pietro Perona,et al.  A Probabilistic Approach to Object Recognition Using Local Photometry and Global Geometry , 1998, ECCV.

[27]  J. Koenderink,et al.  The singularities of the visual mapping , 1976, Biological Cybernetics.

[28]  David A. Forsyth,et al.  Canonical Frames for Planar Object Recognition , 1992, ECCV.

[29]  Remco C. Veltkamp,et al.  A Survey of Content Based 3D Shape Retrieval Methods , 2004, SMI.

[30]  Dana H. Ballard,et al.  Generalizing the Hough transform to detect arbitrary shapes , 1981, Pattern Recognit..

[31]  Daniel P. Huttenlocher,et al.  Pictorial Structures for Object Recognition , 2004, International Journal of Computer Vision.

[32]  Thomas A. Funkhouser,et al.  The Princeton Shape Benchmark , 2004, Proceedings Shape Modeling Applications, 2004..

[33]  Xinju Li,et al.  Feature-Based Alignment of Range Scan Data to CAD Model , 2007, Int. J. Shape Model..

[34]  Leslie Pack Kaelbling,et al.  Virtual Training for Multi-View Object Class Recognition , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[35]  Takeo Kanade,et al.  A robust shape model for multi-view car alignment , 2009, CVPR.

[36]  Luc Van Gool,et al.  Simultaneous Object Recognition and Segmentation from Single or Multiple Model Views , 2006, International Journal of Computer Vision.

[37]  M. Everingham The PASCAL Visual Object Classes Challenge 2005 Development Kit , 2005 .

[38]  Pietro Perona,et al.  Unsupervised Learning of Models for Recognition , 2000, ECCV.

[39]  Kevin W. Bowyer,et al.  Creating The Perspective Projection Aspect Graph Of Polyhedral Objects , 1988, [1988 Proceedings] Second International Conference on Computer Vision.

[40]  Andrew J. Davison,et al.  Active Matching , 2008, ECCV.

[41]  Silvio Savarese,et al.  Learning a dense multi-view representation for detection, viewpoint classification and synthesis of object categories , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[42]  Bernhard P. Wrobel,et al.  Multiple View Geometry in Computer Vision , 2001 .

[43]  Kevin W. Bowyer,et al.  Aspect graphs: An introduction and survey of recent results , 1990, Int. J. Imaging Syst. Technol..

[44]  Lance Williams,et al.  View Interpolation for Image Synthesis , 1993, SIGGRAPH.

[45]  R. Weale Vision. A Computational Investigation Into the Human Representation and Processing of Visual Information. David Marr , 1983 .

[46]  Silvio Savarese,et al.  3D generic object categorization, localization and pose estimation , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[47]  Takeo Kanade,et al.  A statistical approach to 3d object detection applied to faces and cars , 2000 .

[48]  Antonio Torralba,et al.  Sharing features: efficient boosting procedures for multiclass object detection , 2004, CVPR 2004.

[49]  Jitendra Malik,et al.  Recognizing Objects in Range Data Using Regional Point Descriptors , 2004, ECCV.

[50]  Andrew E. Johnson,et al.  Using Spin Images for Efficient Object Recognition in Cluttered 3D Scenes , 1999, IEEE Trans. Pattern Anal. Mach. Intell..

[51]  Hiroshi Murase,et al.  Real-time 100 object recognition system , 1996, Proceedings of IEEE International Conference on Robotics and Automation.

[52]  J. Koenderink,et al.  The internal representation of solid shape with respect to vision , 1979, Biological Cybernetics.

[53]  Jianxiong Xiao,et al.  Structuring Visual Words in 3D for Arbitrary-View Object Localization , 2008, ECCV.

[54]  Remco C. Veltkamp,et al.  A survey of content based 3D shape retrieval methods , 2004, Proceedings Shape Modeling Applications, 2004..

[55]  Michael Brady,et al.  Saliency, Scale and Image Description , 2001, International Journal of Computer Vision.

[56]  Gabriela Csurka,et al.  Visual categorization with bags of keypoints , 2002, ECCV 2004.

[57]  Pietro Perona,et al.  Object class recognition by unsupervised scale-invariant learning , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[58]  Antonio Torralba,et al.  LabelMe: A Database and Web-Based Tool for Image Annotation , 2008, International Journal of Computer Vision.

[59]  Alexander J. Smola,et al.  Learning Graph Matching , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[60]  Robert C. Bolles,et al.  Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography , 1981, CACM.

[61]  David G. Lowe,et al.  Object recognition from local scale-invariant features , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[62]  Pietro Perona,et al.  Viewpoint-invariant learning and detection of human heads , 2000, Proceedings Fourth IEEE International Conference on Automatic Face and Gesture Recognition (Cat. No. PR00580).

[63]  Christopher M. Bishop,et al.  Non-linear Bayesian Image Modelling , 2000, ECCV.

[64]  Alfred O. Hero,et al.  Unsupervised Object Pose Classification from Short Video Sequences , 2009, BMVC.

[65]  Dmitry B. Goldgof,et al.  The scale space aspect graph , 1992, CVPR.

[66]  Szymon Rusinkiewicz,et al.  Rotation Invariant Spherical Harmonic Representation of 3D Shape Descriptors , 2003, Symposium on Geometry Processing.

[67]  Bernt Schiele,et al.  Scale-Invariant Object Categorization Using a Scale-Adaptive Mean-Shift Search , 2004, DAGM-Symposium.

[68]  Steven M. Seitz,et al.  View morphing , 1996, SIGGRAPH.

[69]  Cordelia Schmid,et al.  Flexible Object Models for Category-Level 3D Object Recognition , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[70]  David G. Lowe,et al.  Local feature view clustering for 3D object recognition , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[71]  David G. Lowe,et al.  Three-Dimensional Object Recognition from Single Two-Dimensional Images , 1987, Artif. Intell..

[72]  Geoffrey E. Hinton,et al.  Learning Generative Texture Models with extended Fields-of-Experts , 2009, BMVC.

[73]  Benjamin B. Kimia,et al.  A Similarity-Based Aspect-Graph Approach to 3D Object Recognition , 2004, International Journal of Computer Vision.

[74]  Bernt Schiele,et al.  3D object recognition from range images using local feature histograms , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[75]  P. Fua,et al.  Pose estimation for category specific multiview object localization , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[76]  Kevin W. Bowyer,et al.  Computing the Perspective Projection Aspect Graph of Solids of Revolution , 1993, IEEE Trans. Pattern Anal. Mach. Intell..

[77]  Takeo Kanade,et al.  A statistical method for 3D object detection applied to faces and cars , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[78]  Shaogang Gong,et al.  Multi-view face detection and pose estimation using a composite support vector machine across the view sphere , 1999, Proceedings International Workshop on Recognition, Analysis, and Tracking of Faces and Gestures in Real-Time Systems. In Conjunction with ICCV'99 (Cat. No.PR00378).

[79]  Mubarak Shah,et al.  3D Model based Object Class Detection in An Arbitrary View , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[80]  Ronen Basri,et al.  Recognition by Linear Combinations of Models , 1991, IEEE Trans. Pattern Anal. Mach. Intell..