Shape-based object recognition in videos using 3D synthetic object models

In this paper we address the problem of recognizing moving objects in videos by utilizing synthetic 3D models. We use only the silhouette space of the synthetic models making thus our approach independent of appearance. To deal with the decrease in discriminability in the absence of appearance, we align sequences of object masks from video frames to paths in silhouette space. We extract object silhouettes from video by an integration of feature tracking, motion grouping of tracks, and co-segmentation of successive frames. Subsequently, the object masks from the video are matched to 3D model silhouettes in a robust matching and alignment phase. The result is a matching score for every 3D model to the video, along with a pose alignment of the model to the video. Promising experimental results indicate that a purely shape-based matching scheme driven by synthetic 3D models can be successfully applied for object recognition in videos.

[1]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper , 1977 .

[2]  Robert C. Bolles,et al.  Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography , 1981, CACM.

[3]  William Grimson,et al.  Object recognition by computer - the role of geometric constraints , 1991 .

[4]  Ronen Basri,et al.  Recognition by Linear Combinations of Models , 1991, IEEE Trans. Pattern Anal. Mach. Intell..

[5]  Carlo Tomasi,et al.  Good features to track , 1994, 1994 Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.

[6]  Andrew McCallum,et al.  Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.

[7]  Cordelia Schmid,et al.  3D object modeling and recognition using affine-invariant patches and multi-view spatial constraints , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[8]  Jianbo Shi,et al.  Multiclass spectral clustering , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[9]  Serge J. Belongie,et al.  What went where , 2003, CVPR 2003.

[10]  Thomas A. Funkhouser,et al.  The Princeton Shape Benchmark , 2004, Proceedings Shape Modeling Applications, 2004..

[11]  J. Koenderink,et al.  The internal representation of solid shape with respect to vision , 1979, Biological Cybernetics.

[12]  Vladimir Kolmogorov,et al.  What energy functions can be minimized via graph cuts? , 2002, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[13]  Benjamin B. Kimia,et al.  A Similarity-Based Aspect-Graph Approach to 3D Object Recognition , 2004, International Journal of Computer Vision.

[14]  T. Tuytelaars,et al.  Integrating multiple model views for object recognition , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[15]  Andrew Zisserman,et al.  Object Level Grouping for Video Shots , 2004, International Journal of Computer Vision.

[16]  Cordelia Schmid,et al.  The 2005 PASCAL Visual Object Classes Challenge , 2005, MLCW.

[17]  Silvio Savarese,et al.  3D generic object categorization, localization and pose estimation , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[18]  Jianbo Shi,et al.  Image Matching via Saliency Region Correspondences , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[19]  Andrew Zisserman,et al.  Efficient Visual Search for Objects in Videos , 2008, Proceedings of the IEEE.

[20]  Cordelia Schmid,et al.  Viewpoint-independent object class detection using 3D Feature Maps , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[21]  Jitendra Malik,et al.  Shape matching and object recognition using shape contexts , 2010, 2010 3rd International Conference on Computer Science and Information Technology.