Paper information - Video-based descriptors for object recognition

We describe a visual recognition system that runs on a hand-held device and is built on a video-based feature descriptor, and we characterize the invariance and discriminative properties of that descriptor. Feature selection and tracking are performed in real time and are used to train a template-based classifier during a capture phase prompted by the user. During normal operation, the system recognizes objects in the field of view based on their ranking. Severe resource constraints have prompted a re-evaluation of existing algorithms, improving both their performance (accuracy and robustness) and their computational efficiency. We motivate the design choices in the implementation with a characterization of the stability properties of local invariant detectors and of the conditions under which a template-based descriptor is optimal. The analysis also highlights the role of time as a "weak supervisor" during training, which we exploit in our implementation.
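The capture-then-recognize loop summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the descriptor, how templates are formed, or how ranking is scored. The sketch assumes that a template is the per-object mean of descriptor vectors collected along feature tracks (all frames of a track inherit the object's label, which is one reading of time acting as a "weak supervisor") and that objects are ranked by Euclidean distance to the query. The names `build_templates` and `rank_objects` are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def build_templates(tracks):
    """Average descriptors per object label (assumed template construction).

    tracks: iterable of (label, frames), where frames is a list of
    equal-length descriptor vectors observed for one tracked feature.
    """
    sums = {}
    counts = defaultdict(int)
    for label, frames in tracks:
        for vec in frames:
            if label not in sums:
                sums[label] = [0.0] * len(vec)
            sums[label] = [s + v for s, v in zip(sums[label], vec)]
            counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def rank_objects(templates, query):
    """Return labels sorted from best to worst match (assumed: Euclidean distance)."""
    def dist(a, b):
        return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    return sorted(templates, key=lambda label: dist(templates[label], query))


# Toy capture phase: two objects, each seen in two frames of one track.
tracks = [
    ("mug", [[1.0, 0.0], [0.8, 0.2]]),
    ("book", [[0.0, 1.0], [0.2, 0.8]]),
]
templates = build_templates(tracks)
ranking = rank_objects(templates, [0.9, 0.1])
print(ranking)
```

In this toy example the query lies closest to the "mug" template, so "mug" is ranked first. A real system would replace the two-dimensional vectors with descriptors computed from tracked image regions.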

Stefano Soatto | Taehee Lee
