Video-based descriptors for object recognition

We describe a visual recognition system that runs on a hand-held device and is built on a video-based feature descriptor, and characterize its invariance and discriminative properties. Feature selection and tracking are performed in real time and used to train a template-based classifier during a capture phase prompted by the user. During normal operation, the system recognizes objects in the field of view based on their ranking. Severe resource constraints have prompted a re-evaluation of existing algorithms, improving their performance (accuracy and robustness) as well as their computational efficiency. We motivate the design choices in the implementation with a characterization of the stability properties of local invariant detectors and of the conditions under which a template-based descriptor is optimal. The analysis also highlights the role of time as a "weak supervisor" during training, which we exploit in our implementation.
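
As a rough illustration of how time can act as a weak supervisor for a template-based descriptor, the sketch below pools the per-frame descriptors of each tracked feature into one template and then ranks object labels by distance to a query descriptor. It is not the implementation described here: the function names `build_templates` and `rank_objects`, the use of a plain mean as the template, and the Euclidean distance are all assumptions made for the example.

```python
import math
from collections import defaultdict


def build_templates(tracks):
    """Pool each track into one template (assumed here to be the mean).

    tracks: list of (label, frames), where frames is a list of equal-length
    descriptor vectors observed for one tracked feature over time. The track
    itself supplies the "weak supervision": all frames of a track are assumed
    to show the same feature, so no per-frame labels are needed.
    """
    templates = defaultdict(list)
    for label, frames in tracks:
        dim = len(frames[0])
        mean = [sum(f[i] for f in frames) / len(frames) for i in range(dim)]
        templates[label].append(mean)
    return dict(templates)


def rank_objects(templates, query):
    """Return labels sorted by distance from query to their closest template."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    scores = {
        label: min(dist(t, query) for t in label_templates)
        for label, label_templates in templates.items()
    }
    return sorted(scores, key=scores.get)


# Hypothetical toy data: two objects, one two-frame track each.
tracks = [
    ("mug", [[1.0, 0.0], [0.8, 0.2]]),
    ("book", [[0.0, 1.0], [0.2, 0.8]]),
]
templates = build_templates(tracks)
ranking = rank_objects(templates, [0.9, 0.1])
print(ranking)  # ['mug', 'book']
```

Averaging along a track is only one possible pooling choice; a median or a histogram over the track would fit the same interface.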
