On the Design and Analysis of Multiple View Descriptors

We propose an extension of popular descriptors based on gradient orientation histograms (HOG, computed in a single image) to multiple views. The extension hinges on interpreting HOG as a conditional density in the space of sampled images, where the effects of nuisance factors such as viewpoint and illumination are marginalized. In single-view HOG, however, this marginalization is performed with respect to a very coarse approximation of the underlying distribution. Our extension leverages the fact that multiple views of the same scene allow separating intrinsic from nuisance variability, and thus afford better marginalization of the latter. The result is a descriptor that has the same complexity as single-view HOG and can be compared in the same manner, but exploits multiple views to better trade off insensitivity to nuisance variability against specificity to intrinsic variability. We also introduce a novel multi-view wide-baseline matching dataset, consisting of a mixture of real and synthetic objects with ground-truth camera motion and dense three-dimensional geometry.
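To make the idea of pooling gradient orientations over several views concrete, the following is a minimal sketch and not the descriptor proposed in the paper. It assumes the patches from the different views have already been co-registered, and it simply averages the per-view histograms as a crude stand-in for marginalization over nuisance variability. The function names `orientation_histogram` and `multiview_histogram`, and the choice of 8 bins, are illustrative assumptions.

```python
import numpy as np

def orientation_histogram(patch, n_bins=8):
    """Magnitude-weighted gradient-orientation histogram of one grayscale patch."""
    gy, gx = np.gradient(patch.astype(float))          # derivatives along rows, columns
    mag = np.hypot(gx, gy)                             # gradient magnitude per pixel
    ang = np.mod(np.arctan2(gy, gx), 2 * np.pi)        # orientation mapped to [0, 2*pi)
    # Quantize orientations; clamp guards against rounding up to n_bins.
    bins = np.minimum((ang / (2 * np.pi) * n_bins).astype(int), n_bins - 1)
    hist = np.bincount(bins.ravel(), weights=mag.ravel(), minlength=n_bins)
    total = hist.sum()
    return hist / total if total > 0 else hist         # normalize to sum to 1

def multiview_histogram(patches, n_bins=8):
    """Average per-view histograms (assumption: patches are already aligned)."""
    hists = np.stack([orientation_histogram(p, n_bins) for p in patches])
    return hists.mean(axis=0)

# Two toy "views": a horizontal and a vertical intensity ramp.
ramp_x = np.tile(np.arange(8.0), (8, 1))
ramp_y = ramp_x.T
single = orientation_histogram(ramp_x)
pooled = multiview_histogram([ramp_x, ramp_y])
```

The pooled histogram has the same length as the single-view one, so the two can be compared with the same distance, which mirrors the abstract's point that the multi-view descriptor keeps the complexity of single-view HOG.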
