Inverting and Visualizing Features for Object Detection

Abstract: This paper presents methods to visualize feature spaces commonly used in object detection. The tools in this paper allow a human to put on "feature space glasses" and see the visual world as a computer might see it. We found that these glasses allow us to gain insight into the behavior of computer vision systems. We show a variety of experiments with our visualizations, such as examining the linear separability of recognition in HOG space, generating high-scoring "super objects" for an object detector, and diagnosing false positives. We pose the visualization problem as one of feature inversion, i.e., recovering the natural image that generated a feature descriptor. We describe four algorithms to tackle this task, with different trade-offs in speed, accuracy, and scalability. Our most successful algorithm uses ideas from sparse coding to learn a pair of dictionaries that enable regression between HOG features and natural images, and can invert features at interactive rates. We believe these visualizations are useful tools to add to an object detector researcher's toolbox, and code is available.
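
To make the paired-dictionary idea concrete, the following is a minimal sketch, not the authors' released code. It assumes two dictionaries that share one sparse code: a hypothetical image dictionary U and a hypothetical feature dictionary V, so that an image is approximately U·alpha and its descriptor is approximately V·alpha. Inversion then amounts to estimating alpha from the descriptor and projecting it through U. The solver used here (plain iterative soft-thresholding), the function names, and the parameters lam and n_iter are illustrative assumptions; the abstract does not specify them, and in practice the dictionaries would be learned from data rather than supplied by hand.

```python
import numpy as np


def sparse_code(V, y, lam=1e-3, n_iter=1000):
    """Estimate a sparse code alpha with V @ alpha ~= y.

    Minimizes 0.5 * ||V a - y||^2 + lam * ||a||_1 using iterative
    soft-thresholding (ISTA). This solver is an illustrative choice,
    not necessarily the one used in the paper.
    """
    # Step size from the Lipschitz constant of the smooth term's gradient.
    lipschitz = np.linalg.norm(V, 2) ** 2
    alpha = np.zeros(V.shape[1])
    for _ in range(n_iter):
        grad = V.T @ (V @ alpha - y)
        z = alpha - grad / lipschitz
        # Soft-thresholding is the proximal operator of the L1 penalty.
        alpha = np.sign(z) * np.maximum(np.abs(z) - lam / lipschitz, 0.0)
    return alpha


def invert_feature(U, V, y, lam=1e-3, n_iter=1000):
    """Map a feature descriptor y to an image estimate via the shared code.

    U: (image_dim, n_atoms) hypothetical image dictionary.
    V: (feature_dim, n_atoms) hypothetical feature dictionary.
    """
    alpha = sparse_code(V, y, lam=lam, n_iter=n_iter)
    return U @ alpha


if __name__ == "__main__":
    # Toy stand-ins for learned dictionaries: random, unit-norm atoms.
    rng = np.random.default_rng(0)
    U = rng.standard_normal((64, 16))
    V = rng.standard_normal((48, 16))
    U /= np.linalg.norm(U, axis=0)
    V /= np.linalg.norm(V, axis=0)

    # A synthetic "image" built from three atoms, and its paired feature.
    alpha_true = np.zeros(16)
    alpha_true[[2, 7, 11]] = [1.0, -1.5, 2.0]
    x_true, y = U @ alpha_true, V @ alpha_true

    x_hat = invert_feature(U, V, y)
    rel_err = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)
    print("relative reconstruction error:", rel_err)
```

Once the dictionaries are fixed, inverting a descriptor costs one sparse-coding solve plus one matrix product, which is consistent with the interactive rates the abstract reports, although the actual speed depends on the solver and dictionary size.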
