Learning Image Descriptors with Boosting

We propose a novel and general framework to learn compact but highly discriminative floating-point and binary local feature descriptors. By leveraging the boosting-trick we first show how to efficiently train a compact floating-point descriptor that is very robust to illumination and viewpoint changes. We then present the main contribution of this paper—a binary extension of the framework that demonstrates the real advantage of our approach and allows us to compress the descriptor even further. Each bit of the resulting binary descriptor, which we call BinBoost, is computed with a boosted binary hash function, and we show how to efficiently optimize the hash functions so that they are complementary, which is key to compactness and robustness. As we do not put any constraints on the weak learner configuration underlying each hash function, our general framework allows us to optimize the sampling patterns of recently proposed hand-crafted descriptors and significantly improve their performance. Moreover, our boosting scheme can easily adapt to new applications and generalize to other types of image data, such as faces, while providing state-of-the-art results at a fraction of the matching time and memory footprint.

[1]  Svetlana Lazebnik,et al.  Iterative quantization: A procrustean approach to learning binary codes , 2011, CVPR 2011.

[2]  Paul A. Viola,et al.  Rapid object detection using a boosted cascade of simple features , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[3]  Ji Zhu,et al.  Boosting as a Regularized Path to a Maximum Margin Classifier , 2004, J. Mach. Learn. Res..

[4]  Horesh Ben Shitrit Multi-camera face detection and recognition applied to people tracking , 2013 .

[5]  Vincent Lepetit,et al.  BRIEF: Computing a Local Binary Descriptor Very Fast , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[6]  Yuncai Liu,et al.  Monocular Template-Based Tracking of Inextensible Deformable Surfaces under L2-Norm , 2009, ACCV.

[7]  Gary R. Bradski,et al.  ORB: An efficient alternative to SIFT or SURF , 2011, 2011 International Conference on Computer Vision.

[8]  Antonio Torralba,et al.  Spectral Hashing , 2008, NIPS.

[9]  Shih-Fu Chang,et al.  Sequential Projection Learning for Hashing with Compact Codes , 2010, ICML.

[10]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[11]  Antonio Torralba,et al.  Small codes and large image databases for recognition , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[12]  Gang Hua,et al.  Discriminative Learning of Local Image Descriptors , 1990, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[13]  Rabab Kreidieh Ward,et al.  A Fast Approximate Nearest Neighbor Search Algorithm in the Hamming Space , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14]  Pietro Perona,et al.  Integral Channel Features , 2009, BMVC.

[15]  Gregory Shakhnarovich,et al.  Learning task-specific similarity , 2005 .

[16]  Ya Zhang,et al.  Boosted multi-task learning , 2010, Machine Learning.

[17]  Inderjit S. Dhillon,et al.  Metric and Kernel Learning Using a Linear Transformation , 2009, J. Mach. Learn. Res..

[18]  Pascal Fua,et al.  LDAHash: Improved Matching with Smaller Descriptors , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[19]  Cordelia Schmid,et al.  A Comparison of Affine Region Detectors , 2005, International Journal of Computer Vision.

[20]  Jian Sun,et al.  Face recognition with learning-based descriptor , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[21]  Xiaogang Wang,et al.  Boosted multi-task learning for face verification with applications to web image and video search , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[22]  Bernd Girod,et al.  CHoG: Compressed histogram of gradients A low bit-rate feature descriptor , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[23]  Roland Siegwart,et al.  BRISK: Binary Robust invariant scalable keypoints , 2011, 2011 International Conference on Computer Vision.

[24]  David J. Fleet,et al.  Fast search in Hamming space with multi-index hashing , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[25]  Eric R. Ziegel,et al.  The Elements of Statistical Learning , 2003, Technometrics.

[26]  Cordelia Schmid,et al.  Aggregating local descriptors into a compact image representation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[27]  Yoram Singer,et al.  Improved Boosting Algorithms Using Confidence-rated Predictions , 1998, COLT' 98.

[28]  Trevor Darrell,et al.  The pyramid match kernel: discriminative classification with sets of image features , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[29]  Vincent Lepetit,et al.  Learning Image Descriptors with the Boosting-Trick , 2012, NIPS.

[30]  Thomas G. Dietterich What is machine learning? , 2020, Archives of Disease in Childhood.

[31]  Geoffrey E. Hinton,et al.  Semantic hashing , 2009, Int. J. Approx. Reason..

[32]  Rongrong Ji,et al.  Supervised hashing with kernels , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[33]  Vincent Lepetit,et al.  Efficient Discriminative Projections for Compact Binary Descriptors , 2012, ECCV.

[34]  Antonio Torralba,et al.  Multidimensional Spectral Hashing , 2012, ECCV.

[35]  Matti Pietikäinen,et al.  Face Description with Local Binary Patterns: Application to Face Recognition , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[36]  Trevor Darrell,et al.  Learning to Hash with Binary Reconstructive Embeddings , 2009, NIPS.

[37]  David J. Fleet,et al.  Minimal Loss Hashing for Compact Binary Codes , 2011, ICML.

[38]  LiuYuncai,et al.  Monocular 3-D tracking of inextensible deformable surfaces under L2-norm , 2010 .

[39]  Yannick Boursier,et al.  Sparsity Driven People Localization with a Heterogeneous Network of Cameras , 2011, Journal of Mathematical Imaging and Vision.

[40]  Vincent Lepetit,et al.  Boosting Binary Keypoint Descriptors , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[41]  Christopher Hunt,et al.  Notes on the OpenSURF Library , 2009 .

[42]  Bernd Girod,et al.  CHoG: Compressed histogram of gradients A low bit-rate feature descriptor , 2009, CVPR.

[43]  David Nistér,et al.  Scalable Recognition with a Vocabulary Tree , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[44]  Andrew Zisserman,et al.  Descriptor Learning Using Convex Optimisation , 2012, ECCV.

[45]  Pascal Fua,et al.  A Real-Time Deformable Detector , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[46]  C. Lawrence Zitnick,et al.  Binary Coherent Edge Descriptors , 2010, ECCV.

[47]  Yoav Freund,et al.  A decision-theoretic generalization of on-line learning and an application to boosting , 1995, EuroCOLT.