ProLFA: Representative Prototype Selection for Local Feature Aggregation

Given a set of hand-crafted local features, acquiring a global representation via aggregation is a promising technique for boosting computational efficiency and improving task performance. Existing feature aggregation (FA) approaches, including Bag of Words and Fisher Vectors, usually fail to capture the desired information because of their stage-wise pipeline design. In this paper, we propose a generic formulation, named ProLFA, that provides a systematic solution for aggregating local descriptors. It produces compact yet interpretable representations by selecting representative prototypes from a large pool of descriptors under a relaxed exclusivity constraint. Meanwhile, to strengthen the discriminability of the aggregated representation, we enforce a domain-invariant projection of bundled descriptors along a task-specific direction. Furthermore, ProLFA generalizes well and can be flexibly applied to both semi-supervised and fully supervised local feature aggregation. Experimental results on various descriptors and tasks demonstrate that ProLFA considerably outperforms currently available feature aggregation alternatives.
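For readers less familiar with the baselines, the following is a minimal sketch of hard-assignment Bag of Words aggregation, one of the existing FA approaches named above. It is not ProLFA. It assumes a codebook is already available (the `codebook` argument; in practice often k-means centroids, or a subset of the descriptors themselves acting as prototypes), and the names `bow_aggregate` and `squared_distance` are hypothetical, chosen for illustration. It uses plain Python lists to stay dependency-free.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def bow_aggregate(descriptors: List[Vector], codebook: List[Vector]) -> List[float]:
    """Hard-assignment Bag of Words: count nearest-codeword hits, then L1-normalize.

    The codebook is assumed to be given (e.g. k-means centroids or selected
    prototype descriptors); learning it is out of scope for this sketch.
    """
    if not codebook:
        raise ValueError("codebook must be non-empty")
    counts = [0.0] * len(codebook)
    for d in descriptors:
        # Index of the codeword closest to this local descriptor.
        k = min(range(len(codebook)), key=lambda j: squared_distance(d, codebook[j]))
        counts[k] += 1.0
    total = sum(counts)
    # L1 normalization makes sets with different numbers of descriptors comparable;
    # an empty descriptor set yields an all-zero histogram.
    return [c / total for c in counts] if total > 0 else counts


if __name__ == "__main__":
    local_descriptors = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [0.1, 0.1]]
    codewords = [[0.0, 0.0], [5.0, 5.0]]
    print(bow_aggregate(local_descriptors, codewords))  # [0.75, 0.25]
```

Here the codebook is fixed before encoding, so codebook construction and aggregation are optimized separately; this stage-wise design is the kind of pipeline the abstract contrasts with a joint formulation.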
