Online visual vocabulary pruning using pairwise constraints

Given a pair of images represented using bag-of-visual-words and a label indicating whether the images are “related” (must-link constraint) or “unrelated” (cannot-link constraint), we address the problem of selecting a subset of visual words that are salient in explaining the relation between the image pair. In particular, a subset of features is selected such that the distance computed using these features satisfies the given pairwise constraints. An efficient online feature selection algorithm is presented based on the dual gradient descent approach. Side information in the form of pairwise constraints is incorporated into the feature selection stage, giving the user the flexibility to apply an unsupervised or semi-supervised algorithm at a later stage. Groups of correlated visual words, which typically arise from the hierarchical quantization process, are exploited to select a significantly smaller vocabulary. A group-LASSO regularizer is used to drive as many feature weights to zero as possible. We evaluate the quality of the pruned vocabulary by clustering the data using the resulting feature subset. Experiments on the PASCAL VOC 2007 dataset with a vocabulary of 5000 visual words resulted in an approximately 80% reduction in the number of visual words, with little or no loss in clustering performance.
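The selection criterion can be illustrated with a minimal sketch. It assumes a weighted squared distance d_w(x_i, x_j) = Σ_k w_k (x_ik - x_jk)² with nonnegative weights w, a hinge loss asking must-link pairs to satisfy d_w ≤ b - 1 and cannot-link pairs to satisfy d_w ≥ b + 1, and a group-LASSO penalty λ Σ_g √|g| ‖w_g‖₂. It substitutes a plain online proximal subgradient update for the dual gradient descent algorithm described above, so it is not the authors' method; the threshold `b`, step size `eta`, weight `lam`, the group layout, and the synthetic data produced by the hypothetical `make_pairs` helper are all illustrative assumptions. A group whose weights end exactly at zero corresponds to visual words that can be pruned.

```python
import numpy as np


def group_prox(w, groups, t):
    """Exact prox of t * sum_g sqrt(|g|) * ||w_g||_2 plus the constraint w >= 0.

    Clipping negatives first and then group soft-thresholding is exact for
    this combined penalty.
    """
    w = np.maximum(w, 0.0)  # new array, so the caller's w is not mutated
    for g in groups:
        thresh = t * np.sqrt(len(g))
        norm = np.linalg.norm(w[g])
        # Groups whose norm is below the threshold are set exactly to zero (pruned).
        w[g] = 0.0 if norm <= thresh else w[g] * (1.0 - thresh / norm)
    return w


def online_prune(pairs, n_features, groups, b=2.0, eta=0.05, lam=0.3):
    """Online proximal subgradient descent on nonnegative feature weights.

    pairs yields (x_i, x_j, y) with y = +1 (must-link) or -1 (cannot-link).
    ASSUMPTION: a stand-in for, not a reproduction of, dual gradient descent.
    """
    w = np.ones(n_features)  # start from the full vocabulary, uniformly weighted
    for xi, xj, y in pairs:
        a = (xi - xj) ** 2  # per-word squared differences, so d_w = w @ a
        # Hinge loss max(0, 1 - y * (b - w @ a)) has subgradient y * a when active.
        if 1.0 - y * (b - w @ a) > 0:
            w = w - eta * y * a
        w = group_prox(w, groups, eta * lam)
    return w


def make_pairs(n_pairs, rng):
    """Hypothetical data: 4 groups of 3 words; only words 0-2 depend on the class."""
    pairs = []
    for _ in range(n_pairs):
        ci, cj = rng.integers(0, 2, size=2)
        xi = np.concatenate([ci + rng.uniform(0, 0.1, 3), rng.uniform(0, 0.5, 9)])
        xj = np.concatenate([cj + rng.uniform(0, 0.1, 3), rng.uniform(0, 0.5, 9)])
        pairs.append((xi, xj, 1 if ci == cj else -1))
    return pairs


rng = np.random.default_rng(0)
# Hypothetical grouping, e.g. sibling leaves under one node of a vocabulary tree.
groups = [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]]
w = online_prune(make_pairs(3000, rng), 12, groups)
kept = [i for i, g in enumerate(groups) if np.linalg.norm(w[g]) > 0]
print("kept groups:", kept, "weights:", np.round(w, 2))
```

In this toy setup only group 0 carries class information, so the three noise groups are driven to zero and the pruned vocabulary is expected to keep 3 of the 12 words. Clipping before group soft-thresholding gives the exact proximal step for the penalty combined with w ≥ 0, which is why pruned groups end at exactly zero rather than at merely small values.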
