Large Scale Online Learning of Image Similarity through Ranking

Learning a measure of similarity between pairs of objects is a fundamental problem in machine learning. Pairwise similarity plays a crucial role in classification algorithms like nearest neighbors, and is practically important for applications like searching for images that are similar to a given image or finding videos that are relevant to a given video. In these tasks, users look for objects that are both visually similar and semantically related to a given object. Unfortunately, current approaches for learning semantic similarity are limited to small scale datasets, because their complexity grows quadratically with the sample size, and because they impose costly positivity constraints on the learned similarity functions. To address real-world large-scale AI problem, like learning similarity over all images on the web, we need to develop new algorithms that scale to many samples, many classes, and many features. The current abstract presents OASIS, an Online Algorithm for Scalable Image Similarity learning that learns a bilinear similarity measure over sparse representations. OASIS is an online dual approach using the passive-aggressive family of learning algorithms with a large margin criterion and an efficient hinge loss cost. Our experiments show that OASIS is both fast and accurate at a wide range of scales: for a dataset with thousands of images, it achieves better results than existing state-of-the-art methods, while being an order of magnitude faster. Comparing OASIS with different symmetric variants, provides unexpected insights into the effect of symmetry on the quality of the similarity. For large, web scale, datasets, OASIS can be trained on more than two million images from 150K text queries within two days on a single CPU. Human evaluations showed that 35% of the ten top images ranked by OASIS were semantically relevant to a query image. This suggests that query-independent similarity could be accurately learned even for large-scale datasets that could not be handled before.

[1]  A. Tversky Features of Similarity , 1977 .

[2]  David A. Forsyth,et al.  Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary , 2002, ECCV.

[3]  Michael I. Jordan,et al.  Distance Metric Learning with Application to Clustering with Side-Information , 2002, NIPS.

[4]  Matti Pietikäinen,et al.  Multiresolution Gray-Scale and Rotation Invariant Texture Classification with Local Binary Patterns , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[5]  Tomer Hertz,et al.  Learning Distance Functions using Equivalence Relations , 2003, ICML.

[6]  Koby Crammer,et al.  Online Passive-Aggressive Algorithms , 2003, J. Mach. Learn. Res..

[7]  Thorsten Joachims,et al.  Learning a Distance Metric from Relative Comparisons , 2003, NIPS.

[8]  Nello Cristianini,et al.  Learning the Kernel Matrix with Semidefinite Programming , 2002, J. Mach. Learn. Res..

[9]  R. Manmatha,et al.  Using Maximum Entropy for Automatic Image Annotation , 2004, CIVR.

[10]  R. Manmatha,et al.  Multiple Bernoulli relevance models for image and video annotation , 2004, CVPR 2004.

[11]  David G. Lowe,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004, International Journal of Computer Vision.

[12]  Paul A. Viola,et al.  Boosting Image Retrieval , 2004, International Journal of Computer Vision.

[13]  Matti Pietikäinen,et al.  Block-Based Methods for Image Retrieval Using Local Binary Patterns , 2005, SCIA.

[14]  Kilian Q. Weinberger,et al.  Distance Metric Learning for Large Margin Nearest Neighbor Classification , 2005, NIPS.

[15]  Luc Van Gool,et al.  Modeling scenes with local descriptors and latent aspects , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[16]  Amir Globerson,et al.  Metric Learning by Collapsing Classes , 2005, NIPS.

[17]  Yann LeCun,et al.  Dimensionality Reduction by Learning an Invariant Mapping , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[18]  Glenn Fung,et al.  Learning sparse metrics via linear programming , 2006, KDD '06.

[19]  Samy Bengio,et al.  Learning to Retrieve Images from Text Queries with a Discriminative Model , 2006, Adaptive Multimedia Retrieval.

[20]  G. Griffin,et al.  Caltech-256 Object Category Dataset , 2007 .

[21]  Jitendra Malik,et al.  Learning Globally-Consistent Local Distance Functions for Shape-Based Image Retrieval and Classification , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[22]  Inderjit S. Dhillon,et al.  Information-theoretic metric learning , 2006, ICML '07.

[23]  Prateek Jain,et al.  Fast image search for learned metrics , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[24]  Inderjit S. Dhillon,et al.  Online Metric Learning and Fast Similarity Search , 2008, NIPS.

[25]  Samy Bengio,et al.  A Discriminative Kernel-Based Approach to Rank Images from Text Queries , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[26]  Nuno Vasconcelos,et al.  A study of query by semantic example , 2008, 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.

[27]  Kilian Q. Weinberger,et al.  Fast solvers and efficient implementations for distance metric learning , 2008, ICML '08.

[28]  Inderjit S. Dhillon,et al.  Low-Rank Kernel Learning with Bregman Matrix Divergences , 2009, J. Mach. Learn. Res..

[29]  Maya R. Gupta,et al.  Similarity-based Classification: Concepts and Algorithms , 2009, J. Mach. Learn. Res..