SALSAS: Sub-linear active learning strategy with approximate k-NN search

With the democratization of digital imaging devices, image databases are growing exponentially, so providing users with a system for searching these databases is a critical issue. However, bridging the semantic gap between the concept(s) the user is looking for and the semantic content of the images is difficult. In content-based image retrieval (CBIR) systems, a classic scenario is to formulate the user query, at first, with only one example (i.e., one image). To address this problem, active learning is a powerful technique that involves the user in interactively refining the query concept through relevance feedback loops, asking the user whether some strategically selected images are relevant or not. However, the complexity of state-of-the-art active learning methods is linear in the size of the database, which dramatically slows down retrieval on very large databases and is no longer acceptable for users. In this article, we propose a strategy to overcome the scalability limitations of active learning by exploiting ultra-fast k-nearest-neighbor (k-NN) methods, such as locality-sensitive hashing (LSH), and combining them with an active learning strategy dedicated to very large databases. We define a new LSH scheme adapted to the χ² distance, which often leads to better results in the image retrieval context. We evaluate on databases of between 5K and 180K images. The results show that the complexity of our interactive retrieval system is almost constant in the size of the database. For a database of 180K images, our system is 45 times faster than exhaustive search (linear scan) while reaching similar accuracy.
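The general idea of pairing hashing with active selection can be illustrated with a minimal sketch. It is not the scheme defined in the article: it uses the standard p-stable (Gaussian) LSH for Euclidean distance rather than an LSH adapted to the χ² distance, and the selection rule (candidates with the smallest absolute classifier score) is only one common uncertainty criterion. The names L2LSHIndex, select_for_labeling, and score_fn, as well as all parameter values, are hypothetical.

```python
import numpy as np
from collections import defaultdict


class L2LSHIndex:
    """Hash tables built from p-stable (Gaussian) random projections.

    Illustrative stand-in for Euclidean distance, NOT the article's scheme.
    """

    def __init__(self, dim, n_tables=8, n_bits=4, width=1.0, seed=0):
        rng = np.random.default_rng(seed)
        # One (n_bits x dim) projection matrix and one offset vector per table.
        self.a = rng.normal(size=(n_tables, n_bits, dim))
        self.b = rng.uniform(0.0, width, size=(n_tables, n_bits))
        self.width = width
        self.tables = [defaultdict(list) for _ in range(n_tables)]

    def _keys(self, x):
        # h(x) = floor((a . x + b) / w), concatenated over n_bits per table.
        codes = np.floor((self.a @ x + self.b) / self.width).astype(int)
        return [tuple(row) for row in codes]

    def add(self, idx, x):
        for table, key in zip(self.tables, self._keys(x)):
            table[key].append(idx)

    def candidates(self, q):
        # Union of the buckets that q falls into; cost does not scan the database.
        found = set()
        for table, key in zip(self.tables, self._keys(q)):
            found.update(table.get(key, ()))
        return found


def select_for_labeling(index, data, query, score_fn, batch=5):
    """Pick up to `batch` hashed candidates closest to the decision boundary."""
    cand = sorted(index.candidates(query))
    if not cand:
        return []
    uncertainty = np.abs(score_fn(data[cand]))  # small |f(x)| = most uncertain
    order = np.argsort(uncertainty)[:batch]
    return [cand[i] for i in order]
```

Here score_fn stands for any real-valued classifier output (for example an SVM decision function) evaluated only on the hashed candidates, so the per-round cost depends on bucket sizes rather than on the full database size.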
