Supervised Distributed Hashing for Large-Scale Multimedia Retrieval

Recent years have witnessed the growing popularity of hashing for large-scale multimedia retrieval. Numerous hashing methods have been designed for data stored on a single machine, that is, centralized hashing. In many real-world applications, however, large-scale data are distributed across different locations, servers, or sites. Although hashing for distributed data can in theory be implemented by gathering all of the distributed data into a single dataset, in practice this usually leads to prohibitive computation, communication, and storage costs. To date, only a few methods have been tailored for distributed hashing, and all of them are unsupervised. In this paper, we propose an efficient and effective method called supervised distributed hashing (SupDisH), which learns discriminative hash functions by leveraging semantic label information in a distributed manner. Specifically, we cast the distributed hashing problem into the framework of classification, where the learned binary codes are expected to be distinct enough for semantic retrieval. By introducing auxiliary variables, the distributed model is then separated into a set of decentralized subproblems with consistency constraints, which can be solved in parallel on each vertex of the distributed network. As such, we obtain high-quality, distinctive, unbiased binary codes and consistent hash functions with low computational complexity, which facilitates large-scale multimedia retrieval tasks involving distributed datasets. Experimental evaluations on three large-scale datasets show that SupDisH is competitive with centralized hashing methods and significantly outperforms the state-of-the-art unsupervised distributed method.
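
The idea of splitting a model into per-node subproblems tied together by consistency constraints can be illustrated with a minimal sketch. The code below is not the SupDisH algorithm: it assumes a simpler stand-in problem (ridge regression from features to given ±1 target codes) and solves it with standard global-consensus ADMM. The function names `consensus_admm_ridge` and `hash_codes`, the parameters `lam`, `rho`, and `n_iters`, and the availability of target codes on each node are all assumptions made for illustration.

```python
import numpy as np


def consensus_admm_ridge(node_data, lam=1.0, rho=100.0, n_iters=200):
    """Global-consensus ADMM for ridge regression split across nodes.

    Illustrative stand-in only, not the SupDisH objective.
    node_data: list of (X_i, T_i); X_i is (n_i, d) features and
               T_i is (n_i, r) target codes held by node i (assumed given).
    Returns the shared (d, r) projection that all nodes agree on.
    """
    d = node_data[0][0].shape[1]
    r = node_data[0][1].shape[1]
    n_nodes = len(node_data)

    z = np.zeros((d, r))                               # consensus variable
    duals = [np.zeros((d, r)) for _ in node_data]      # scaled dual variables

    # Each node precomputes its own small d x d system; raw data stays local.
    grams = [X.T @ X + rho * np.eye(d) for X, _ in node_data]
    rhs = [X.T @ T for X, T in node_data]

    for _ in range(n_iters):
        # Local subproblems: independent across nodes, so they can run in parallel.
        local = [
            np.linalg.solve(G, b + rho * (z - u))
            for G, b, u in zip(grams, rhs, duals)
        ]
        # Consensus step: combine local solutions (closed form for the L2 penalty).
        z = rho * sum(w + u for w, u in zip(local, duals)) / (lam + n_nodes * rho)
        # Dual step: accumulate each node's disagreement with the consensus.
        duals = [u + w - z for u, w in zip(duals, local)]

    return z


def hash_codes(X, projection):
    """Binarize linear projections into codes with entries in {-1, +1}."""
    return np.where(X @ projection >= 0, 1, -1).astype(np.int8)
```

In this sketch only the `d x r` matrices `local`, `duals`, and `z` would need to be exchanged between nodes, never the samples themselves, which is the property that makes such a decomposition attractive when gathering the data in one place is too costly. Because every node ends up with the same `z`, the resulting hash function is consistent across the network.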
