Deep Listwise Triplet Hashing for Fine-Grained Image Retrieval

Hashing is a practical approach to approximate nearest neighbor search. Deep hashing methods, which train deep networks to generate compact, similarity-preserving binary codes for entities (e.g., images), have received considerable attention in the information retrieval community. A representative stream of deep hashing methods is triplet-based hashing, which learns hashing models from triplets of data. Existing triplet-based hashing methods only consider triplets of the form $(q, q^{+}, q^{-})$, where $q$ and $q^{+}$ belong to the same class while $q$ and $q^{-}$ belong to different classes. However, since the number of possible triplets grows roughly as the cube of the number of training examples, the triplets used in existing methods are only a small fraction of all possible triplets. This motivates us to develop a new triplet-based hashing method that adopts many more triplets in the training phase. We propose Deep Listwise Triplet Hashing (DLTH), which introduces more triplets into batch-based training and a novel listwise triplet loss to capture the relative similarity within the new triplets. The method is a two-step pipeline. In Step 1, we propose a novel way to generate triplets from the soft class labels obtained by a knowledge distillation module; triplets of the form $(q, q^{+}, q^{-})$ are a subset of the newly obtained triplets. In Step 2, we train the hashing network with a novel listwise triplet loss that captures the relative similarity between the images in each triplet according to their soft labels. We conduct comprehensive image retrieval experiments on four benchmark datasets. The experimental results show that the proposed method outperforms state-of-the-art baselines.
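
To make the two steps concrete, below is a minimal PyTorch sketch of the pipeline. The abstract does not specify the exact formulations, so the cosine soft-label similarity, the margin scaling, and all names here (`generate_triplets`, `listwise_triplet_loss`, `hash_net`) are illustrative assumptions rather than the authors' implementation.

```python
# Hedged sketch of the two-step DLTH pipeline described above.
# Assumptions (not from the paper): cosine similarity between soft labels
# as the relevance signal, and a margin hinge scaled by the similarity gap.
import torch
import torch.nn.functional as F

def generate_triplets(soft_labels: torch.Tensor):
    """Step 1 (assumed form): for each anchor q in the batch, form (q, i, j)
    whenever image i's soft label (teacher probabilities) is closer to q's
    than image j's. Classic (q, q+, q-) triplets arise as the special case
    where i shares q's hard class and j does not."""
    n = soft_labels.size(0)
    normed = F.normalize(soft_labels, dim=1)
    sim = normed @ normed.T  # pairwise soft-label similarity
    triplets = []
    # O(n^3) enumeration for clarity; a real implementation would vectorize.
    for q in range(n):
        for i in range(n):
            for j in range(n):
                if q != i and q != j and i != j and sim[q, i] > sim[q, j]:
                    triplets.append((q, i, j))  # i more similar to q than j
    return triplets, sim

def listwise_triplet_loss(codes: torch.Tensor, triplets, sim, margin=0.5):
    """Step 2 (assumed form): for each triplet (q, i, j), push the distance
    d(q, i) below d(q, j) by a margin scaled by the soft-label similarity
    gap, so that larger relative-similarity gaps demand larger separation."""
    loss = codes.new_zeros(())
    for q, i, j in triplets:
        d_pos = (codes[q] - codes[i]).pow(2).sum()
        d_neg = (codes[q] - codes[j]).pow(2).sum()
        gap = (sim[q, i] - sim[q, j]).clamp(min=0)
        loss = loss + F.relu(d_pos - d_neg + margin * gap)
    return loss / max(len(triplets), 1)
```

In training, `codes` would typically be a continuous relaxation of the binary codes, e.g. `codes = torch.tanh(hash_net(images))`, with `torch.sign` applied at retrieval time to obtain the final binary codes.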