Heterogeneous Hashing Network for Face Retrieval Across Image and Video Domains

In this paper, we present a heterogeneous hashing network that generates effective and compact hash representations of both face images and face videos for face retrieval across the image and video domains. The network contains an image branch and a video branch, which project face images and face videos, respectively, into a common space. Non-linear hash functions are then learned in the common space to obtain the corresponding binary hash representations. The network is trained with three loss functions: 1) the Fisher loss; 2) the softmax loss; and 3) the triplet ranking loss. The Fisher loss uses the difference form of the within-class and between-class scatter, which makes it well suited to mini-batch-based optimization. The Fisher loss and the softmax loss are used together to enhance the discriminative power of the common space. The triplet ranking loss is enforced on the final binary hash representations to improve retrieval performance. Experiments on a large-scale face video dataset and two challenging TV-series datasets demonstrate the effectiveness of the proposed method.
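As a rough illustration of two of the losses named above, the following sketch computes a difference-form Fisher criterion and a hinge-style triplet ranking loss on one mini-batch. The abstract does not give the exact formulas, so everything specific here is an assumption: the trace form tr(S_w) - lam * tr(S_b), the weight `lam`, the `margin`, the use of squared Euclidean distance, and all function names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def mean(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fisher_difference_loss(features, labels, lam=1.0):
    """Assumed difference form: tr(S_w) - lam * tr(S_b) over one mini-batch.

    Minimizing it pulls samples toward their class mean (small within-class
    scatter) and pushes class means away from the batch mean (large
    between-class scatter). No matrix inverse is needed, unlike a ratio form.
    """
    by_class = defaultdict(list)
    for f, y in zip(features, labels):
        by_class[y].append(f)
    global_mean = mean(features)
    within = 0.0
    between = 0.0
    for members in by_class.values():
        mu = mean(members)
        within += sum(sq_dist(f, mu) for f in members)
        between += len(members) * sq_dist(mu, global_mean)
    return within - lam * between


def triplet_ranking_loss(anchor, positive, negative, margin=1.0):
    """Hinge loss asking the anchor to be closer to the positive than to the
    negative by at least `margin` (an assumed, common formulation)."""
    return max(0.0, margin + sq_dist(anchor, positive) - sq_dist(anchor, negative))


if __name__ == "__main__":
    # Toy common-space features for two identities.
    feats = [[0.0, 0.0], [0.0, 2.0], [4.0, 0.0], [4.0, 2.0]]
    ids = [0, 0, 1, 1]
    print(fisher_difference_loss(feats, ids))

    # Toy 4-bit codes in {-1, +1}; here the image query is the anchor.
    img_code = [1, -1, 1, 1]
    same_id_video = [1, -1, 1, -1]
    other_id_video = [-1, 1, -1, -1]
    print(triplet_ranking_loss(img_code, same_id_video, other_id_video))
```

For codes in {-1, +1}, the squared Euclidean distance equals four times the Hamming distance, so the triplet term above ranks items in the same order as Hamming-based retrieval would. In practice such a loss would typically be applied to relaxed, real-valued codes during training, which is again an assumption rather than something stated in the abstract.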
