Full-Space Local Topology Extraction for Cross-Modal Retrieval

With the ever-increasing availability of multimedia data, cross-modal retrieval, which enables retrieval across different types of data given different types of queries, has become a research hotspot. Hashing-based techniques have been developed to solve this problem; however, most previous works cannot capture the shared underlying structure of real-world multimodal data, which degrades their retrieval performance. In this paper, we propose a novel hashing method based on the extraction of the common manifold structure shared among different feature spaces. To faithfully represent the common structure, two kinds of local topology information are exploited in our method. Local angles are incorporated into the extraction of the local topology of each feature space, which is then used to learn a common intermediate subspace. After the heterogeneous features are embedded into this subspace, local similarities are exploited to extract the local topology between different feature spaces and to learn compact Hamming embeddings that facilitate cross-modal retrieval. The proposed method is referred to as full-space local topology extraction for hashing. Extensive comparisons with other state-of-the-art methods on three benchmark multimedia data sets demonstrate the superiority of our proposed method in terms of retrieval recall and search accuracy.
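
To make the final retrieval step concrete, the following minimal sketch shows how compact Hamming embeddings can be used for cross-modal ranking once features from both modalities already lie in a common subspace. It is not the proposed method: the sign-thresholding rule, the function names (`binarize`, `hamming`, `rank_by_hamming`), and the toy 4-dimensional embeddings are assumptions introduced purely for illustration, and the learning of the subspace itself is not shown.

```python
from typing import List, Sequence


def binarize(embedding: Sequence[float]) -> int:
    """Pack the signs of a real-valued embedding into an integer hash code.

    Assumption: one bit per dimension, set to 1 when the value is >= 0.
    """
    code = 0
    for value in embedding:
        code = (code << 1) | (1 if value >= 0 else 0)
    return code


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hash codes."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query_code: int, database_codes: Sequence[int]) -> List[int]:
    """Return database indices sorted by increasing Hamming distance to the query."""
    return sorted(
        range(len(database_codes)),
        key=lambda i: hamming(query_code, database_codes[i]),
    )


# Toy data (hypothetical): a text query and three images, all assumed to be
# already embedded into the same 4-dimensional common subspace.
text_query = [0.9, -0.2, 0.4, -0.7]
image_db = [
    [0.8, -0.1, 0.3, -0.5],   # same sign pattern as the query
    [-0.6, 0.2, 0.1, -0.9],   # differs from the query in two bits
    [-0.3, 0.7, -0.4, 0.2],   # differs from the query in all four bits
]

query_code = binarize(text_query)
db_codes = [binarize(v) for v in image_db]
ranking = rank_by_hamming(query_code, db_codes)
print(ranking)
```

Because the codes are short bit strings, the distance computation reduces to an XOR followed by a bit count, which is what makes hashing attractive for large-scale search.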
