Graph PCA Hashing for Similarity Search

This paper proposes a new hashing framework for similarity search that proceeds in three steps. First, linear clustering methods are employed to obtain a set of representative data points and a set of landmarks from the large dataset. Second, the landmarks are used to generate a probability representation for each data point; this representation is further proved to preserve the neighborhood of each data point. Third, PCA is integrated with manifold learning to learn the hash functions from the probability representations of all representative data points. As a consequence, the proposed hashing method achieves efficient similarity search (with linear time complexity), effective hashing performance, and high generalization ability, because it simultaneously preserves two kinds of complementary similarity structure: local structure via manifold learning and global structure via PCA. Experimental results on four public datasets demonstrate the advantages of the proposed method for similarity search compared with state-of-the-art hashing methods.
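The three steps can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not give the objective function, so the code assumes k-means for the landmarks, a Gaussian kernel over the s nearest landmarks for the probability representation, and a PCA scatter matrix penalized by an anchor-graph Laplacian smoothness term for the hash functions. All function names and the parameters `s`, `sigma`, and `alpha` are hypothetical, and the toy sample stands in for the representative data points.

```python
import numpy as np

def kmeans_landmarks(X, m, iters=10, seed=0):
    """Step 1 (assumed: plain k-means); returns m landmark vectors."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=m, replace=False)].copy()
    for _ in range(iters):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for j in range(m):
            pts = X[labels == j]
            if len(pts):
                centers[j] = pts.mean(0)
    return centers

def probability_representation(X, landmarks, s=3, sigma=1.0):
    """Step 2 (assumed kernel): each row is a distribution over the s nearest landmarks."""
    d = ((X[:, None, :] - landmarks[None, :, :]) ** 2).sum(-1)
    nearest = np.argsort(d, axis=1)[:, :s]
    rows = np.arange(len(X))[:, None]
    dn = d[rows, nearest]
    # Subtracting the row minimum leaves the normalized weights unchanged.
    w = np.exp(-(dn - dn.min(1, keepdims=True)) / (2 * sigma ** 2))
    Z = np.zeros_like(d)
    Z[rows, nearest] = w / w.sum(1, keepdims=True)
    return Z

def learn_hash_functions(Z, n_bits, alpha=0.5):
    """Step 3 (assumed objective): top eigenvectors of C - alpha * Zc^T L Zc."""
    mean = Z.mean(0)
    Zc = Z - mean
    C = Zc.T @ Zc                          # global structure (PCA scatter)
    lam = np.maximum(Z.sum(0), 1e-12)      # landmark degrees
    B = (Zc.T @ Z) / np.sqrt(lam)
    smooth = C - B @ B.T                   # Zc^T L Zc with L = I - Z diag(lam)^-1 Z^T
    M = C - alpha * smooth                 # reward variance, penalize roughness
    _, vecs = np.linalg.eigh(M)            # eigenvalues in ascending order
    return mean, vecs[:, -n_bits:]

def encode(Z, mean, W):
    """Binarize the projected probability representations."""
    return ((Z - mean) @ W > 0).astype(np.uint8)

# Toy data: 200 points in 5 dimensions, 8 landmarks, 4-bit codes.
X = np.random.default_rng(1).normal(size=(200, 5))
landmarks = kmeans_landmarks(X, m=8)
Z = probability_representation(X, landmarks, s=3)
mean, W = learn_hash_functions(Z, n_bits=4)
codes = encode(Z, mean, W)

# Rank the database by Hamming distance to the code of point 0.
hamming = (codes != codes[0]).sum(1)
ranked = np.argsort(hamming, kind="stable")
```

In this sketch the n-by-n graph is never formed; every matrix product is between an n-by-m matrix and an m-by-m or m-by-n factor, so the cost grows linearly with the number of points n for a fixed number of landmarks m. A new query would be encoded by calling `probability_representation` on it with the same landmarks and then `encode` with the stored `mean` and `W`.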
