Deep learning model with low-dimensional random projection for large-scale image search

Abstract: Deep learning models that scale to large image repositories are attracting growing attention in the domain of image search. Current deep neural networks rely on the computational power of accelerators (e.g. GPUs) to overcome the processing limitations of feature extraction and model training. This paper introduces and investigates a deep Convolutional Neural Network (CNN) model that efficiently extracts, indexes, and retrieves images for large-scale Content-Based Image Retrieval (CBIR). Random Maclaurin projection is used to generate low-dimensional image descriptors, and their discriminative power is evaluated on standard image datasets. The scalability of deep architectures is also evaluated on a one-million-image dataset on a High-Performance Computing (HPC) platform, in terms of retrieval accuracy, feature-extraction speed, and memory cost. Additionally, the controlling GPU kernels of the proposed model are examined under several optimization factors to evaluate their impact on processing and retrieval performance. The experimental results show the effectiveness of the proposed model in retrieval accuracy, GPU utilisation, feature-extraction speed, and index storage.
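To make the Random Maclaurin step concrete, the following is a minimal sketch and not the paper's implementation. The abstract does not state the kernel, the input feature size, or the output dimension, so the sketch assumes a degree-2 polynomial kernel, a 16-dimensional input, and a 4096-dimensional projection; all function names are hypothetical. Two random sign matrices project the input, and the element-wise product of the two projections gives a descriptor whose inner products estimate the squared dot product of the original features.

```python
import math
import random


def make_rademacher(rows, cols, rng):
    """Return a rows x cols matrix with independent entries in {-1, +1}."""
    return [[rng.choice((-1.0, 1.0)) for _ in range(cols)] for _ in range(rows)]


def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale v to unit Euclidean length."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def random_maclaurin_degree2(x, w1, w2):
    """Map x to len(w1) dimensions (assumed degree-2 polynomial kernel).

    Each output entry is (w1_k . x) * (w2_k . x) / sqrt(d), so the inner
    product of two mapped vectors is an unbiased estimate of <x, y>**2.
    """
    d = len(w1)
    scale = 1.0 / math.sqrt(d)
    return [scale * dot(r1, x) * dot(r2, x) for r1, r2 in zip(w1, w2)]


rng = random.Random(0)
n, d = 16, 4096  # assumed input and projection sizes, not from the paper
w1 = make_rademacher(d, n, rng)
w2 = make_rademacher(d, n, rng)

# Two correlated unit vectors standing in for CNN features of similar images.
x = normalize([rng.gauss(0.0, 1.0) for _ in range(n)])
y = normalize([xi + 0.1 * rng.gauss(0.0, 1.0) for xi in x])

phi_x = random_maclaurin_degree2(x, w1, w2)
phi_y = random_maclaurin_degree2(y, w1, w2)

exact = dot(x, y) ** 2      # degree-2 polynomial kernel value
approx = dot(phi_x, phi_y)  # estimate from the projected descriptors
print(f"exact={exact:.4f} approx={approx:.4f}")
```

The estimate tightens as the projection size d grows, which is the trade-off between descriptor size and retrieval accuracy that a low-dimensional projection has to balance.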
