Compact bilinear pooling via kernelized random projection for fine-grained image categorization on low computational power devices

Abstract: Bilinear pooling is one of the most popular and effective methods for fine-grained image recognition. However, a major drawback of bilinear pooling is the dimensionality of the resulting descriptors, which typically consist of several hundred thousand features. Even when generating the descriptor is tractable, its dimensionality makes subsequent operations impractical and often incurs large computational and storage costs. We introduce a novel method that efficiently reduces the dimension of bilinear pooling descriptors by means of a random projection. Conveniently, this is achieved without ever computing the high-dimensional descriptor explicitly. Our experimental results show that our method outperforms existing compact bilinear pooling algorithms in most cases, while running faster on devices with low computational power, where efficient extensions of bilinear pooling are most useful.
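To make concrete the idea of projecting a bilinear descriptor without ever forming it, the following is a minimal NumPy sketch. It is not the method proposed in this paper. It uses a simpler, well-known device: random projection directions that are rank-one matrices, so each projected coordinate reduces to a product of two inner products. All names (`bilinear_descriptor`, `implicit_projection`, `U`, `V`) and the sizes chosen are assumptions made for illustration only.

```python
import numpy as np


def bilinear_descriptor(X):
    """Explicit bilinear pooling: sum of outer products x x^T over all
    spatial locations, flattened to a vector of length d*d.
    X has shape (n_locations, d)."""
    return (X.T @ X).ravel()


def implicit_projection(X, U, V):
    """Project the bilinear descriptor onto k rank-one directions
    vec(u_i v_i^T) without building the d*d descriptor.

    Uses <sum_l x_l x_l^T, u v^T> = sum_l (u . x_l)(v . x_l).
    X: (n_locations, d); U, V: (k, d). Returns a vector of length k.
    """
    k = U.shape[0]
    return np.sum((X @ U.T) * (X @ V.T), axis=0) / np.sqrt(k)


# Illustrative sizes (assumptions, not taken from the paper).
rng = np.random.default_rng(0)
n_locations, d, k = 49, 64, 128

X = rng.standard_normal((n_locations, d))      # local features
U = rng.choice([-1.0, 1.0], size=(k, d))       # random sign vectors
V = rng.choice([-1.0, 1.0], size=(k, d))

# Fast path: cost O(n_locations * d * k), never touches d*d memory.
y_fast = implicit_projection(X, U, V)

# Reference path: build the explicit d*d descriptor and the full
# (k, d*d) projection matrix, only to check the identity above.
W = np.einsum("ki,kj->kij", U, V).reshape(k, d * d)
y_ref = W @ bilinear_descriptor(X) / np.sqrt(k)

print(np.allclose(y_fast, y_ref))
```

The reference path exists only to verify the identity; in practice it is the part one wants to avoid, since the explicit projection matrix grows as k times d squared.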
