A Latent Variable Augmentation Method for Image Categorization with Insufficient Training Samples

Over the past few years, image categorization based on convolutional neural networks (CNNs) has made great progress. These CNNs are typically trained on large-scale image data sets; in real-world applications, however, only a limited number of training samples may be available. One intuitive remedy is to augment the training samples. In this article, we propose an algorithm called Lavagan (Latent Variables Augmentation Method based on Generative Adversarial Nets) to improve the performance of CNNs trained with insufficient samples. The proposed Lavagan method consists of two tasks. In the first task, we augment a number of latent variables (LVs) from a set of adaptive and constrained LV distributions. In the second task, we feed the augmented LVs into the training procedure of the image classifier. To account for both tasks, we propose a unified objective function that incorporates them into the learning. We then put forward an alternating two-player minimization game to minimize this unified loss function and thereby obtain the predictive classifier. Moreover, using Hoeffding's inequality and the Chernoff bounding method, we analyze the feasibility and efficiency of the proposed method; the analysis indicates that LV augmentation is able to improve the performance of Lavagan when training samples are insufficient. Finally, experiments show that the proposed Lavagan method achieves higher accuracy than existing state-of-the-art methods.
