Improving the generalization performance of deep networks by dual pattern learning with adversarial adaptation

Abstract: In this paper, we present a dual pattern learning network architecture with adversarial adaptation (DPLAANet). Unlike conventional networks, the proposed network has two input branches and two loss functions. This architecture forces the network to learn robust features by analysing dual inputs. Because any two training images can be paired, the dual input structure gives the network a far larger set of training samples (image pairs) than the original set of single images, which helps address overfitting caused by limited training data. In addition, we propose associating the two input branches with two random interest values during training. This acts as a stochastic regularization technique and can improve generalization performance. Moreover, we use adversarial training to reduce the domain difference between fused image features and single image features. Extensive experiments on CIFAR-10, CIFAR-100, FI-8, the Google commands dataset, and MNIST demonstrate that DPLAANets outperform the baseline networks. Experimental results on subsets of CIFAR-10, CIFAR-100, and MNIST demonstrate that DPLAANets generalize well on small datasets. The proposed architecture can be easily extended to more than two input branches. Experimental results on subsets of MNIST show that the three-branch architecture outperforms the two-branch architecture when the training set is extremely small.
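The abstract does not spell out how the interest values are sampled or how the two losses are combined, so the following is only a minimal sketch under stated assumptions: the two interest values are drawn at random and sum to one, each branch's cross-entropy loss is weighted by its interest value, and pairs are counted as ordered pairs. The function names (`sample_interest_values`, `dual_loss`, `num_ordered_pairs`) are hypothetical and are not taken from the paper.

```python
import math
import random


def sample_interest_values(rng):
    """Draw two non-negative interest values that sum to 1 (an assumption)."""
    a = rng.random()
    return a, 1.0 - a


def cross_entropy(probs, label):
    """Negative log-likelihood of the true label under predicted probabilities."""
    return -math.log(max(probs[label], 1e-12))


def dual_loss(probs_a, probs_b, label_a, label_b, interest):
    """Weight each branch's loss by its interest value (hypothetical combination)."""
    w_a, w_b = interest
    return w_a * cross_entropy(probs_a, label_a) + w_b * cross_entropy(probs_b, label_b)


def num_ordered_pairs(n_images):
    """Number of ordered image pairs that can be formed from n_images images."""
    return n_images * n_images


rng = random.Random(0)  # fixed seed so the sketch is repeatable
interest = sample_interest_values(rng)

# Toy predicted class distributions for the two inputs of one image pair.
loss = dual_loss([0.7, 0.2, 0.1], [0.1, 0.8, 0.1], 0, 1, interest)

print("interest values:", interest)
print("combined loss:", loss)
print("ordered pairs from 50,000 images:", num_ordered_pairs(50000))
```

Under these assumptions, a training set of 50,000 images (the size of the CIFAR-10 training split) yields 2.5 billion ordered pairs, which illustrates why pairing can enlarge the effective training set so much.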
