Improving CNN Performance Accuracies With Min–Max Objective

We propose a novel method for improving the performance accuracy of a convolutional neural network (CNN) without increasing network complexity. We accomplish this by applying the proposed Min–Max objective to a layer below the output layer of a CNN model during training. The Min–Max objective explicitly enforces that the feature maps learned by the CNN have minimal within-manifold distance for each object manifold and maximal between-manifold distance among different object manifolds. The Min–Max objective is general and can be applied to different CNN architectures with a negligible increase in computational cost. Moreover, we also propose an incremental minibatch training procedure that, used in conjunction with the Min–Max objective, enables the handling of large-scale training data. Comprehensive experimental evaluations on several benchmark data sets, covering both image classification and face verification tasks, show that training with the proposed Min–Max objective markedly improves a CNN model's accuracy compared with the same model trained without it.
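
As an illustrative sketch only (the paper's exact formulation may differ), let $x_i \in \mathbb{R}^d$ denote the feature vector produced for sample $i$ at the chosen layer and $y_i$ its class label. One simple instance of such a Min–Max penalty over a minibatch, with hypothetical trade-off weights $\lambda, \alpha > 0$, is

$$
\mathcal{L}_{\text{MinMax}} = \underbrace{\sum_{y_i = y_j} \lVert x_i - x_j \rVert_2^2}_{\text{within-manifold (minimize)}} \; - \; \lambda \underbrace{\sum_{y_i \neq y_j} \lVert x_i - x_j \rVert_2^2}_{\text{between-manifold (maximize)}},
\qquad
\mathcal{L} = \mathcal{L}_{\text{softmax}} + \alpha \, \mathcal{L}_{\text{MinMax}}.
$$

Minimizing the combined loss $\mathcal{L}$ drives the standard classification objective down while simultaneously compacting each class manifold and pushing different manifolds apart in the learned feature space.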
