Optimizing the Capacity of a Convolutional Neural Network for Image Segmentation and Pattern Recognition

Abstract—In this paper, we study the factors that determine the capacity of a Convolutional Neural Network (CNN) model and propose ways to evaluate and adjust that capacity so that the model best matches a specific pattern recognition task. First, a scheme is proposed for adjusting the number of independent functional units within a CNN model so that it better fits a given task. Second, the number of independent functional units in a capsule network is adjusted to fit it to the training dataset. Third, a method based on the Bayesian GAN is proposed to enrich the variation in an existing dataset and thereby increase its complexity. Experimental results on the PASCAL VOC 2010 Person-Part dataset and the MNIST dataset show that, in both conventional CNN models and capsule networks, the number of independent functional units is an important factor in determining the capacity of a network model. By adjusting the number of functional units, the capacity of a model can be made to better match the complexity of a dataset.
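
To make the capacity-adjustment idea concrete, the sketch below is an illustration written for this summary, not code from the paper: it exposes the number of functional units of a small CNN as a single hyperparameter. The layer structure, the 28x28 input assumption (as in MNIST), and the unit counts swept at the end are all assumptions chosen for illustration.

# Minimal sketch (PyTorch): a small CNN whose capacity is controlled by the
# number of "functional units" (here, convolutional channels) per layer.
# The architecture and unit counts are illustrative assumptions, not the
# paper's actual configuration.
import torch
import torch.nn as nn

class TunableCNN(nn.Module):
    def __init__(self, num_units: int = 32, num_classes: int = 10):
        super().__init__()
        # num_units acts as the adjustable capacity knob.
        self.features = nn.Sequential(
            nn.Conv2d(1, num_units, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(num_units, num_units * 2, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
        )
        # For 28x28 inputs (e.g. MNIST), two 2x2 poolings leave a 7x7 feature map.
        self.classifier = nn.Linear(num_units * 2 * 7 * 7, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        return self.classifier(x.flatten(1))

# Sweep a few unit counts; comparing validation performance across these
# settings is one simple way to search for the capacity that matches the
# complexity of a dataset.
for units in (8, 16, 32, 64):
    model = TunableCNN(num_units=units)
    n_params = sum(p.numel() for p in model.parameters())
    print(f"units={units:3d}  parameters={n_params}")

Such a sweep over validation performance is only one way to match capacity to dataset complexity; the adjustment procedure actually used in the paper may differ.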
