A Novel Structure of Convolutional Layers with a Higher Performance-Complexity Ratio for Semantic Segmentation

In this paper, we study an important factor that determines the capacity of a CNN model and propose a novel structure of convolutional layers with a higher performance-complexity ratio. First, we explore how model capacity and the number of parameters relate to segmentation performance. Second, we propose a mechanism to optimize the structure of a CNN model for a specific task. The mechanism also provides better convergence than current state-of-the-art methods for factorizing convolutional layers, such as MobileNet. Third, we propose a measure based on the mutual information between hidden activations and inputs/outputs to compute the capacity of a CNN model. This measure is highly correlated with segmentation performance. Experimental results on the segmentation of the PASCAL Person Parts dataset show that the linear dependency among convolutional kernels is an important factor determining the capacity of a CNN model. We also demonstrate that our approach can successfully adjust the model capacity to best match the complexity of a dataset. The optimized CNN model achieves performance similar to Deeplab-V2 on the segmentation task with 100× fewer parameters, resulting in a significantly improved performance-complexity ratio.
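To make the notion of factorizing a convolutional layer concrete, the following minimal sketch counts the weights of a standard convolution and of a depthwise-separable (MobileNet-style) factorization. It is only an illustration of the baseline the abstract compares against, not the structure proposed in this paper; the function names and the layer sizes (3 × 3 kernel, 256 input and 256 output channels) are assumptions chosen for the example, and bias terms are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights in a depthwise k x k convolution (one filter per input
    channel) followed by a 1 x 1 pointwise convolution (bias omitted)."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


def reduction_factor(k, c_in, c_out):
    """How many times fewer weights the factorized layer uses."""
    return standard_conv_params(k, c_in, c_out) / depthwise_separable_params(k, c_in, c_out)


if __name__ == "__main__":
    # Hypothetical layer sizes, chosen only for illustration.
    k, c_in, c_out = 3, 256, 256
    print("standard:", standard_conv_params(k, c_in, c_out))
    print("depthwise separable:", depthwise_separable_params(k, c_in, c_out))
    print("reduction factor: %.2f" % reduction_factor(k, c_in, c_out))
```

For these assumed sizes, the factorized layer uses roughly 8.7 times fewer weights than the standard one. A single factorization of this kind therefore falls well short of the 100× reduction reported above, which is attributed to the proposed layer structure.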
