Neural architecture search for sparse DenseNets with dynamic compression

Neural Architecture Search (NAS) algorithms have discovered novel, state-of-the-art Convolutional Neural Networks (CNNs) for image classification, and are beginning to improve our understanding of CNN architectures. Within NAS research, however, few studies focus on the role of skip-connections and on how the configuration of connections between layers can be optimised to improve CNN performance. Our work develops a new evolutionary NAS algorithm, based on adjacency matrices, that optimises skip-connection structures, producing more specialised and powerful skip-connection structures within a DenseNet-BC network than previously reported in the literature. We further demonstrate how simple adjacency matrices can be interpreted in a way that allows for a more dynamic variant of DenseNet-BC. The final algorithm, which uses this novel interpretation of adjacency matrices for architecture design and is evolved on the CIFAR100 dataset, finds networks with improved performance relative to a baseline DenseNet-BC network on both the CIFAR10 and CIFAR100 datasets; to our knowledge, it is the first NAS algorithm for skip-connection optimisation to do so. Finally, we analyse the skip-connection structures discovered by our algorithm and highlight several important skip-connection patterns.
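To make the adjacency-matrix encoding concrete, the following minimal sketch shows one way a strictly upper-triangular binary matrix can gate the skip-connections inside a DenseNet-style block, where layer j concatenates the output of layer i only if A[i][j] = 1. The class name, growth rate, and layer composition here are illustrative assumptions, not the exact implementation described in the paper; DenseNet-BC's 1x1 bottleneck layers and transition compression are omitted for brevity.

import numpy as np
import torch
import torch.nn as nn

class SparseDenseBlock(nn.Module):
    """A DenseNet-style block whose skip-connections are gated by a
    strictly upper-triangular binary adjacency matrix A: layer j
    receives layer i's output only if A[i, j] == 1 (hypothetical
    sketch, not the paper's exact implementation)."""

    def __init__(self, adjacency, in_channels, growth_rate=12):
        super().__init__()
        self.A = np.asarray(adjacency, dtype=bool)
        n = self.A.shape[0]
        # Node 0 is the block input; nodes 1..n-1 are conv layers.
        channels = [in_channels] + [growth_rate] * (n - 1)
        self.layers = nn.ModuleList()
        for j in range(1, n):
            # Input width depends on which earlier nodes feed layer j.
            fan_in = sum(channels[i] for i in range(j) if self.A[i, j])
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(fan_in),
                nn.ReLU(inplace=True),
                nn.Conv2d(fan_in, growth_rate, kernel_size=3,
                          padding=1, bias=False),
            ))

    def forward(self, x):
        outputs = [x]
        for j, layer in enumerate(self.layers, start=1):
            # Concatenate only the feature maps selected by column j of A.
            inputs = [outputs[i] for i in range(j) if self.A[i, j]]
            outputs.append(layer(torch.cat(inputs, dim=1)))
        return torch.cat(outputs, dim=1)

# Example: a 4-node block in which layer 3 skips layer 1's output.
A = [[0, 1, 1, 1],
     [0, 0, 1, 0],
     [0, 0, 0, 1],
     [0, 0, 0, 0]]
block = SparseDenseBlock(A, in_channels=24)
y = block(torch.randn(2, 24, 32, 32))  # -> shape (2, 60, 32, 32)

Note that the fully connected case (all ones above the diagonal) recovers the standard DenseNet block, so evolving the entries of A amounts to searching over sparser variants of dense connectivity.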
