Building Sparse Deep Feedforward Networks using Tree Receptive Fields

Sparse connectivity is an important factor behind the success of convolutional and recurrent neural networks. In this paper, we consider the problem of learning sparse connectivity for feedforward neural networks (FNNs). The key idea is that a unit should be connected to only a small number of strongly correlated units at the level below. We use the Chow-Liu algorithm to learn a tree-structured probabilistic model over the units at the current level, use the tree to identify subsets of strongly correlated units, and introduce a new unit whose receptive field is one such subset. The procedure is repeated on the new units to build multiple layers of hidden units. We call the resulting model a TRF-net. Empirical results show that, compared to dense FNNs, TRF-nets achieve better or comparable classification performance with far fewer parameters and sparser structures, and they are also more interpretable.
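The construction lends itself to a compact sketch. The Python code below is a minimal illustration, assuming NumPy/SciPy, a histogram-based mutual-information estimate, and the simplifying rule that each receptive field is a tree node together with its neighbours; the paper's exact subset-selection criterion may differ, and the function names are illustrative.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

def pairwise_mutual_information(X, bins=16):
    """Histogram estimate of mutual information between every pair of columns of X."""
    n_units = X.shape[1]
    mi = np.zeros((n_units, n_units))
    for i in range(n_units):
        for j in range(i + 1, n_units):
            hist, _, _ = np.histogram2d(X[:, i], X[:, j], bins=bins)
            pxy = hist / hist.sum()
            px = pxy.sum(axis=1, keepdims=True)   # marginal of unit i
            py = pxy.sum(axis=0, keepdims=True)   # marginal of unit j
            nz = pxy > 0                          # avoid log(0) on empty cells
            mi[i, j] = mi[j, i] = np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz]))
    return mi

def chow_liu_receptive_fields(X):
    """Chow-Liu tree over units, then one receptive field per node (node + neighbours)."""
    mi = pairwise_mutual_information(X)
    # The Chow-Liu tree is the maximum-weight spanning tree under mutual
    # information; negating MI lets us reuse SciPy's minimum spanning tree.
    tree = minimum_spanning_tree(-mi).toarray() != 0
    adj = tree | tree.T                           # symmetrise the edge set
    return [sorted({i} | set(np.flatnonzero(adj[i]).tolist()))
            for i in range(mi.shape[1])]

# Example: activations of 20 units over 1000 samples -> 20 small receptive
# fields, each the fan-in of one new hidden unit in the next sparse layer.
X = np.random.rand(1000, 20)
fields = chow_liu_receptive_fields(X)
```

Each receptive field then serves as the fan-in of one new unit, i.e., a binary mask on the next layer's weight matrix, and the construction can be repeated on the new units' activations to stack further sparse layers.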

[1] Pushmeet Kohli, et al. Memory Bounded Deep Convolutional Networks. arXiv, 2014.

[2] C. K. Chow and C. N. Liu. Approximating Discrete Probability Distributions with Dependence Trees. IEEE Trans. Inf. Theory, 1968.

[3] Ryan P. Adams, et al. Learning the Structure of Deep Sparse Graphical Models. AISTATS, 2009.

[4] Babak Hassibi, et al. Second Order Derivatives for Network Pruning: Optimal Brain Surgeon. NIPS, 1992.

[5] Martin G. Bello, et al. Enhanced training algorithms, and integrated training/architecture selection for multilayer perceptron networks. IEEE Trans. Neural Networks, 1992.

[6] Song Han, et al. Learning both Weights and Connections for Efficient Neural Network. NIPS, 2015.

[7] Hanan Samet, et al. Pruning Filters for Efficient ConvNets. ICLR, 2016.

[8] Yurong Chen, et al. Dynamic Network Surgery for Efficient DNNs. NIPS, 2016.

[9] James T. Kwok, et al. Constructive algorithms for structure learning in feedforward neural networks for regression problems. IEEE Trans. Neural Networks, 1997.

[10] Dit-Yan Yeung, et al. Sparse Boltzmann Machines with Structure Learning as Applied to Text Analysis. AAAI, 2017.

[11] Jeffrey Dean, et al. Efficient Estimation of Word Representations in Vector Space. ICLR, 2013.

[12] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization. ICLR, 2014.

[13] Nitish Srivastava, et al. Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res., 2014.

[14] Yoshua Bengio, et al. Extracting and composing robust features with denoising autoencoders. ICML, 2008.

[15] Xiaogang Wang, et al. Structure Learning for Deep Neural Networks Based on Multiobjective Optimization. IEEE Transactions on Neural Networks and Learning Systems, 2018.

[16] Timur Ash. Dynamic node creation in backpropagation networks. International Joint Conference on Neural Networks, 1989.

[17] Pascal Poupart, et al. Online Structure Learning for Sum-Product Networks with Gaussian Leaves. ICLR, 2017.

[18] Yiran Chen, et al. Learning Structured Sparsity in Deep Neural Networks. NIPS, 2016.

[19] Quoc V. Le, et al. Neural Architecture Search with Reinforcement Learning. ICLR, 2016.

[20] Ramesh Raskar, et al. Designing Neural Network Architectures using Reinforcement Learning. ICLR, 2016.

[21] Jeffrey Dean, et al. Distributed Representations of Words and Phrases and their Compositionality. NIPS, 2013.

[22] Xiang Zhang, et al. Character-level Convolutional Networks for Text Classification. NIPS, 2015.

[23] Geoffrey E. Hinton, et al. Rectified Linear Units Improve Restricted Boltzmann Machines. ICML, 2010.

[24] Christian Lebiere, et al. The Cascade-Correlation Learning Architecture. NIPS, 1989.

[25] R. Venkatesh Babu, et al. Data-free Parameter Pruning for Deep Neural Networks. BMVC, 2015.

[26] Yann LeCun, et al. Optimal Brain Damage. NIPS, 1989.

[27] James T. Kwok, et al. Objective functions for training new hidden units in constructive neural networks. IEEE Trans. Neural Networks, 1997.