Learning Sparse Deep Feedforward Networks via Tree Skeleton Expansion

Despite the popularity of deep learning, structure learning for deep models remains a relatively under-explored area. In contrast, structure learning has been studied extensively for probabilistic graphical models (PGMs). In particular, an efficient algorithm has been developed for learning a class of tree-structured PGMs called hierarchical latent tree models (HLTMs), which have a layer of observed variables at the bottom and multiple layers of latent variables on top. In this paper, we propose a simple method for learning the structures of feedforward neural networks (FNNs) based on HLTMs. The idea is to expand the connections in the tree skeletons obtained from HLTMs and to use the resulting structures for FNNs. An important characteristic of FNN structures learned this way is that they are sparse. We present extensive empirical results showing that, compared with manually tuned standard FNNs, the sparse FNNs learned by our method achieve better or comparable classification performance with far fewer parameters. They are also more interpretable.
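
The abstract only outlines the construction, so the following is a minimal sketch rather than the authors' implementation. It assumes PyTorch and a skeleton represented, per layer, as a mapping from each upper-layer unit to the indices of the lower-layer units it connects to; the names MaskedLinear and mask_from_skeleton and the toy skeleton are hypothetical. The skeleton-expansion step that adds connections beyond the parent-child edges is not reproduced; the sketch only shows how a given sparse connectivity pattern can be enforced in an FNN by masking dense weight matrices.

```python
# Sketch (not the authors' code): enforce a tree-skeleton-derived sparse
# connectivity pattern in a feedforward network by masking dense weights.
import torch
import torch.nn as nn


class MaskedLinear(nn.Linear):
    """Linear layer whose weights are elementwise-masked to enforce sparsity."""

    def __init__(self, in_features, out_features, mask):
        super().__init__(in_features, out_features)
        # mask: (out_features, in_features) binary tensor; 1 = connection kept
        self.register_buffer("mask", mask.float())

    def forward(self, x):
        return nn.functional.linear(x, self.weight * self.mask, self.bias)


def mask_from_skeleton(children, in_dim, out_dim):
    """Build a binary mask from a dict {upper-layer unit: [lower-layer unit indices]}."""
    mask = torch.zeros(out_dim, in_dim)
    for parent, kids in children.items():
        mask[parent, kids] = 1.0
    return mask


# Hypothetical two-layer skeleton over 6 observed variables.
layer1_children = {0: [0, 1, 2], 1: [3, 4, 5]}   # 2 latent units, each over 3 inputs
layer2_children = {0: [0, 1]}                     # 1 top latent unit over both

net = nn.Sequential(
    MaskedLinear(6, 2, mask_from_skeleton(layer1_children, 6, 2)), nn.ReLU(),
    MaskedLinear(2, 1, mask_from_skeleton(layer2_children, 2, 1)),
)
print(net(torch.randn(4, 6)).shape)  # torch.Size([4, 1])
```

With masking of this kind, the number of effective parameters grows with the number of edges in the (expanded) skeleton rather than with the product of layer widths, which is what makes the learned structures sparse compared with fully connected layers.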
