Structural adaptation for sparsely connected MLP using Newton's method

In this work, we propose a paradigm for constructing a sparsely connected multi-layer perceptron (MLP). Using the Orthogonal Least Squares (OLS) method during training, the proposed approach prunes hidden units and output weights according to their usefulness, yielding a sparsely connected MLP. We formulate a second-order algorithm that gives a closed-form expression for the hidden-unit learning factors, thereby minimizing the number of hand-tuned parameters. The usefulness of the proposed algorithm is further substantiated by its ability to separate two combined datasets. On widely available datasets, the proposed algorithm's 10-fold testing error is shown to be lower than that of several other algorithms. The present work thus addresses inducing sparsity in a fully connected neural network, pruning of hidden units, Newton's method for optimization, and orthogonal least squares; an illustrative sketch of the OLS-based pruning idea is given below.
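To make the OLS-based usefulness ranking concrete, the following is a minimal sketch, not the authors' implementation. It assumes a matrix H of hidden-unit activations and a target matrix T, and ranks hidden units by their incremental error-reduction ratio via a Gram-Schmidt style forward sweep; the units at the bottom of the ranking would be the pruning candidates. All names (ols_rank_hidden_units, H, T) are illustrative assumptions.

    import numpy as np

    def ols_rank_hidden_units(H, T):
        """Rank hidden units by incremental error reduction (OLS sweep).

        H : (N, Nh) hidden-unit activations for N training patterns
        T : (N, M)  desired outputs
        Returns a list of hidden-unit indices, most useful first.
        """
        N, Nh = H.shape
        remaining = list(range(Nh))
        selected = []
        Q = []  # orthogonalized activation vectors chosen so far
        for _ in range(Nh):
            best_err, best_j, best_q = -np.inf, None, None
            for j in remaining:
                q = H[:, j].astype(float).copy()
                # orthogonalize the candidate against already-selected directions
                for qi in Q:
                    q -= (qi @ q) / (qi @ qi) * qi
                denom = q @ q
                if denom < 1e-12:
                    err = 0.0          # numerically redundant unit
                else:
                    # error-reduction ratio contributed by this candidate
                    err = np.sum((q @ T) ** 2) / denom
                if err > best_err:
                    best_err, best_j, best_q = err, j, q
            selected.append(best_j)
            remaining.remove(best_j)
            Q.append(best_q)
        return selected

    # Example use: keep the first Nh_keep indices returned above and
    # prune the remaining hidden units (and their output weights).

This ordering-by-error-reduction view is what makes the pruning decision data-driven rather than threshold-tuned; how the retained units' learning factors are then obtained via Newton's method is described in the paper itself and is not reproduced here.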
