Sparse one-hidden-layer MLPs

We discuss how to build sparse one-hidden-layer MLPs by replacing the standard l2 weight-decay penalty on all weights with an l1 penalty on the linear output weights. We propose an iterative two-step training procedure in which the output weights are found with the FISTA proximal optimization algorithm, solving a Lasso-like problem, and the hidden weights are computed by unconstrained minimization. As we shall discuss, the procedure has a complexity equivalent to that of standard MLP training, yields MLPs with similar performance and, as a by-product, automatically selects the number of hidden units: hidden units whose output weight is driven to exactly zero by the l1 penalty can be pruned.
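A minimal sketch of the two-step procedure may help fix ideas. The code below is an illustrative NumPy implementation, not the paper's reference code: function names, the tanh nonlinearity, the squared-error loss, and all step sizes and iteration counts are assumptions. Step 1 fixes the hidden weights and solves the resulting Lasso problem on the output weights with FISTA (gradient step plus soft-thresholding, with Nesterov momentum); step 2 fixes the output weights and updates the hidden weights by plain gradient descent on the unpenalized loss.

```python
import numpy as np

def soft_threshold(x, t):
    # Proximal operator of t * ||.||_1: shrink each coordinate toward zero by t.
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def fista_lasso(H, y, lam, n_iter=200):
    """Solve min_w 0.5*||H w - y||^2 + lam*||w||_1 with FISTA.
    H: hidden-unit activations, shape (n_samples, n_hidden)."""
    L = np.linalg.norm(H, 2) ** 2            # Lipschitz constant of the smooth part
    w = np.zeros(H.shape[1])
    z, t = w.copy(), 1.0
    for _ in range(n_iter):
        w_new = soft_threshold(z - H.T @ (H @ z - y) / L, lam / L)
        t_new = (1 + np.sqrt(1 + 4 * t * t)) / 2
        z = w_new + (t - 1) / t_new * (w_new - w)   # momentum extrapolation
        w, t = w_new, t_new
    return w

def train_sparse_mlp(X, y, n_hidden=32, lam=0.1, outer=10,
                     inner=50, lr=1e-2, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.5, size=(X.shape[1], n_hidden))   # hidden weights
    b = np.zeros(n_hidden)                                   # hidden biases
    w = np.zeros(n_hidden)                                   # linear output weights
    for _ in range(outer):
        # Step 1: l1-penalised output weights for the current hidden layer.
        H = np.tanh(X @ W + b)
        w = fista_lasso(H, y, lam)
        # Step 2: unconstrained gradient descent on the hidden weights.
        for _ in range(inner):
            H = np.tanh(X @ W + b)
            r = H @ w - y                                    # residual
            dA = (r[:, None] * w[None, :]) * (1 - H ** 2)    # grad w.r.t. pre-activations
            W -= lr * X.T @ dA / len(y)
            b -= lr * dA.mean(axis=0)
    # Final output-weight solve so the returned w matches the returned W, b.
    w = fista_lasso(np.tanh(X @ W + b), y, lam)
    return W, b, w
```

The number of nonzero entries of the returned output-weight vector `w` is then the effective number of hidden units; larger values of `lam` prune more units at the cost of fit.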