Sparse one-hidden-layer MLPs

We discuss how to build sparse one-hidden-layer MLPs by replacing the standard l2 weight-decay penalty on all weights with an l1 penalty on the linear output weights. We propose an iterative two-step training procedure in which the output weights are found with the FISTA proximal optimization algorithm, solving a Lasso-like problem, and the hidden weights are computed by unconstrained minimization. As we shall discuss, the procedure has a complexity equivalent to that of standard MLP training, yields MLPs with similar performance and, as a by-product, automatically selects the number of hidden units: hidden units whose output weight is driven to exactly zero by the l1 penalty can be pruned.
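A minimal sketch of the two-step procedure may help fix ideas. The code below is an illustrative NumPy implementation, not the paper's reference code: function names, the tanh nonlinearity, the squared-error loss, and all step sizes and iteration counts are assumptions. Step 1 fixes the hidden weights and solves the resulting Lasso problem on the output weights with FISTA (gradient step plus soft-thresholding, with Nesterov momentum); step 2 fixes the output weights and updates the hidden weights by plain gradient descent on the unpenalized loss.

```python
import numpy as np

def soft_threshold(x, t):
    # Proximal operator of t * ||.||_1: shrink each coordinate toward zero by t.
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def fista_lasso(H, y, lam, n_iter=200):
    """Solve min_w 0.5*||H w - y||^2 + lam*||w||_1 with FISTA.
    H: hidden-unit activations, shape (n_samples, n_hidden)."""
    L = np.linalg.norm(H, 2) ** 2            # Lipschitz constant of the smooth part
    w = np.zeros(H.shape[1])
    z, t = w.copy(), 1.0
    for _ in range(n_iter):
        w_new = soft_threshold(z - H.T @ (H @ z - y) / L, lam / L)
        t_new = (1 + np.sqrt(1 + 4 * t * t)) / 2
        z = w_new + (t - 1) / t_new * (w_new - w)   # momentum extrapolation
        w, t = w_new, t_new
    return w

def train_sparse_mlp(X, y, n_hidden=32, lam=0.1, outer=10,
                     inner=50, lr=1e-2, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.5, size=(X.shape[1], n_hidden))   # hidden weights
    b = np.zeros(n_hidden)                                   # hidden biases
    w = np.zeros(n_hidden)                                   # linear output weights
    for _ in range(outer):
        # Step 1: l1-penalised output weights for the current hidden layer.
        H = np.tanh(X @ W + b)
        w = fista_lasso(H, y, lam)
        # Step 2: unconstrained gradient descent on the hidden weights.
        for _ in range(inner):
            H = np.tanh(X @ W + b)
            r = H @ w - y                                    # residual
            dA = (r[:, None] * w[None, :]) * (1 - H ** 2)    # grad w.r.t. pre-activations
            W -= lr * X.T @ dA / len(y)
            b -= lr * dA.mean(axis=0)
    # Final output-weight solve so the returned w matches the returned W, b.
    w = fista_lasso(np.tanh(X @ W + b), y, lam)
    return W, b, w
```

The number of nonzero entries of the returned output-weight vector `w` is then the effective number of hidden units; larger values of `lam` prune more units at the cost of fit.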