Convex Neural Networks

Convexity has recently received a lot of attention in the machine learning community, and the lack of convexity has been seen as a major disadvantage of many learning algorithms, such as multi-layer artificial neural networks. We show that training multi-layer neural networks in which the number of hidden units is learned can be viewed as a convex optimization problem. This problem involves an infinite number of variables, but it can be solved by incrementally inserting one hidden unit at a time, each time finding a linear classifier that minimizes a weighted sum of errors.

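As a rough illustration of the incremental procedure described in the abstract, the Python sketch below inserts one sign-threshold hidden unit per round and refits the output layer after each insertion. This is a minimal sketch, not the paper's algorithm: the exact inner step selects the hidden unit minimizing a weighted sum of classification errors, whereas this sketch substitutes a weighted least-squares surrogate, and all names (fit_weighted_linear_unit, train_incremental_net) and parameters (ridge, n_units) are illustrative assumptions.

import numpy as np

def fit_weighted_linear_unit(X, residual, ridge=1e-3):
    # Fit a linear "hidden unit" h(x) = sign(w.x + b) to the sign of the
    # current residual, weighting each example by |residual|. This is a
    # least-squares surrogate for the paper's weighted-error minimization.
    n, d = X.shape
    Xb = np.hstack([X, np.ones((n, 1))])          # append a bias column
    weights = np.abs(residual)
    targets = np.sign(residual)
    G = Xb.T @ (weights[:, None] * Xb) + ridge * np.eye(d + 1)
    wb = np.linalg.solve(G, Xb.T @ (weights * targets))
    return wb[:d], wb[d]                          # (w, b)

def train_incremental_net(X, y, n_units=10, ridge=1e-3):
    # Incrementally insert hidden units; after each insertion, refit the
    # output weights by ridge regression (the convex outer problem).
    n = X.shape[0]
    H = np.ones((n, 1))                           # start with a bias unit
    units, alpha = [], np.zeros(1)
    for _ in range(n_units):
        residual = y - H @ alpha
        w, b = fit_weighted_linear_unit(X, residual, ridge)
        units.append((w, b))
        h = np.sign(X @ w + b)                    # new hidden-unit outputs
        H = np.hstack([H, h[:, None]])
        k = H.shape[1]
        alpha = np.linalg.solve(H.T @ H + ridge * np.eye(k), H.T @ y)
    return units, alpha

# Toy usage on hypothetical data:
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
y = np.sign(X[:, 0] * X[:, 1])                    # a nonlinear target
units, alpha = train_incremental_net(X, y, n_units=20)

Refitting all output weights after each insertion (rather than fixing past units' coefficients, as plain boosting does) mirrors the fact that, for a fixed set of hidden units, the output-layer problem is convex and can be re-solved exactly.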