The Curious Case of Convex Networks

In this paper, we investigate a constrained formulation of neural networks in which the output is a convex function of the input. We show that the convexity constraints can be enforced on both fully connected and convolutional layers, making them applicable to most architectures. The constraints are simple: the weights of all but the first layer are restricted to be non-negative, and the activation function must be convex and non-decreasing. Although simple, these constraints have profound implications for the generalization ability of the network. We draw three valuable insights: (a) Input-Output Convex Networks (IOC-NNs) self-regularize and almost entirely eliminate overfitting; (b) although heavily constrained, they come close to the performance of their base architectures; and (c) an ensemble of convex networks can match or outperform its non-convex counterparts. We demonstrate the efficacy of the proposed idea through thorough experiments and ablation studies on the MNIST, CIFAR10, and CIFAR100 datasets with three different neural network architectures. The code for this project is publicly available at: \url{this https URL}.
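To make the two convexity ingredients concrete, the following is a minimal sketch, not the authors' implementation: it assumes PyTorch, and the class name, layer sizes, and the clamp-after-step weight projection are illustrative assumptions, since the abstract does not specify how non-negativity is enforced in practice.

```python
import torch
import torch.nn as nn

class IOCMLP(nn.Module):
    """Sketch of an input-output convex MLP.

    The output is convex in the input when:
      * every layer after the first has non-negative weights, and
      * the activation is convex and non-decreasing (e.g. ReLU).
    """

    def __init__(self, in_dim: int, hidden_dim: int, out_dim: int):
        super().__init__()
        self.first = nn.Linear(in_dim, hidden_dim)       # unconstrained weights
        self.hidden = nn.Linear(hidden_dim, hidden_dim)  # weights kept >= 0
        self.out = nn.Linear(hidden_dim, out_dim)        # weights kept >= 0
        self.act = nn.ReLU()                             # convex, non-decreasing

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.act(self.first(x))
        h = self.act(self.hidden(h))
        return self.out(h)

    @torch.no_grad()
    def project_weights(self) -> None:
        # One simple way to maintain the constraint: project any negative
        # weights back to zero after each optimizer step. The paper may use
        # a different mechanism (e.g. a reparametrization of the weights).
        for layer in (self.hidden, self.out):
            layer.weight.clamp_(min=0.0)
```

In this sketch, convexity is preserved during training by calling `model.project_weights()` immediately after each `optimizer.step()`; ReLU plays the role of the non-decreasing convex activation.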
