The Curious Case of Convex Neural Networks

In this paper, we investigate a constrained formulation of neural networks in which the output is a convex function of the input. We show that the convexity constraints can be enforced on both fully connected and convolutional layers, making them applicable to most architectures. The constraints consist of restricting the weights of all but the first layer to be non-negative and using non-decreasing convex activation functions. Albeit simple, these constraints have profound implications for the generalization ability of the network. We draw three valuable insights: (a) Input-Output Convex Neural Networks (IOC-NNs) self-regularize and reduce the problem of overfitting; (b) although heavily constrained, they outperform the base multi-layer perceptrons and achieve performance comparable to the base convolutional architectures; and (c) IOC-NNs are robust to noise in the training labels. We demonstrate the efficacy of the proposed idea through thorough experiments and ablation studies on standard image classification datasets with three different neural network architectures.
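
To make the constraints concrete, below is a minimal sketch (not the authors' implementation) of an input-output convex multi-layer perceptron in PyTorch. The class names, the softplus reparameterization used to keep the effective weights non-negative, and the choice of ReLU as the convex non-decreasing activation are illustrative assumptions; any mechanism that enforces the two constraints stated above would serve the same purpose.

```python
# Minimal sketch of an input-output convex MLP, assuming a softplus
# reparameterization for the non-negative weights and ReLU activations.
import torch
import torch.nn as nn
import torch.nn.functional as F


class NonNegativeLinear(nn.Module):
    """Linear layer whose effective weight matrix is constrained to be non-negative."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        # Store an unconstrained parameter and map it through softplus so the
        # effective weights are always >= 0 (clamping after each optimizer
        # step would be an alternative way to enforce the same constraint).
        self.raw_weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.linear(x, F.softplus(self.raw_weight), self.bias)


class IOCMLP(nn.Module):
    """MLP in which each output logit is a convex function of the input."""

    def __init__(self, in_dim: int, hidden_dim: int, num_classes: int):
        super().__init__()
        self.first = nn.Linear(in_dim, hidden_dim)        # first layer is unconstrained
        self.hidden = NonNegativeLinear(hidden_dim, hidden_dim)
        self.out = NonNegativeLinear(hidden_dim, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = F.relu(self.first(x))    # convex, non-decreasing activation
        h = F.relu(self.hidden(h))   # non-negative weights preserve convexity
        return self.out(h)


if __name__ == "__main__":
    model = IOCMLP(in_dim=784, hidden_dim=256, num_classes=10)
    logits = model(torch.randn(4, 784))
    print(logits.shape)  # torch.Size([4, 10])
```

The construction works because a non-negative combination of convex functions is convex, and composing a convex function with a convex non-decreasing function preserves convexity; applied layer by layer, this makes every output logit convex in the input.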
