Maxout Networks

We consider the problem of designing models to leverage a recently introduced approximate model averaging technique called dropout. We define a simple new model called maxout (so named because its output is the max of a set of inputs, and because it is a natural companion to dropout) designed to both facilitate optimization by dropout and improve the accuracy of dropout's fast approximate model averaging technique. We empirically verify that the model successfully accomplishes both of these tasks. We use maxout and dropout to demonstrate state-of-the-art classification performance on four benchmark datasets: MNIST, CIFAR-10, CIFAR-100, and SVHN.
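To make the unit concrete: a maxout hidden unit computes k affine projections of its input and outputs their maximum, so the layer learns a piecewise-linear, convex activation function rather than applying a fixed nonlinearity such as a rectifier. The following minimal NumPy sketch illustrates this; the function name, argument shapes, and choice of NumPy are illustrative assumptions, not code from the paper.

```python
import numpy as np

def maxout(x, W, b):
    """Minimal maxout layer sketch (illustrative, not the paper's code).

    x: inputs, shape (n, d)
    W: weights, shape (d, m, k) -- m maxout units, k linear pieces per unit
    b: biases, shape (m, k)
    returns: activations, shape (n, m)
    """
    z = np.einsum('nd,dmk->nmk', x, W) + b  # k affine projections per unit
    return z.max(axis=-1)                   # elementwise max over the k pieces

# Example usage with arbitrary, hypothetical shapes.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 10))     # batch of 4 ten-dimensional inputs
W = rng.standard_normal((10, 5, 3))  # 5 maxout units, 3 pieces each
b = rng.standard_normal((5, 3))
h = maxout(x, W, b)                  # shape (4, 5)
```

Since the maximum of affine functions can approximate any convex function, each maxout unit in effect learns its own activation function, and the network stays locally linear in its inputs, which is what makes it a natural companion to dropout's fast approximate model averaging: that averaging is exact for linear models.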
