SelfieBoost: A Boosting Algorithm for Deep Learning

We describe and analyze a new boosting algorithm for deep learning called SelfieBoost. Unlike other boosting algorithms, such as AdaBoost, which construct ensembles of classifiers, SelfieBoost boosts the accuracy of a single network. We prove a $\log(1/\epsilon)$ convergence rate for SelfieBoost under an "SGD success" assumption that appears to hold in practice.
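
To make the idea concrete, here is a minimal sketch of a SelfieBoost-style round in Python/NumPy. The per-round objective used below (a linearized exponential-loss improvement term plus a proximal quadratic penalty that keeps the new predictor close to the current one) is our paraphrase of the update and should be checked against the paper; the linear model, the learning rate, the round counts, and all names (`predict`, `D`, `v`) are illustrative stand-ins for a real deep network and its SGD training.

```python
import numpy as np

# A minimal, illustrative SelfieBoost-style loop on a toy problem.
# Assumptions (not stated in the abstract): each round approximately
# minimizes  sum_i D_i * ( y_i * (f_i - g_i) + (g_i - f_i)^2 )
# over a candidate predictor g, where D_i are exponential-loss weights,
# and a linear model stands in for the deep network.

rng = np.random.default_rng(0)

# Toy binary classification data with labels in {-1, +1}.
n, d = 500, 20
X = rng.normal(size=(n, d))
y = np.sign(X @ rng.normal(size=d) + 0.1 * rng.normal(size=n))

def predict(w, X):
    """Real-valued scores f(x) of the current 'network' (here: linear)."""
    return X @ w

w = np.zeros(d)          # the single network being boosted in place
for t in range(30):      # boosting rounds
    f = predict(w, X)
    # AdaBoost-style example weights derived from the exponential loss.
    D = np.exp(-y * f)
    D /= D.sum()

    # Inner SGD: approximately minimize the weighted proximal objective,
    # starting from the current network's parameters.
    v = w.copy()
    lr = 0.1
    for _ in range(200):
        i = rng.integers(n)
        g_i = X[i] @ v
        # Unbiased stochastic gradient of the objective above w.r.t. v.
        grad = n * D[i] * (-y[i] + 2.0 * (g_i - f[i])) * X[i]
        v -= lr * grad

    w = v  # the improved network replaces the old one (no ensemble)

    err = np.mean(np.sign(predict(w, X)) != y)
    print(f"round {t:2d}: train error {err:.3f}")
```

Intuitively, if each inner SGD run drives the weighted objective sufficiently negative (one reading of the "SGD success" assumption), the total exponential loss $\sum_i e^{-y_i f(x_i)}$ contracts by a constant factor per round, which is consistent with the $\log(1/\epsilon)$ bound on the number of boosting rounds claimed above.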
