Long-term temporal averaging for stochastic optimization of deep neural networks

Deep learning models can successfully tackle several difficult tasks. However, training deep neural models is not always straightforward, owing to several well-known issues such as vanishing and exploding gradients. Furthermore, the stochastic nature of most commonly used optimization techniques inevitably leads to instabilities during training, even when state-of-the-art stochastic optimization techniques are used. In this work, we propose an advanced temporal averaging technique that is capable of stabilizing the convergence of stochastic optimization for neural network training. Six different datasets and evaluation setups are used to extensively evaluate the proposed method and demonstrate its performance benefits. The more stable convergence of the algorithm also reduces the risk of stopping the training process when a bad descent step has been taken and the learning rate has not been set appropriately.
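
The abstract does not spell out the averaging rule, so the following is only a minimal sketch of the general idea of temporal (iterate) averaging, not the proposed method. It assumes a toy one-dimensional objective f(w) = 0.5 * w^2 with Gaussian gradient noise, plain SGD, and a simple running mean of the iterates started after a burn-in period. All names (`noisy_grad`, `sgd_with_averaging`, `spread`, `burn_in`) and all hyperparameter values are hypothetical choices made for illustration.

```python
import random


def noisy_grad(w, rng, noise=1.0):
    # Gradient of the toy objective f(w) = 0.5 * w**2, corrupted by
    # Gaussian noise to mimic mini-batch gradient estimates (assumption).
    return w + rng.gauss(0.0, noise)


def spread(values):
    # Population variance, used here as a simple measure of instability.
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def sgd_with_averaging(steps=2000, lr=0.1, burn_in=200, seed=0):
    rng = random.Random(seed)
    w = 5.0        # current SGD iterate
    w_avg = 0.0    # running mean of the iterates seen after burn-in
    n_avg = 0
    raw_history, avg_history = [], []
    for t in range(steps):
        w -= lr * noisy_grad(w, rng)          # ordinary SGD step
        if t >= burn_in:
            n_avg += 1
            w_avg += (w - w_avg) / n_avg      # incremental mean update
            raw_history.append(w)
            avg_history.append(w_avg)
    return w, w_avg, raw_history, avg_history


if __name__ == "__main__":
    w, w_avg, raw, avg = sgd_with_averaging()
    print("last raw iterate:     ", w)
    print("averaged iterate:     ", w_avg)
    print("spread of raw tail:   ", spread(raw[-500:]))
    print("spread of averaged tail:", spread(avg[-500:]))
```

In this toy setting the raw iterate keeps fluctuating around the optimum at w = 0, while the averaged iterate moves much less from step to step, so stopping at an arbitrary step is less likely to land on a poor value. How far this carries over to deep networks, and how the averaging window should be chosen, is exactly what the paper evaluates and is not captured by this sketch.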
