Performance Analysis of Optimization Algorithms on Stacked Autoencoder

The stacked autoencoder (SAE), one of the widely used deep learning models, has been applied extensively to one-dimensional data sets in recent years. In this study, a comparative performance analysis of the SAE architecture was carried out using the five most commonly used optimization techniques and two well-known activation functions. Stochastic Gradient Descent (SGD), Root Mean Square Propagation (RMSProp), Adaptive Moment Estimation (Adam), Adaptive Delta (Adadelta), and Nesterov-accelerated Adaptive Moment Estimation (Nadam) were used as the optimization techniques, and Softmax and Sigmoid were used as the activation functions. Two data sets from the public UCI repository, Cryotherapy and Immunotherapy, were used. To verify the performance of the SAE model, experiments were run on each data set with each combination of optimization and activation technique. The best results, success rates of 88.89% on the Cryotherapy data set and 85.19% on the Immunotherapy data set, were achieved using the Softmax activation function with the SGD optimization method on a three-layer SAE. After the training phase, the adaptive optimization techniques Adam, Adadelta, Nadam, and RMSProp were observed to produce a weaker learning process than the stochastic method SGD.
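
For illustration, the sketch below shows one plausible realization of the setup described above: greedy layer-wise pretraining of a three-layer SAE with Sigmoid hidden units, followed by fine-tuning with a Softmax output layer and SGD, written in Keras. The layer sizes, learning rate, epoch counts, and placeholder data are assumptions made for the example; they are not values reported in the study.

```python
# Minimal sketch: three-layer stacked autoencoder with greedy layer-wise
# pretraining, then Softmax + SGD fine-tuning. All hyperparameters here
# (layer sizes, learning rate, epochs) are illustrative assumptions.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

n_features, n_classes = 7, 2        # assumed data shape (e.g., Cryotherapy)
hidden_sizes = [16, 8, 4]           # assumed sizes of the three SAE layers

def pretrain_layer(data, n_hidden):
    """Train one sigmoid autoencoder on `data`; return its encoder half."""
    inp = keras.Input(shape=(data.shape[1],))
    code = layers.Dense(n_hidden, activation="sigmoid")(inp)
    recon = layers.Dense(data.shape[1], activation="sigmoid")(code)
    ae = keras.Model(inp, recon)
    ae.compile(optimizer=keras.optimizers.SGD(learning_rate=0.01), loss="mse")
    ae.fit(data, data, epochs=50, verbose=0)
    return keras.Model(inp, code)   # shares the trained encoder weights

# Placeholder data standing in for a UCI data set.
X = np.random.rand(90, n_features).astype("float32")
y = np.random.randint(0, n_classes, size=90)

# Greedy pretraining: each encoder is trained on the previous layer's codes.
encoders, feats = [], X
for h in hidden_sizes:
    enc = pretrain_layer(feats, h)
    encoders.append(enc)
    feats = enc.predict(feats, verbose=0)

# Stack the pretrained encoders, add a Softmax classifier, fine-tune with SGD.
inp = keras.Input(shape=(n_features,))
x = inp
for enc in encoders:
    x = enc(x)                      # reuses (and further trains) pretrained weights
out = layers.Dense(n_classes, activation="softmax")(x)
sae = keras.Model(inp, out)
sae.compile(optimizer=keras.optimizers.SGD(learning_rate=0.01),
            loss="sparse_categorical_crossentropy", metrics=["accuracy"])
sae.fit(X, y, epochs=100, verbose=0)
```

Swapping `keras.optimizers.SGD` for `keras.optimizers.Adam`, `keras.optimizers.Adadelta`, `keras.optimizers.Nadam`, or `keras.optimizers.RMSprop` reproduces the kind of optimizer comparison the study performs.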
