An adaptive control momentum method as an optimizer in the cloud

Abstract Many problems in the cloud can be cast as optimization problems over high-dimensional, random data, so stochastic optimization is key to the Autonomous Cloud. One of the most significant questions in this field is how to adapt the learning rate and the convergence path dynamically. This paper proposes a gradient-based algorithm called Adacom, which is based on an adaptive control system and momentum. Building critically on previous studies, it introduces a reference model to generate the update. The method reduces noise and chooses paths with less oscillation, while maintaining the accumulated learning rate. Owing to the properties of the system design, the method requires fewer hyper-parameters to tune. We discuss the prospects of Adacom as a general optimizer in the Autonomous Cloud and explore its potential for pervasive computing under the assumption of transition data. We then demonstrate the convergence of Adacom theoretically. Evaluations on simulated transition data demonstrate the feasibility of Adacom and its superiority over other gradient-based methods.
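The abstract does not state Adacom's update rule, so the following is only a minimal sketch of the classical momentum update that momentum-based methods start from, not the Adacom algorithm itself. The function names, the fixed learning rate `lr`, the momentum coefficient `beta`, and the quadratic test objective are all assumptions chosen for illustration.

```python
def momentum_descent(grad, x0, lr=0.01, beta=0.9, steps=500):
    """Classical (heavy-ball) momentum; NOT the Adacom update rule.

    grad  : callable returning the gradient at x as a list of floats
    x0    : starting point (list of floats)
    lr    : fixed learning rate (illustrative value)
    beta  : momentum coefficient (illustrative value)
    """
    x = list(x0)
    v = [0.0] * len(x)  # velocity: decaying sum of past gradient steps
    for _ in range(steps):
        g = grad(x)
        # Blend the previous velocity with the new gradient step.
        v = [beta * vi - lr * gi for vi, gi in zip(v, g)]
        # Move along the velocity rather than the raw gradient.
        x = [xi + vi for xi, vi in zip(x, v)]
    return x


def quad_grad(x):
    # Gradient of the hypothetical objective f(x) = 0.5 * (x[0]**2 + 25 * x[1]**2),
    # an ill-conditioned quadratic whose minimum is at the origin.
    return [x[0], 25.0 * x[1]]


x_final = momentum_descent(quad_grad, [1.0, 1.0])
```

Because the velocity averages successive gradients, components that flip sign from step to step partly cancel, which is the oscillation-damping effect the abstract refers to. An adaptive method such as the one proposed would additionally adjust the step size during the run instead of keeping `lr` fixed.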
