Keli Xiao | Wei Zhu | Pengzhan Guo | Zeyang Ye
[1] Yann LeCun et al. The Loss Surfaces of Multilayer Networks, 2014, AISTATS.
[2] Aaron Roth et al. The Algorithmic Foundations of Differential Privacy, 2014, Found. Trends Theor. Comput. Sci.
[3] Alexander J. Smola et al. Parallelized Stochastic Gradient Descent, 2010, NIPS.
[4] Dan Alistarh et al. The Convergence of Stochastic Gradient Descent in Asynchronous Shared Memory, 2018, PODC.
[5] Alexander J. Smola et al. Stochastic Variance Reduction for Nonconvex Optimization, 2016, ICML.
[6] Shanshan Li et al. An iterative algorithm for optimal variable weighting in K-means clustering, 2019, Commun. Stat. Simul. Comput.
[7] Yuefan Deng et al. Multi-User Mobile Sequential Recommendation: An Efficient Parallel Computing Paradigm, 2018, KDD.
[8] Stephen P. Boyd et al. Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers, 2011, Found. Trends Mach. Learn.
[9] Yann LeCun et al. Deep learning with Elastic Averaging SGD, 2014, NIPS.
[10] Ioannis Mitliagkas et al. Parallel SGD: When does averaging help?, 2016, arXiv.
[11] Yuan Yu et al. TensorFlow: A system for large-scale machine learning, 2016, OSDI.
[12] Deanna Needell et al. Stochastic gradient descent, weighted sampling, and the randomized Kaczmarz algorithm, 2013, Mathematical Programming.
[13] Yuefan Deng et al. Parallel Simulated Annealing by Mixing of States, 1999.
[14] Fuzhen Zhuang et al. Shared Structure Learning for Multiple Tasks with Multiple Views, 2013, ECML/PKDD.
[15] L. Bottou. Stochastic Gradient Learning in Neural Networks, 1991.
[16] Hamed Haddadi et al. Deep Learning in Mobile and Wireless Networking: A Survey, 2018, IEEE Communications Surveys & Tutorials.
[17] Yuefan Deng et al. Applying Simulated Annealing and Parallel Computing to the Mobile Sequential Recommendation, 2019, IEEE Transactions on Knowledge and Data Engineering.
[18] R. J. Paul et al. Optimization Theory: The Finite Dimensional Case, 1977.
[19] Yuefan Deng et al. A Unified Theory of the Mobile Sequential Recommendation Problem, 2018, IEEE International Conference on Data Mining (ICDM).
[20] Samy Bengio et al. Revisiting Distributed Synchronous SGD, 2016, arXiv.
[21] Jianping Yin et al. Distributed and asynchronous Stochastic Gradient Descent with variance reduction, 2017, Neurocomputing.
[22] H. Robbins. A Stochastic Approximation Method, 1951.
[23] Frank Wood et al. Bayesian Distributed Stochastic Gradient Descent, 2018, NeurIPS.
[24] Mohamad Ivan Fanany et al. Simulated Annealing Algorithm for Deep Learning, 2015.
[25] Edilson de Aguiar et al. Facial expression recognition with Convolutional Neural Networks: Coping with few data and the training sample order, 2017, Pattern Recognit.
[26] D. K. Smith et al. Numerical Optimization, 2001, J. Oper. Res. Soc.
[27] Chung-Hsing Yeh et al. Task oriented weighting in multi-criteria analysis, 1999, Eur. J. Oper. Res.
[28] Dieter Wolf-Gladrow. Lattice Boltzmann Models, 2000.
[29] Dacheng Tao et al. Joint Deep Multi-View Learning for Image Clustering, 2020, IEEE Transactions on Knowledge and Data Engineering.
[30] P. Bullen. Handbook of means and their inequalities, 1987.
[31] Yi Chen et al. Differentiating search results on structured data, 2012, TODS.
[32] Marc'Aurelio Ranzato et al. Large Scale Distributed Deep Networks, 2012, NIPS.
[33] Kaiming He et al. Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour, 2017, arXiv.
[34] Raul Castro Fernandez et al. Ako: Decentralised Deep Learning with Partial Gradient Exchange, 2016, SoCC.
[35] Dimitris S. Papailiopoulos et al. Cyclades: Conflict-free Asynchronous Machine Learning, 2016, NIPS.
[36] Keli Xiao et al. A Weighted Aggregating SGD for Scalable Parallelization in Deep Learning, 2019, IEEE International Conference on Data Mining (ICDM).
[37] D. Wolf-Gladrow. Lattice-Gas Cellular Automata and Lattice Boltzmann Models: An Introduction, 2000.
[38] Lijun Zhang et al. VR-SGD: A Simple Stochastic Variance Reduction Method for Machine Learning, 2018, IEEE Transactions on Knowledge and Data Engineering.
[39] John N. Tsitsiklis et al. Parallel and distributed computation, 1989.
[40] Cho-Jui Hsieh et al. HogWild++: A New Mechanism for Decentralized Asynchronous Stochastic Gradient Descent, 2016, IEEE International Conference on Data Mining (ICDM).
[41] Stephen J. Wright et al. Hogwild: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent, 2011, NIPS.
[42] Sebastian Ruder. An overview of gradient descent optimization algorithms, 2016, arXiv.
[43] Keli Xiao et al. A Parallel Simulated Annealing Enhancement of the Optimal-Matching Heuristic for Ridesharing, 2019, IEEE International Conference on Data Mining (ICDM).