Qi Sun | Bin Dong | Zhenguo Li | Zewei Chen | Hexing Dong | Weizhen Dian | Jiacheng Sun | Yitong Sun
[1] Nikhil R. Devanur et al. PipeDream: Fast and Efficient Pipeline Parallel DNN Training, 2018, ArXiv.
[2] E Weinan et al. A Proposal on Machine Learning via Dynamical Systems, 2017, Communications in Mathematics and Statistics.
[3] Zheng Xu et al. Training Neural Networks Without Gradients: A Scalable ADMM Approach, 2016, ICML.
[4] Lars Ruthotto et al. Layer-Parallel Training of Deep Residual Neural Networks, 2018, SIAM J. Math. Data Sci.
[5] Hojung Lee et al. Local Critic Training of Deep Neural Networks, 2019, International Joint Conference on Neural Networks (IJCNN).
[6] Marc'Aurelio Ranzato et al. Large Scale Distributed Deep Networks, 2012, NIPS.
[7] Martin Jaggi et al. Decoupling Backpropagation using Constrained Optimization Methods, 2018.
[8] David Duvenaud et al. Neural Ordinary Differential Equations, 2018, NeurIPS.
[9] Jian Sun et al. Identity Mappings in Deep Residual Networks, 2016, ECCV.
[10] Brian Kingsbury et al. Beyond Backprop: Online Alternating Minimization with Auxiliary Variables, 2018, ICML.
[11] Daniel Liberzon. Calculus of Variations and Optimal Control Theory: A Concise Introduction, 2012.
[12] Jascha Sohl-Dickstein et al. Measuring the Effects of Data Parallelism on Neural Network Training, 2018, J. Mach. Learn. Res.
[13] Yuan Yao et al. A Convergence Analysis of Nonlinearly Constrained ADMM in Deep Learning, 2019, ArXiv.
[14] Eldad Haber et al. Stable architectures for deep neural networks, 2017, ArXiv.
[15] Kurt Keutzer et al. ANODE: Unconditionally Accurate Memory-Efficient Gradients for Neural ODEs, 2019, IJCAI.
[16] Long Chen et al. Maximum Principle Based Algorithms for Deep Learning, 2017, J. Mach. Learn. Res.
[17] Martin J. Gander et al. Nonlinear Convergence Analysis for the Parareal Algorithm, 2008.
[18] F. Tröltzsch. Optimal Control of Partial Differential Equations: Theory, Methods and Applications, 2010.
[19] Panos Parpas et al. Predict Globally, Correct Locally: Parallel-in-Time Optimal Control of Neural Networks, 2019, ArXiv.
[20] Y. Maday et al. A parareal in time procedure for the control of partial differential equations, 2002.
[21] Thomas Paine et al. GPU Asynchronous Stochastic Gradient Descent to Speed Up Neural Network Training, 2013, ICLR.
[22] Ulrich Langer et al. Domain Decomposition Methods in Science and Engineering XVII, 2008.
[23] Evangelos A. Theodorou et al. Deep Learning Theory Review: An Optimal Control and Dynamical Systems Perspective, 2019, ArXiv.
[24] Lars Ruthotto et al. Learning Across Scales - Multiscale Methods for Convolution Neural Networks, 2018, AAAI.
[25] Rolf Rannacher et al. Multiple Shooting and Time Domain Decomposition Methods, 2015.
[26] Jinshan Zeng et al. On ADMM in Deep Learning: Convergence and Saturation-Avoidance, 2019.
[27] Jian Sun et al. Deep Residual Learning for Image Recognition, 2016, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[28] Takeru Miyato et al. Synthetic Gradient Methods with Virtual Forward-Backward Networks, 2017, ICLR.
[29] M. Thorpe et al. Deep limits of residual neural networks, 2018, Research in the Mathematical Sciences.
[30] Jason Weston et al. Natural Language Processing (Almost) from Scratch, 2011, J. Mach. Learn. Res.
[31] Forrest N. Iandola et al. FireCaffe: Near-Linear Acceleration of Deep Neural Network Training on Compute Clusters, 2016, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[32] Alex Graves et al. Decoupled Neural Interfaces using Synthetic Gradients, 2016, ICML.
[33] Sebastian Götschel et al. An Efficient Parallel-in-Time Method for Optimization with Parabolic PDEs, 2019, SIAM J. Sci. Comput.
[34] D. K. Smith et al. Numerical Optimization, 2001, J. Oper. Res. Soc.
[35] Max Gunzburger et al. Perspectives in flow control and optimization, 1987.
[36] Y. Maday et al. An adaptive parareal algorithm, 2020, Journal of Computational and Applied Mathematics.
[37] Bin Dong et al. Beyond Finite Layer Neural Networks: Bridging Deep Architectures and Numerical Differential Equations, 2017, ICML.
[38] Jong-Seok Lee et al. Local Critic Training for Model-Parallel Learning of Deep Neural Networks, 2018, IEEE Transactions on Neural Networks and Learning Systems.
[39] Robert Hecht-Nielsen. Theory of the backpropagation neural network, 1989, International Joint Conference on Neural Networks.
[40] Geoffrey E. Hinton et al. Learning internal representations by error propagation, 1986.
[41] Frederick Tung et al. Multi-level Residual Networks from Dynamical Systems View, 2017, ICLR.