Yoram Singer | Vineet Gupta | Rohan Anil | Tomer Koren | Kevin Regan
[1] Mike Schuster,et al. Japanese and Korean voice search , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[2] Adrian S. Lewis,et al. Nonsmooth optimization via quasi-Newton methods , 2012, Mathematical Programming.
[3] David A. Patterson,et al. In-datacenter performance analysis of a tensor processing unit , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[4] Elad Hazan,et al. Introduction to Online Convex Optimization , 2016, Found. Trends Optim..
[5] Yuan Yu,et al. TensorFlow: A system for large-scale machine learning , 2016, OSDI.
[6] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[7] Peng Xu,et al. Sub-sampled Newton Methods with Non-uniform Sampling , 2016, NIPS.
[8] Nicholas J. Higham,et al. A Schur–Newton Method for the Matrix pth Root and its Inverse , 2005 .
[9] Shai Shalev-Shwartz,et al. Online Learning and Online Convex Optimization , 2012, Found. Trends Mach. Learn..
[10] Yi Zhang,et al. The Case for Full-Matrix Adaptive Regularization , 2018, ArXiv.
[11] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[12] Marc'Aurelio Ranzato,et al. Large Scale Distributed Deep Networks , 2012, NIPS.
[13] Shai Shalev-Shwartz,et al. Faster SGD Using Sketched Conditioning , 2015, ArXiv.
[14] Tara N. Sainath,et al. Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling , 2019, ArXiv.
[15] R. Fletcher. Practical Methods of Optimization , 1988 .
[16] J. Nocedal. Updating Quasi-Newton Matrices With Limited Storage , 1980 .
[17] Naman Agarwal,et al. Second Order Stochastic Optimization in Linear Time , 2016, ArXiv.
[18] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[19] Chi-Kwong Li. Geometric Means , 2003 .
[20] Yoram Singer,et al. Shampoo: Preconditioned Stochastic Tensor Optimization , 2018, ICML.
[21] Andrea Montanari,et al. Convergence rates of sub-sampled Newton methods , 2015, NIPS.
[22] Yoram Singer,et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization , 2011, J. Mach. Learn. Res..
[23] Roger B. Grosse,et al. Optimizing Neural Networks with Kronecker-factored Approximate Curvature , 2015, ICML.
[24] Nicholas J. Higham,et al. A Schur-Newton Method for the Matrix pth Root and its Inverse , 2006, SIAM J. Matrix Anal. Appl..
[25] Michael S. Bernstein,et al. ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.
[26] Martin J. Wainwright,et al. Newton Sketch: A Near Linear-Time Optimization Algorithm with Linear-Quadratic Convergence , 2015, SIAM J. Optim..