Sketchy: Memory-efficient Adaptive Regularization with Frequent Directions
[1] Zeke Xie,et al. On the Overlooked Structure of Stochastic Gradients , 2022, 2212.02083.
[2] Edo Liberty. Even Simpler Deterministic Matrix Sketching , 2022, ArXiv.
[3] Lijun Zhang,et al. Efficient Adaptive Online Learning via Frequent Directions , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[4] William J. Dally,et al. Evolution of the Graphics Processing Unit (GPU) , 2021, IEEE Micro.
[5] Peter C. Ma,et al. Ten Lessons From Three Generations Shaped Google’s TPUv4i : Industrial Product , 2021, 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA).
[6] Vineeth N Balasubramanian,et al. A Deeper Look at the Hessian Eigenspectrum of Deep Neural Networks and its Applications to Regularization , 2020, AAAI.
[7] Michael L. Waskom,et al. Seaborn: Statistical Data Visualization , 2021, J. Open Source Softw..
[8] Taiji Suzuki,et al. When Does Preconditioning Help or Hurt Generalization? , 2020, ICLR.
[9] O. Papaspiliopoulos. High-Dimensional Probability: An Introduction with Applications in Data Science , 2020 .
[10] Jaime Fernández del Río,et al. Array programming with NumPy , 2020, Nature.
[11] Yu Zhang,et al. Conformer: Convolution-augmented Transformer for Speech Recognition , 2020, INTERSPEECH.
[12] J. Leskovec,et al. Open Graph Benchmark: Datasets for Machine Learning on Graphs , 2020, NeurIPS.
[13] Naman Agarwal,et al. Disentangling Adaptive Gradient Methods from Learning Rates , 2020, ArXiv.
[14] Joel Nothman,et al. SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python , 2019, ArXiv.
[15] Yi Zhang,et al. Extreme Tensoring for Low-Memory Preconditioning , 2019, ICLR.
[16] Ashok Cutkosky,et al. Better Full-Matrix Regret via Parameter-Free Online Learning , 2020, NeurIPS.
[17] Yi Zhang,et al. Efficient Full-Matrix Adaptive Regularization , 2020, ICML.
[18] Elad Hazan,et al. Lecture Notes: Optimization for Machine Learning , 2019, ArXiv.
[19] Benjamin Recht,et al. Do ImageNet Classifiers Generalize to ImageNet? , 2019, ICML.
[20] Yoram Singer,et al. Memory Efficient Adaptive Optimization , 2019, NeurIPS.
[21] Shankar Krishnan,et al. An Investigation into Neural Net Optimization via Hessian Eigenvalue Density , 2019, ICML.
[22] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[23] Ethan Dyer,et al. Gradient Descent Happens in a Tiny Subspace , 2018, ArXiv.
[24] Razvan Pascanu,et al. Relational inductive biases, deep learning, and graph networks , 2018, ArXiv.
[25] Noam Shazeer,et al. Adafactor: Adaptive Learning Rates with Sublinear Memory Cost , 2018, ICML.
[26] Yoram Singer,et al. Shampoo: Preconditioned Stochastic Tensor Optimization , 2018, ICML.
[27] Michael J. Henry,et al. Understanding and Exploiting the Low-Rank Structure of Deep Networks , 2018 .
[28] Yann Dauphin,et al. Empirical Analysis of the Hessian of Over-Parametrized Neural Networks , 2017, ICLR.
[29] Charles R. Johnson,et al. Matrix analysis , 1985, Statistical Inference for Engineers and Data Scientists.
[30] Geoffrey E. Hinton,et al. Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer , 2017, ICLR.
[31] Tengyu Ma,et al. Finding approximate local minima faster than gradient descent , 2016, STOC.
[32] Jorge Nocedal,et al. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima , 2016, ICLR.
[33] Joachim M. Buhmann,et al. Scalable Adaptive Stochastic Optimization Using Random Projections , 2016, NIPS.
[34] Elad Hazan,et al. Introduction to Online Convex Optimization , 2016, Found. Trends Optim..
[35] Haipeng Luo,et al. Efficient Second Order Online Learning by Sketching , 2016, NIPS.
[36] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[37] David P. Woodruff,et al. Frequent Directions: Simple and Deterministic Matrix Sketching , 2015, SIAM J. Comput..
[38] Sanjeev Khudanpur,et al. Librispeech: An ASR corpus based on public domain audio books , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[39] Roger B. Grosse,et al. Optimizing Neural Networks with Kronecker-factored Approximate Curvature , 2015, ICML.
[40] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[41] Michael S. Bernstein,et al. ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.
[42] Sébastien Bubeck,et al. Convex Optimization: Algorithms and Complexity , 2014, Found. Trends Mach. Learn..
[43] K. Audenaert. A generalisation of Mirsky's singular value inequalities , 2014, 1410.4941.
[44] Yoram Singer,et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization , 2011, J. Mach. Learn. Res..
[45] John D. Hunter,et al. Matplotlib: A 2D Graphics Environment , 2007, Computing in Science & Engineering.
[46] Kaare Brandt Petersen,et al. The Matrix Cookbook , 2006 .
[47] Santosh S. Vempala,et al. Efficient algorithms for online decision problems , 2005, J. Comput. Syst. Sci..
[48] Andrew V. Knyazev,et al. Toward the Optimal Preconditioned Eigensolver: Locally Optimal Block Preconditioned Conjugate Gradient Method , 2001, SIAM J. Sci. Comput..
[49] T. Andô. Concavity of certain maps on positive definite matrices and applications to Hadamard products , 1979 .
[50] M. Sain. Finite dimensional linear systems , 1972 .