Badih Ghazi | Pasin Manurangsi | Vineet Gupta | Ravi Kumar | Rohan Anil