[1] Swarat Chaudhuri, et al. Control Regularization for Reduced Variance Reinforcement Learning, 2019, ICML.
[2] Eric P. Xing, et al. Self-Training for Jointly Learning to Ask and Answer Questions, 2018, NAACL.
[3] Daphne Koller, et al. Self-Paced Learning for Latent Variable Models, 2010, NIPS.
[4] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[5] Quoc V. Le, et al. A Simple Method for Commonsense Reasoning, 2018, ArXiv.
[6] Olatunji Ruwase, et al. ZeRO: Memory Optimizations Toward Training Trillion Parameter Models, 2020, SC20: International Conference for High Performance Computing, Networking, Storage and Analysis.
[7] Nicu Sebe, et al. Curriculum Learning: A Survey, 2021.
[8] Huda Khayrallah, et al. An Empirical Exploration of Curriculum Learning for Neural Machine Translation, 2018, ArXiv.
[9] Alec Radford, et al. Improving Language Understanding by Generative Pre-Training, 2018.
[10] Eric P. Xing, et al. Easy Questions First? A Case Study on Curriculum Learning for Question Answering, 2016, ACL.
[11] Barnabás Póczos, et al. Competence-based Curriculum Learning for Neural Machine Translation, 2019, NAACL.
[12] Xi Chen, et al. Variance Reduction for Stochastic Gradient Optimization, 2013, NIPS.
[13] Ammar Ahmad Awan, et al. 1-bit LAMB: Communication Efficient Large-Scale Large-Batch Training with LAMB's Convergence Speed, 2021, IEEE 29th International Conference on High Performance Computing, Data, and Analytics (HiPC).
[14] André Elisseeff, et al. Stability and Generalization, 2002, J. Mach. Learn. Res.
[15] Ali Farhadi, et al. Defending Against Neural Fake News, 2019, NeurIPS.
[16] Luke S. Zettlemoyer, et al. Deep Contextualized Word Representations, 2018, NAACL.
[17] Zheng Cao, et al. Reducing BERT Computation by Padding Removal and Curriculum Learning, 2021, IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[18] Xiangru Lian, et al. 1-bit Adam: Communication Efficient Large-Scale Training with Adam's Convergence Speed, 2021, ICML.
[19] Noah A. Smith, et al. Shortformer: Better Language Modeling using Shorter Inputs, 2021, ACL.
[20] Yongdong Zhang, et al. Curriculum Learning for Natural Language Understanding, 2020, ACL.
[21] Hongzi Mao, et al. Variance Reduction for Reinforcement Learning in Input-Driven Environments, 2018, ICLR.
[22] Xin Wang, et al. A Comprehensive Survey on Curriculum Learning, 2020, ArXiv.
[23] Jason Weston, et al. Curriculum learning, 2009, ICML '09.
[24] Colin Raffel, et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, 2019, J. Mach. Learn. Res.
[25] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[26] Nahum Shimkin, et al. Averaged-DQN: Variance Reduction and Stabilization for Deep Reinforcement Learning, 2016, ICML.
[27] Mark Chen, et al. Language Models are Few-Shot Learners, 2020, NeurIPS.
[28] Ondrej Bojar, et al. Results of the WMT17 Neural MT Training Task, 2017, WMT.
[29] Siu Cheung Hui, et al. Simple and Effective Curriculum Pointer-Generator Networks for Reading Comprehension over Long Narratives, 2019, ACL.
[30] Daniel Campos, et al. Curriculum learning for language modeling, 2021, ArXiv.
[31] Kevin Duh, et al. Curriculum Learning for Domain Adaptation in Neural Machine Translation, 2019, NAACL.
[32] Alec Radford, et al. Scaling Laws for Neural Language Models, 2020, ArXiv.
[33] Terence D. Sanger, et al. Neural network learning control of robot manipulators using gradually increasing task difficulty, 1994, IEEE Trans. Robotics Autom.
[34] Ondrej Bojar, et al. Curriculum Learning and Minibatch Bucketing in Neural Machine Translation, 2017, RANLP.
[35] Mohammad Shoeybi, et al. Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism, 2019, ArXiv.
[36] James Demmel, et al. Large Batch Optimization for Deep Learning: Training BERT in 76 minutes, 2019, ICLR.
[37] Alexei Baevski, et al. Adaptive Input Representations for Neural Language Modeling, 2018, ICLR.
[38] Ilya Sutskever, et al. Language Models are Unsupervised Multitask Learners, 2019.
[39] J. Elman. Learning and development in neural networks: the importance of starting small, 1993, Cognition.