暂无分享,去创建一个
[1] Doug Downey,et al. Don’t Stop Pretraining: Adapt Language Models to Domains and Tasks , 2020, ACL.
[2] Bin Yang,et al. Learning to Reweight Examples for Robust Deep Learning , 2018, ICML.
[3] S. Weisberg,et al. Residuals and Influence in Regression , 1982 .
[4] Colin Raffel,et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer , 2019, J. Mach. Learn. Res..
[5] Miles Osborne,et al. Statistical Machine Translation , 2010, Encyclopedia of Machine Learning and Data Mining.
[6] Iryna Gurevych,et al. AdapterFusion: Non-Destructive Task Composition for Transfer Learning , 2021, EACL.
[7] Iain Murray,et al. BERT and PALs: Projected Attention Layers for Efficient Adaptation in Multi-Task Learning , 2019, ICML.
[8] Stephen P. Boyd,et al. Convex Optimization , 2004, Algorithms and Theory of Computation Handbook.
[9] Yann Dauphin,et al. Language Modeling with Gated Convolutional Networks , 2016, ICML.
[10] Jonathan Pilault,et al. Conditionally Adaptive Multi-Task Learning: Improving Transfer Learning in NLP Using Fewer Parameters & Less Data , 2020, ArXiv.
[11] Mona Attariyan,et al. Parameter-Efficient Transfer Learning for NLP , 2019, ICML.
[12] Richard S. Sutton,et al. Weighted importance sampling for off-policy learning with linear function approximation , 2014, NIPS.
[13] Roee Aharoni,et al. Unsupervised Domain Clusters in Pretrained Language Models , 2020, ACL.
[14] Jason Weston,et al. Curriculum learning , 2009, ICML '09.
[15] Hinrich Schütze,et al. Book Reviews: Foundations of Statistical Natural Language Processing , 1999, CL.
[16] Jason Weston,et al. A unified architecture for natural language processing: deep neural networks with multitask learning , 2008, ICML '08.
[17] Taro Watanabe,et al. Denoising Neural Machine Translation Training with Trusted Data and Online Data Selection , 2018, WMT.
[18] Christof Monz,et al. Dynamic Data Selection for Neural Machine Translation , 2017, EMNLP.
[19] William D. Lewis,et al. Intelligent Selection of Language Model Training Data , 2010, ACL.
[20] Marcello Restelli,et al. Policy Optimization via Importance Sampling , 2018, NeurIPS.
[21] Ilya Sutskever,et al. Language Models are Unsupervised Multitask Learners , 2019 .
[22] Martti Vainio,et al. Proceedings of the Annual Conference of the International Speech Communication Association , 2016, Interspeech 2016.
[23] Yann LeCun,et al. Modeles connexionnistes de l'apprentissage , 1987 .
[24] Rico Sennrich,et al. Regularization techniques for fine-tuning in neural machine translation , 2017, EMNLP.
[25] Ankur Bapna,et al. Gradient-guided Loss Masking for Neural Machine Translation , 2021, ArXiv.
[26] David Grangier,et al. A Discriminative Kernel-based Model to Rank Images from Text Queries , 2007 .
[27] Jianfeng Gao,et al. Domain Adaptation via Pseudo In-Domain Data Selection , 2011, EMNLP.
[28] Ken-ichi Funahashi,et al. On the approximate realization of continuous mappings by neural networks , 1989, Neural Networks.
[29] Lukás Burget,et al. Recurrent neural network based language model , 2010, INTERSPEECH.
[30] Rich Caruana,et al. Overfitting in Neural Nets: Backpropagation, Conjugate Gradient, and Early Stopping , 2000, NIPS.
[31] Christopher D. Manning,et al. Get To The Point: Summarization with Pointer-Generator Networks , 2017, ACL.
[32] Percy Liang,et al. Understanding Black-box Predictions via Influence Functions , 2017, ICML.
[33] Kosuke Imai,et al. Survey Sampling , 1998, Nov/Dec 2017.
[34] Alec Radford,et al. Scaling Laws for Neural Language Models , 2020, ArXiv.
[35] Omer Levy,et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach , 2019, ArXiv.
[36] Rich Caruana,et al. Multitask Learning , 1998, Encyclopedia of Machine Learning and Data Mining.
[37] Tyler B. Johnson,et al. Training Deep Models Faster with Robust, Approximate Importance Sampling , 2018, NeurIPS.
[38] François Fleuret,et al. Not All Samples Are Created Equal: Deep Learning with Importance Sampling , 2018, ICML.
[39] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[40] Hermann Ney,et al. LSTM Neural Networks for Language Modeling , 2012, INTERSPEECH.
[41] Yoshua Bengio,et al. Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.
[42] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[43] Frederick Liu,et al. Estimating Training Data Influence by Tracking Gradient Descent , 2020, NeurIPS.
[44] Wojciech Stokowiec,et al. LanguageCrawl: a generic tool for building language models upon common Crawl , 2016, Language Resources and Evaluation.
[45] Roland Kuhn,et al. Discriminative Instance Weighting for Domain Adaptation in Statistical Machine Translation , 2010, EMNLP.
[46] Doug Downey,et al. Sampling Informative Training Data for RNN Language Models , 2018, ACL.
[47] Léon Bottou,et al. The Tradeoffs of Large Scale Learning , 2007, NIPS.