Quasi-Multitask Learning: an Efficient Surrogate for Obtaining Model Ensembles
[1] Geoffrey E. Hinton et al. Distilling the Knowledge in a Neural Network. 2015, arXiv.
[2] Anders Krogh and Jesper Vedelsby. Neural Network Ensembles, Cross Validation, and Active Learning. 1994, NIPS.
[3] Ronan Collobert and Jason Weston. A unified architecture for natural language processing: deep neural networks with multitask learning. 2008, ICML '08.
[4] Robert A. Jacobs et al. Adaptive Mixtures of Local Experts. 1991, Neural Computation.
[5] Erik F. Tjong Kim Sang. Introduction to the CoNLL-2002 Shared Task: Language-Independent Named Entity Recognition. 2002, CoNLL.
[6] Phong Le et al. LSTM-Based Mixture-of-Experts for Knowledge-Aware Dialogues. 2016, Rep4NLP@ACL.
[7] Noam Shazeer et al. Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer. 2017, ICLR.
[8] Elliot Meyerson and Risto Miikkulainen. Pseudo-task Augmentation: From Deep Multitask Learning to Intratask Sharing - and Back. 2018, ICML.
[9] Victor Sanh et al. A Hierarchical Multi-task Approach for Learning Embeddings from Semantic Tasks. 2019, AAAI.
[10] Marek Rei. Semi-supervised Multitask Learning for Sequence Labeling. 2017, ACL.
[11] Stefan Lee et al. Why M Heads are Better than One: Training a Diverse Ensemble of Deep Networks. 2015, arXiv.
[12] Oleksandr Makeyev et al. Neural network with ensembles. 2010, International Joint Conference on Neural Networks (IJCNN).
[13] Graham Neubig et al. DyNet: The Dynamic Neural Network Toolkit. 2017, arXiv.
[14] Eliyahu Kiperwasser and Yoav Goldberg. Simple and Accurate Dependency Parsing Using Bidirectional LSTM Feature Representations. 2016, TACL.
[15] Ashish Vaswani et al. Attention Is All You Need. 2017, NIPS.
[16] Joachim Bingel and Anders Søgaard. Identifying beneficial task relations for multi-task learning in deep neural networks. 2017, EACL.
[17] Jesse Dodge et al. Show Your Work: Improved Reporting of Experimental Results. 2019, EMNLP.
[18] Eliyahu Kiperwasser and Miguel Ballesteros. Scheduled Multi-Task Learning: From Syntax to Translation. 2018, TACL.
[19] Rami Al-Rfou et al. Polyglot: Distributed Word Representations for Multilingual NLP. 2013, CoNLL.
[20] Nitish Srivastava et al. Dropout: a simple way to prevent neural networks from overfitting. 2014, J. Mach. Learn. Res.
[21] Christian Szegedy et al. Rethinking the Inception Architecture for Computer Vision. 2016, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[22] Anders Krogh and John A. Hertz. A Simple Weight Decay Can Improve Generalization. 1991, NIPS.
[23] Barbara Plank and Željko Agić. Distant Supervision from Disparate Sources for Low-Resource Part-of-Speech Tagging. 2018, EMNLP.
[24] Anders Søgaard and Yoav Goldberg. Deep multi-task learning with low level tasks supervised at lower layers. 2016, ACL.
[25] Sepp Hochreiter and Jürgen Schmidhuber. Long Short-Term Memory. 1997, Neural Computation.
[26] Rich Caruana. Multitask Learning. 1998, Encyclopedia of Machine Learning and Data Mining.
[27] Barbara Plank et al. Multilingual Part-of-Speech Tagging with Bidirectional Long Short-Term Memory Models and Auxiliary Loss. 2016, ACL.
[28] Kevin Clark et al. Semi-Supervised Sequence Modeling with Cross-View Training. 2018, EMNLP.
[29] Gao Huang et al. Snapshot Ensembles: Train 1, get M for free. 2017, ICLR.
[30] Emma Strubell et al. Energy and Policy Considerations for Deep Learning in NLP. 2019, ACL.
[31] Erik F. Tjong Kim Sang and Fien De Meulder. Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition. 2003, CoNLL.
[32] Sebastian Ruder and Barbara Plank. Strong Baselines for Neural Semi-Supervised Learning under Domain Shift. 2018, ACL.