The Microsoft Toolkit of Multi-Task Deep Neural Networks for Natural Language Understanding

We present MT-DNN, an open-source natural language understanding (NLU) toolkit that makes it easy for researchers and developers to train customized deep learning models. Built upon PyTorch and the Transformers library, MT-DNN is designed to facilitate rapid customization for a broad spectrum of NLU tasks, using a variety of training objectives (classification, regression, structured prediction) and text encoders (e.g., RNNs, BERT, RoBERTa, UniLM). A unique feature of MT-DNN is its built-in support for robust and transferable learning via the adversarial multi-task learning paradigm. To enable efficient production deployment, MT-DNN supports multi-task knowledge distillation, which can substantially compress a deep neural model without a significant drop in performance. We demonstrate the effectiveness of MT-DNN on a wide range of NLU applications across general and biomedical domains. The software and pre-trained models will be publicly available at https://github.com/namisan/mt-dnn.
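
The two training features highlighted above (adversarial multi-task learning and multi-task knowledge distillation) can be illustrated with a minimal PyTorch sketch. The code below shows a SMART-style smoothness-inducing adversarial regularizer added to an ordinary task loss; the model class, hyper-parameters, and the `teacher_logits` mentioned in the trailing comments are hypothetical stand-ins for illustration only, not MT-DNN's actual API.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyClassifier(nn.Module):
    """Hypothetical stand-in for an encoder + task-specific head that
    operates directly on input embeddings (not MT-DNN's real model class)."""
    def __init__(self, dim=128, num_labels=3):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim, dim), nn.ReLU())
        self.head = nn.Linear(dim, num_labels)

    def forward(self, embeds):
        return self.head(self.encoder(embeds))

def smoothness_loss(model, embeds, clean_logits, eps=1e-3, step=1e-3):
    """KL divergence between predictions on clean and adversarially
    perturbed embeddings (one ascent step on a random perturbation)."""
    noise = (torch.randn_like(embeds) * eps).requires_grad_()
    adv_kl = F.kl_div(F.log_softmax(model(embeds + noise), dim=-1),
                      F.softmax(clean_logits.detach(), dim=-1),
                      reduction="batchmean")
    grad, = torch.autograd.grad(adv_kl, noise)        # direction that increases the KL
    noise = (noise + step * grad.sign()).detach()     # single ascent step, then freeze
    return F.kl_div(F.log_softmax(model(embeds + noise), dim=-1),
                    F.softmax(clean_logits.detach(), dim=-1),
                    reduction="batchmean")

model = ToyClassifier()
embeds = torch.randn(8, 128)            # a batch of (pre-computed) embeddings
labels = torch.randint(0, 3, (8,))
logits = model(embeds)

# Training objective: task loss plus the adversarial smoothness term.
loss = F.cross_entropy(logits, labels) + 1.0 * smoothness_loss(model, embeds, logits)
loss.backward()

# Multi-task knowledge distillation (sketch): swap hard labels for a
# teacher ensemble's soft targets, e.g.
#   distill_loss = F.kl_div(F.log_softmax(logits, dim=-1),
#                           F.softmax(teacher_logits, dim=-1),
#                           reduction="batchmean")
# where teacher_logits would come from an offline teacher (hypothetical here).
```

In practice the same regularizer can be applied inside each task's training objective during multi-task learning, which is the combination the toolkit advertises; the sketch above only conveys the shape of the computation under the stated assumptions.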
