Structural Knowledge Distillation: Tractably Distilling Information for Structured Predictor
Xinyu Wang | Yong Jiang | Zhaohui Yan | Zixia Jia | Nguyen Bach | Tao Wang | Zhongqiang Huang | Fei Huang | Kewei Tu
[1] Eduard H. Hovy, et al. End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF, 2016, ACL.
[2] Kewei Tu, et al. More Embeddings, Better Sequence Labelers?, 2020, EMNLP.
[3] Subhabrata Mukherjee, et al. XtremeDistil: Multi-stage Distillation for Massive Multilingual Models, 2020, ACL.
[4] Roland Vollgraf, et al. Contextual String Embeddings for Sequence Labeling, 2018, COLING.
[5] Cícero Nogueira dos Santos, et al. Learning Character-level Representations for Part-of-Speech Tagging, 2014, ICML.
[6] Lifu Tu, et al. Benchmarking Approximate Inference Methods for Neural Structured Prediction, 2019, NAACL.
[7] Quoc V. Le, et al. BAM! Born-Again Multi-Task Networks for Natural Language Understanding, 2019, ACL.
[8] Geoffrey E. Hinton, et al. Distilling the Knowledge in a Neural Network, 2015, arXiv.
[9] Timothy Dozat, et al. Stanford’s Graph-based Neural Dependency Parser at the CoNLL 2017 Shared Task, 2017, CoNLL.
[10] Min Zhang, et al. Efficient Second-Order TreeCRF for Neural Dependency Parsing, 2020, ACL.
[11] Timothy Dozat, et al. Deep Biaffine Attention for Neural Dependency Parsing, 2016, ICLR.
[12] Rich Caruana, et al. Model Compression, 2006, KDD.
[13] Yu Hong, et al. Don’t Eclipse Your Arts Due to Small Discrepancies: Boundary Repositioning with a Pointer Network for Aspect Extraction, 2020, ACL.
[14] Tomas Mikolov, et al. Enriching Word Vectors with Subword Information, 2016, TACL.
[15] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[16] Kewei Tu, et al. Enhanced Universal Dependency Parsing with Second-Order Inference and Mixture of Training Data, 2020, IWPT.
[17] Kewei Tu, et al. Improving Named Entity Recognition by External Context Retrieving and Cooperative Learning, 2021, ACL/IJCNLP.
[18] Noah A. Smith, et al. Distilling an Ensemble of Greedy Dependency Parsers into One MST Parser, 2016, EMNLP.
[19] Rotem Dror, et al. Deep Dominance - How to Properly Compare Deep Neural Models, 2019, ACL.
[20] Kai Yu, et al. Knowledge Distillation for Sequence Model, 2018, INTERSPEECH.
[21] Naveen Arivazhagan, et al. Small and Practical BERT Models for Sequence Labeling, 2019, EMNLP.
[22] Kewei Tu, et al. Second-Order Semantic Dependency Parsing with End-to-End Neural Networks, 2019, ACL.
[23] David Vilares, et al. Viable Dependency Parsing as Sequence Labeling, 2019, NAACL.
[24] Jimmy J. Lin, et al. Distilling Task-Specific Knowledge from BERT into Simple Neural Networks, 2019, arXiv.
[25] Kewei Tu, et al. Automated Concatenation of Embeddings for Structured Prediction, 2020, ACL.
[26] Erik F. Tjong Kim Sang, et al. Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition, 2003, CoNLL.
[27] Frank Hutter, et al. Decoupled Weight Decay Regularization, 2017, ICLR.
[28] Yue Zhang, et al. Design Challenges and Misconceptions in Neural Sequence Labeling, 2018, COLING.
[29] Mark Dredze, et al. Beto, Bentz, Becas: The Surprising Cross-Lingual Effectiveness of BERT, 2019, EMNLP.
[30] Heng Ji, et al. Cross-lingual Name Tagging and Linking for 282 Languages, 2017, ACL.
[31] Veselin Stoyanov, et al. Unsupervised Cross-lingual Representation Learning at Scale, 2019, ACL.
[32] Rich Caruana, et al. Do Deep Nets Really Need to be Deep?, 2013, NIPS.
[33] Jingzhou Liu, et al. Stack-Pointer Networks for Dependency Parsing, 2018, ACL.
[34] Carlos Gómez-Rodríguez, et al. Distilling Neural Networks for Greener and Faster Dependency Parsing, 2020, IWPT.
[35] Thomas Wolf, et al. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter, 2019, arXiv.
[36] Juntao Yu, et al. Named Entity Recognition as Dependency Parsing, 2020, ACL.
[37] Di He, et al. Multilingual Neural Machine Translation with Knowledge Distillation, 2019, ICLR.
[38] Kewei Tu, et al. Structure-Level Knowledge Distillation For Multilingual Sequence Labeling, 2020, ACL.
[39] Fandong Meng, et al. GCDT: A Global Context Enhanced Deep Transition Architecture for Sequence Labeling, 2019, ACL.
[40] Ke Chen, et al. Structured Knowledge Distillation for Semantic Segmentation, 2019, CVPR.
[41] Erik F. Tjong Kim Sang, et al. Introduction to the CoNLL-2002 Shared Task: Language-Independent Named Entity Recognition, 2002, CoNLL.
[42] Eva Schlinger, et al. How Multilingual is Multilingual BERT?, 2019, ACL.