Multi-Grained Knowledge Distillation for Named Entity Recognition
Chenyang Tao | Wei Wang | Junya Chen | Xuan Zhou | Xiao Zhang | Bing Xu | Jing Xiao