Giuseppe Carenini | Wen Xiao | Raymond Li | Lanjun Wang
[1] Geoffrey E. Hinton, et al. Distilling the Knowledge in a Neural Network, 2015, ArXiv.
[2] Yu Cheng, et al. Discourse-Aware Neural Extractive Text Summarization, 2020, ACL.
[3] Todor Mihaylov, et al. Discourse-Aware Semantic Self-Attention for Narrative Reading Comprehension, 2019, EMNLP.
[4] Qun Liu, et al. TinyBERT: Distilling BERT for Natural Language Understanding, 2020, EMNLP.
[5] Jörg Tiedemann, et al. Fixed Encoder Self-Attention Patterns in Transformer-Based Machine Translation, 2020, Findings of EMNLP.
[6] Jesse Vig, et al. A Multiscale Visualization of Attention in the Transformer Model, 2019, ACL.
[7] Christopher D. Manning, et al. Get To The Point: Summarization with Pointer-Generator Networks, 2017, ACL.
[8] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[9] Yiyun Zhao, et al. How does BERT’s attention change when you fine-tune? An analysis methodology and a case study in negation scope, 2020, ACL.
[10] Ming Zhou, et al. HIBERT: Document Level Pre-training of Hierarchical Bidirectional Transformers for Document Summarization, 2019, ACL.
[11] Byung Cheol Song, et al. Graph-based Knowledge Distillation by Multi-head Attention Network, 2019, BMVC.
[12] Giuseppe Carenini, et al. Predicting Discourse Trees from Transformer-based Neural Summarizers, 2021, NAACL.
[13] Pengfei Liu, et al. Extractive Summarization as Text Matching, 2020, ACL.
[14] Furu Wei, et al. MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers, 2020, NeurIPS.
[15] Yi Tay, et al. Synthesizer: Rethinking Self-Attention for Transformer Models, 2020, ICML.
[16] Jimmy J. Lin, et al. DocBERT: BERT for Document Classification, 2019, ArXiv.
[17] Pavlo Molchanov, et al. Importance Estimation for Neural Network Pruning, 2019, CVPR.
[18] Omer Levy, et al. Are Sixteen Heads Really Better than One?, 2019, NeurIPS.
[19] Tong Zhang, et al. Modeling Localness for Self-Attention Networks, 2018, EMNLP.
[20] Chin-Yew Lin. ROUGE: A Package for Automatic Evaluation of Summaries, 2004, Text Summarization Branches Out (ACL Workshop).
[21] Karen Spärck Jones. A statistical interpretation of term specificity and its application in retrieval, 1972, Journal of Documentation.
[22] Bowen Zhou, et al. SummaRuNNer: A Recurrent Neural Network Based Sequence Model for Extractive Summarization of Documents, 2016, AAAI.
[23] Iain Murray, et al. BERT and PALs: Projected Attention Layers for Efficient Adaptation in Multi-Task Learning, 2019, ICML.
[24] Li Yang, et al. ETC: Encoding Long and Structured Inputs in Transformers, 2020, EMNLP.
[25] Fedor Moiseev, et al. Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned, 2019, ACL.
[26] Mirella Lapata, et al. Text Summarization with Pretrained Encoders, 2019, EMNLP.
[27] Omer Levy, et al. What Does BERT Look at? An Analysis of BERT’s Attention, 2019, BlackboxNLP@ACL.
[28] Danqi Chen, et al. of the Association for Computational Linguistics, 2001.
[29] Thomas Wolf, et al. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter, 2019, ArXiv.
[30] Jiacheng Xu, et al. Neural Extractive Text Summarization with Syntactic Compression, 2019, EMNLP.
[31] Jianping Gou, et al. Knowledge Distillation: A Survey, 2020, International Journal of Computer Vision.
[32] Jörg Tiedemann, et al. An Analysis of Encoder Representations in Transformer-Based Machine Translation, 2018, BlackboxNLP@EMNLP.
[33] Alexander Binder, et al. On Pixel-Wise Explanations for Non-Linear Classifier Decisions by Layer-Wise Relevance Propagation, 2015, PLoS ONE.
[34] Zhe Zhao, et al. K-BERT: Enabling Language Representation with Knowledge Graph, 2019, AAAI.
[35] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[36] Giuseppe Carenini, et al. T3-Vis: visual analytic for Training and fine-Tuning Transformers in NLP, 2021, EMNLP.
[37] Yonatan Belinkov, et al. Analyzing the Structure of Attention in a Transformer Language Model, 2019, BlackboxNLP@ACL.
[38] Sebastian Gehrmann, et al. exBERT: A Visual Analysis Tool to Explore Learned Representations in Transformer Models, 2019, ArXiv.
[39] Richard Socher, et al. A Deep Reinforced Model for Abstractive Summarization, 2017, ICLR.
[40] André F. T. Martins, et al. Do Context-Aware Translation Models Pay the Right Attention?, 2021, ACL.
[41] Phil Blunsom, et al. Teaching Machines to Read and Comprehend, 2015, NIPS.
[42] Yiming Yang, et al. MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices, 2020, ACL.
[43] Arman Cohan, et al. Longformer: The Long-Document Transformer, 2020, ArXiv.
[44] Yuta Kikuchi, et al. The New York Times Annotated Corpus as a Large-Scale Summarization Resource, 2015.
[45] Anna Rumshisky, et al. Revealing the Dark Secrets of BERT, 2019, EMNLP.
[46] Yunhai Tong, et al. Syntax-BERT: Improving Pre-trained Transformers with Syntax Trees, 2021, EACL.
[47] Xuanjing Huang, et al. Mask Attention Networks: Rethinking and Strengthen Transformer, 2021, NAACL.
[48] Minyi Guo, et al. How Far Does BERT Look At: Distance-based Clustering and Analysis of BERT's Attention, 2020, COLING.