LordBERT: Embedding Long Text by Segment Ordering with BERT
Haitao Zheng | Rui Zhang | Yimeng Dai | Borun Chen | Rongyi Sun