ByT5: Towards a Token-Free Future with Pre-trained Byte-to-Byte Models
Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel