Elizabeth Salesky | Colin Raffel | Benoît Sagot | Zaid Alyafeai | Matthias Gallé | Chenglei Si | Manan Dey | Sabrina J. Mielke | Arun Raja | Wilson Y. Lee | Samson Tan
[1] Yating Yang,et al. Morphological Word Segmentation on Agglutinative Languages for Neural Machine Translation , 2020, ArXiv.
[2] Eva Schlinger,et al. How Multilingual is Multilingual BERT? , 2019, ACL.
[3] John A. Goldsmith,et al. Unsupervised Learning of the Morphology of a Natural Language , 2001, CL.
[4] Ryan Cotterell,et al. Are All Languages Equally Hard to Language-Model? , 2018, NAACL.
[5] Jinsong Su,et al. Bridging Subword Gaps in Pretrain-Finetune Paradigm for Natural Language Generation , 2021, ACL.
[6] Chris Dyer,et al. Learning to Create and Reuse Words in Open-Vocabulary Neural Language Modeling , 2017, ACL.
[7] Rico Sennrich,et al. How Suitable Are Subword Segmentation Strategies for Translating Non-Concatenative Morphology? , 2021, EMNLP.
[8] Francisco Casacuberta,et al. How Much Does Tokenization Affect Neural Machine Translation? , 2018, CICLing.
[9] James Henderson. The Unstoppable Rise of Computational Linguistics in Deep Learning , 2020, ACL.
[10] David J. C. MacKay,et al. A hierarchical Dirichlet language model , 1995, Natural Language Engineering.
[11] Martin F. Porter,et al. An algorithm for suffix stripping , 1997, Program.
[12] Stanley F. Chen,et al. An Empirical Study of Smoothing Techniques for Language Modeling , 1996, ACL.
[13] Christo Kirov,et al. Neural Polysynthetic Language Modelling , 2020, ArXiv.
[14] T. Griffiths,et al. A Bayesian framework for word segmentation: Exploring the effects of context , 2009, Cognition.
[15] Wilker Aziz,et al. A Latent Morphology Model for Open-Vocabulary Neural Machine Translation , 2020, ICLR.
[16] Iryna Gurevych,et al. How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models , 2021, ACL/IJCNLP.
[17] Thomas L. Griffiths,et al. Adaptor Grammars: A Framework for Specifying Compositional Nonparametric Bayesian Models , 2006, NIPS.
[18] Edouard Grave,et al. Training Hybrid Language Models by Marginalizing over Segmentations , 2019, ACL.
[19] Dave Dopson,et al. Fast WordPiece Tokenization , 2020, EMNLP.
[20] Mikko Kurimo,et al. Empirical Comparison of Evaluation Methods for Unsupervised Learning of Morphology , 2011, TAL.
[21] Ryan Cotterell,et al. What Kind of Language Is Hard to Language-Model? , 2019, ACL.
[22] Thomas L. Griffiths,et al. Interpolating between types and tokens by estimating power-law generators , 2005, NIPS.
[23] Ting Liu,et al. CharBERT: Character-aware Pre-trained Language Model , 2020, COLING.
[24] Brian Roark,et al. Probabilistic ParaMor , 2009, CLEF.
[25] Zaixiang Zheng,et al. Vocabulary Learning via Optimal Transport for Neural Machine Translation , 2021, ACL/IJCNLP.
[26] Matthew G. Snover,et al. A Bayesian Model for Morpheme and Paradigm Identification , 2001, ACL.
[27] Marcello Federico,et al. A Statistical Extension of Byte-Pair Encoding , 2021, IWSLT.
[28] Luke S. Zettlemoyer,et al. Deep Contextualized Word Representations , 2018, NAACL.
[29] Zhen Qin,et al. Charformer: Fast Character Transformers via Gradient-based Subword Tokenization , 2021, ArXiv.
[30] Thomas L. Griffiths,et al. Producing Power-Law Distributions and Damping Word Frequencies with Two-Stage Language Models , 2011, J. Mach. Learn. Res..
[31] Matthew G. Snover,et al. A Probabilistic Model for Learning Concatenative Morphology , 2002, NIPS.
[32] John Wieting,et al. CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation , 2021, ArXiv.
[33] Yoshua Bengio,et al. Multiscale sequence modeling with a learned dictionary , 2017, ArXiv.
[34] Eric P. Xing,et al. Word Shape Matters: Robust Machine Translation with Visual Embedding , 2020, ArXiv.
[35] Jonne Saleva,et al. The Effectiveness of Morphology-aware Segmentation in Low-Resource Neural Machine Translation , 2021, EACL.
[36] Dan Roth,et al. Cross-Lingual Ability of Multilingual BERT: An Empirical Study , 2019, ICLR.
[37] Andrew M. Dai,et al. Language-independent compound splitting with morphological operations , 2011, ACL.
[38] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[39] Jason Eisner,et al. Spell Once, Summon Anywhere: A Two-Level Open-Vocabulary Language Model , 2018, AAAI.
[40] Helmut Schmid,et al. Why don't people use character-level machine translation? , 2021, ArXiv.
[41] John DeNero,et al. Painless Unsupervised Learning with Features , 2010, NAACL.
[42] Matthias Gallé,et al. Investigating the Effectiveness of BPE: The Power of Shorter Sequences , 2019, EMNLP.
[43] Xiaoqing Zheng,et al. Unsupervised Word Segmentation with Bi-directional Neural Language Model , 2021, ArXiv.
[44] Valentin Hofmann,et al. Superbizarre Is Not Superb: Derivational Morphology Improves BERT’s Interpretation of Complex Words , 2021, ACL.
[45] Lin Yang,et al. Squared English Word: A Method of Generating Glyph to Use Super Characters for Sentiment Analysis , 2019, AffCon@AAAI.
[46] Marc-Alexandre Côté,et al. Revisiting the Hierarchical Multiscale LSTM , 2018, COLING.
[47] Nikola I. Nikolov,et al. Character-Level Translation with Self-attention , 2020, ACL.
[48] Noah A. Smith,et al. Parsing with Multilingual BERT, a Small Corpus, and a Small Treebank , 2020, EMNLP 2020.
[49] Omer Levy,et al. Models In a Spelling Bee: Language Models Implicitly Learn the Character Composition of Tokens , 2021, ArXiv.
[50] Mikko Kurimo,et al. Morfessor 2.0: Toolkit for statistical morphological segmentation , 2014, EACL.
[51] Philipp Koehn,et al. Moses: Open Source Toolkit for Statistical Machine Translation , 2007, ACL.
[52] Philip Gage. A new algorithm for data compression , 1994.
[53] Wonyong Sung,et al. Character-level language modeling with hierarchical recurrent neural networks , 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[54] Rico Sennrich,et al. Neural Machine Translation of Rare Words with Subword Units , 2015, ACL.
[55] Jeffrey Dean,et al. Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.
[56] Graham Neubig,et al. Multi-view Subword Regularization , 2021, NAACL.
[57] Micha Elsner,et al. A Joint Learning Model of Word Segmentation, Lexical Acquisition, and Phonetic Variability , 2013, EMNLP.
[58] Pierre Zweigenbaum,et al. CharacterBERT: Reconciling ELMo and BERT for Word-Level Open-Vocabulary Representations From Characters , 2020, COLING.
[59] Jürgen Schmidhuber,et al. A Clockwork RNN , 2014, ICML.
[60] Jürgen Schmidhuber,et al. Neural sequence chunkers , 1991, Forschungsberichte, TU Munich.
[61] Katharina Kann,et al. How to Adapt Your Pretrained Multilingual Model to 1600 Languages , 2021, ACL.
[62] Alexander M. Rush,et al. Character-Aware Neural Language Models , 2015, AAAI.
[63] Marian Alexandru Baroni. Distributional cues in morpheme discovery: A computational model and empirical evidence , 2000.
[64] Hang Li,et al. AMBERT: A Pre-trained Language Model with Multi-Grained Tokenization , 2020, ArXiv.
[65] David Yarowsky,et al. Minimally Supervised Morphological Analysis by Multimodal Alignment , 2000, ACL.
[66] Hyung Won Chung,et al. Improving Multilingual Models with Language-Clustered Vocabularies , 2020, EMNLP.
[67] Wang Ling,et al. Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation , 2015, EMNLP.
[68] Yu Zhang,et al. Latent Sequence Decompositions , 2016, ICLR.
[69] Regina Barzilay,et al. An Unsupervised Method for Uncovering Morphological Chains , 2015, TACL.
[70] J. Wolff. An Algorithm for the Segmentation of an Artificial Language Analogue , 1975.
[71] Chunyu Kit,et al. Tokenization as the Initial Phase in NLP , 1992, COLING.
[72] Michael Zhu,et al. Recurrent Neural Networks with Mixed Hierarchical Structures for Natural Language Processing , 2021, 2021 International Joint Conference on Neural Networks (IJCNN).
[73] Taku Kudo,et al. Subword Regularization: Improving Neural Network Translation Models with Multiple Subword Candidates , 2018, ACL.
[74] Yuval Pinter,et al. Integrating Approaches to Word Representation , 2021, ArXiv.
[75] Jason Lee,et al. Fully Character-Level Neural Machine Translation without Explicit Segmentation , 2016, TACL.
[76] Jonathan May,et al. Finding the Optimal Vocabulary Size for Neural Machine Translation , 2020, FINDINGS.
[77] Falcon Z. Dai,et al. Glyph-aware Embedding of Chinese Characters , 2017, SWCN@EMNLP.
[78] Frank D. Wood,et al. The sequence memoizer , 2011, Commun. ACM.
[79] Eugene Kharitonov,et al. How BPE Affects Memorization in Transformers , 2021, ArXiv.
[80] Barbara Plank,et al. Multilingual Part-of-Speech Tagging with Bidirectional Long Short-Term Memory Models and Auxiliary Loss , 2016, ACL.
[81] Philipp Koehn,et al. Empirical Methods for Compound Splitting , 2003, EACL.
[82] Ankur Bapna,et al. Revisiting Character-Based Neural Machine Translation with Capacity and Compression , 2018, EMNLP.
[83] Daniel Jurafsky,et al. Knowledge-Free Induction of Inflectional Morphologies , 2001, NAACL.
[84] Martin Wattenberg,et al. Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation , 2016, TACL.
[85] Chris Dyer,et al. Learning to Discover, Ground and Use Words with Segmental Neural Language Models , 2018, ACL.
[86] Elizabeth Salesky,et al. Robust Open-Vocabulary Translation from Visual Text Representations , 2021, EMNLP.
[87] Kevin Duh,et al. BPE and CharCNNs for Translation of Morphology: A Cross-Lingual Comparison and Analysis , 2018, ArXiv.
[88] Mathias Creutz,et al. Inducing the Morphological Lexicon of a Natural Language from Unannotated Text , 2005.
[89] Zhi-Hong Deng,et al. Unsupervised Neural Word Segmentation for Chinese via Segmental Language Modeling , 2018, EMNLP.
[90] Orhan Firat,et al. Towards End-to-End In-Image Neural Machine Translation , 2020, NLPBT.
[91] Naoaki Okazaki,et al. Joint Optimization of Tokenization and Downstream Model , 2021, FINDINGS.
[92] Djamé Seddah,et al. Noisy UGC Translation at the Character Level: Revisiting Open-Vocabulary Capabilities and Robustness of Char-Based Models , 2021, WNUT.
[93] Mikko Kurimo,et al. Morfessor FlatCat: An HMM-Based Method for Unsupervised and Semi-Supervised Learning of Morphology , 2014, COLING.
[94] Bonaventure F. P. Dossou,et al. Crowdsourced Phrase-Based Tokenization for Low-Resourced Neural Machine Translation: The Case of Fon Language , 2021, ArXiv.
[95] Noah Constant,et al. Bridging the Gap for Tokenizer-Free Language Models , 2019, ArXiv.
[96] Ondrej Bojar,et al. Morphological and Language-Agnostic Word Segmentation for NMT , 2018, TSD.
[97] Sharon Goldwater,et al. From Segmentation to Analyses: a Probabilistic Model for Unsupervised Morphology Induction , 2017, EACL.
[98] Hiroyuki Shindo,et al. Stochastic Tokenization with a Language Model for Neural Text Classification , 2019, ACL.
[99] Christopher D. Manning,et al. Achieving Open Vocabulary Neural Machine Translation with Hybrid Word-Character Models , 2016, ACL.
[100] Mathias Creutz,et al. Unsupervised models for morpheme segmentation and morphology learning , 2007, TSLP.
[101] Jacob Eisenstein,et al. Mimicking Word Embeddings using Subword RNNs , 2017, EMNLP.
[102] Lin Yang,et al. Super Characters: A Conversion from Sentiment Classification to Image Classification , 2018, WASSA@EMNLP.
[103] Irfan Ahmad,et al. Evaluating Various Tokenizers for Arabic Text Classification , 2021, Neural Processing Letters.
[104] Hinrich Schütze,et al. Wine is Not v i n. - On the Compatibility of Tokenizations Across Languages , 2021, EMNLP.
[105] Nizar Habash,et al. CoNLL-UL: Universal Morphological Lattices for Universal Dependency Parsing , 2018, LREC.
[106] A. Moffat,et al. Offline dictionary-based compression , 1999, Proceedings DCC'99 Data Compression Conference (Cat. No. PR00096).
[107] Orhan Firat,et al. On the Importance of Word Boundaries in Character-level Neural Machine Translation , 2019, EMNLP.
[108] Alon Lavie,et al. ParaMor: Minimally Supervised Induction of Paradigm Structure and Morphological Analysis , 2007, SIGMORPHON.
[109] Thamar Solorio,et al. Char2Subword: Extending the Subword Embedding Space Using Robust Character Compositionality , 2020, EMNLP.
[110] Nicolas Usunier,et al. Improving Neural Language Models with a Continuous Cache , 2016, ICLR.
[111] Naonori Ueda,et al. Bayesian Unsupervised Word Segmentation with Nested Pitman-Yor Language Modeling , 2009, ACL.
[112] Marcello Federico,et al. Compositional Representation of Morphologically-Rich Input for Neural Machine Translation , 2018, ACL.
[113] Ole Winther,et al. Hash Embeddings for Efficient Word Representations , 2017, NIPS.
[114] Yoshua Bengio,et al. Hierarchical Recurrent Neural Networks for Long-Term Dependencies , 1995, NIPS.
[115] G. Huet. Lexicon-directed segmentation and tagging in Sanskrit , 2003.
[116] Beatrice Santorini,et al. Building a Large Annotated Corpus of English: The Penn Treebank , 1993, CL.
[117] Elena Voita,et al. BPE-Dropout: Simple and Effective Subword Regularization , 2020, ACL.
[118] Cícero Nogueira dos Santos,et al. Learning Character-level Representations for Part-of-Speech Tagging , 2014, ICML.
[119] Frederick Liu,et al. Learning Character-level Compositionality with Visual Features , 2017, ACL.
[120] Yee Whye Teh,et al. A Hierarchical Bayesian Language Model Based On Pitman-Yor Processes , 2006, ACL.
[121] Zhiyuan Liu,et al. SHUOWEN-JIEZI: Linguistically Informed Tokenizers For Chinese Language Model Pretraining , 2021, ArXiv.
[122] Mike Schuster,et al. Japanese and Korean voice search , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[123] Pushpak Bhattacharyya,et al. Meaningless yet meaningful: Morphology grounded subword-level NMT , 2018.
[124] James A. Storer,et al. Data compression via textual substitution , 1982, JACM.
[125] Mathias Creutz,et al. Unsupervised Discovery of Morphemes , 2002, SIGMORPHON.
[126] M. A. Jiménez-Montaño,et al. On the syntactic structure of protein sequences and the concept of grammar complexity , 1984 .
[127] Grzegorz Chrupała. Text segmentation with character-level text embeddings , 2013, ICML 2013.
[128] Graham Neubig,et al. Using Morphological Knowledge in Open-Vocabulary Neural Language Models , 2018, NAACL.
[129] Vít Novotný,et al. One Size Does Not Fit All: Finding the Optimal N-gram Sizes for FastText Models across Languages , 2021, ArXiv.
[130] Yoshua Bengio,et al. Hierarchical Multiscale Recurrent Neural Networks , 2016, ICLR.
[131] Constantine Lignos. Learning from Unseen Data , 2010.
[132] Jason Weston,et al. A unified architecture for natural language processing: deep neural networks with multitask learning , 2008, ICML '08.
[133] Joakim Nivre,et al. Universal Word Segmentation: Implementation and Interpretation , 2018, TACL.
[134] Éric Villemonte de la Clergerie,et al. MAF: a Morphosyntactic Annotation Framework , 2005.
[135] Artem Sokolov,et al. Learning to Segment Inputs for NMT Favors Character-Level Processing , 2018, IWSLT.
[136] J. Rissanen. Stochastic Complexity in Statistical Inquiry , 1989.
[137] Marta R. Costa-jussà,et al. Neural machine translation using bitmap fonts , 2016.
[138] Mikko Kurimo,et al. Morfessor EM+Prune: Improved Subword Segmentation with Expectation Maximization and Pruning , 2020, LREC.
[139] Taku Kudo,et al. SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing , 2018, EMNLP.
[140] Wei Wu,et al. Glyce: Glyph-vectors for Chinese Character Representations , 2019, NeurIPS.
[141] Gina-Anne Levow,et al. A Masked Segmental Language Model for Unsupervised Natural Language Segmentation , 2021, SIGMORPHON.
[142] Kevin Duh,et al. A Call for Prudent Choice of Subword Merge Operations in Neural Machine Translation , 2019, MTSummit.
[143] Marta R. Costa-jussà,et al. Chinese–Spanish neural machine translation enhanced with character and word bitmap fonts , 2017, Machine Translation.
[144] Carl de Marcken. Linguistic Structure as Composition and Perturbation , 1996, ACL.
[145] Ilya Sutskever,et al. Language Models are Unsupervised Multitask Learners , 2019.
[146] Christian Bentz,et al. From characters to words: the turning point of BPE merges , 2021, EACL.
[147] Guillaume Lample,et al. Cross-lingual Language Model Pretraining , 2019, NeurIPS.
[148] Yoshua Bengio,et al. A Character-level Decoder without Explicit Segmentation for Neural Machine Translation , 2016, ACL.
[149] Seongbo Jang,et al. An Empirical Study of Tokenization Strategies for Various Korean NLP Tasks , 2020, AACL.
[150] Yann LeCun,et al. Very Deep Convolutional Networks for Text Classification , 2016, EACL.
[151] Shafiq R. Joty,et al. Mind Your Inflections! Improving NLP for Non-Standard Englishes with Base-Inflection Encoding , 2020, EMNLP.
[152] Alec Radford,et al. Improving Language Understanding by Generative Pre-Training , 2018.
[153] Elizabeth Salesky,et al. Optimizing segmentation granularity for neural machine translation , 2018, Machine Translation.
[154] Mark Johnson,et al. Improving nonparameteric Bayesian inference: experiments on unsupervised word segmentation with adaptor grammars , 2009, NAACL.
[155] Marcello Federico,et al. An Evaluation of Two Vocabulary Reduction Methods for Neural Machine Translation , 2018, AMTA.
[156] Noah A. Smith,et al. Segmental Recurrent Neural Networks , 2015, ICLR.
[157] Mark Dredze,et al. Beto, Bentz, Becas: The Surprising Cross-Lingual Effectiveness of BERT , 2019, EMNLP.
[158] Ilya Sutskever,et al. Subword Language Modeling with Neural Networks , 2011.
[159] Jürgen Schmidhuber,et al. Learning Complex, Extended Sequences Using the Principle of History Compression , 1992, Neural Computation.
[160] Chong Wang,et al. Towards Neural Phrase-based Machine Translation , 2017, ICLR.
[161] Alexander M. Fraser,et al. Target-side Word Segmentation Strategies for Neural Machine Translation , 2017, WMT.
[162] Chong Wang,et al. Sequence Modeling via Segmentations , 2017, ICML.
[163] Geoffrey E. Hinton,et al. Generating Text with Recurrent Neural Networks , 2011, ICML.
[164] Kei Uchiumi,et al. Optimizing Word Segmentation for Downstream Task , 2020, FINDINGS.
[165] Colin Raffel,et al. ByT5: Towards a token-free future with pre-trained byte-to-byte models , 2021, ArXiv.
[166] Mikko Kurimo,et al. Morpho Challenge 2005-2010: Evaluations and Results , 2010, SIGMORPHON.
[167] Thomas L. Griffiths,et al. Contextual Dependencies in Unsupervised Word Segmentation , 2006, ACL.
[168] Jindřich Libovický,et al. Towards Reasonably-Sized Character-Level Transformer NMT by Finetuning Subword Systems , 2020, EMNLP.
[169] Pawan Goyal,et al. A Dataset for Sanskrit Word Segmentation , 2017, LaTeCH@ACL.
[170] Alex Graves,et al. Generating Sequences With Recurrent Neural Networks , 2013, ArXiv.
[171] Yoshua Bengio,et al. A Neural Probabilistic Language Model , 2003, J. Mach. Learn. Res..
[172] Gholamreza Haffari,et al. Dynamic Programming Encoding for Subword Segmentation in Neural Machine Translation , 2020, ACL.
[173] Yonghui Wu,et al. Exploring the Limits of Language Modeling , 2016, ArXiv.
[174] Kenneth Ward Church. Empirical Estimates of Adaptation: The chance of Two Noriegas is closer to p/2 than p² , 2000, COLING.
[175] Jeffrey L. Elman,et al. Finding Structure in Time , 1990, Cogn. Sci..
[176] Benoît Sagot,et al. SxPipe 2: architecture pour le traitement pré-syntaxique de corpus bruts , 2008.
[177] Carlos Gómez-Rodríguez,et al. Comparing neural‐ and N‐gram‐based language models for word segmentation , 2018, J. Assoc. Inf. Sci. Technol..
[178] Yoshua Bengio,et al. Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation , 2013, ArXiv.
[179] Dan Roth,et al. Extending Multilingual BERT to Low-Resource Languages , 2020, FINDINGS.