Self-Attention Networks Can Process Bounded Hierarchical Languages