Self-Attention Networks Can Process Bounded Hierarchical Languages
[1] Noam Chomsky,et al. The faculty of language: what is it, who has it, and how did it evolve? , 2002, Science.
[2] Navin Goyal,et al. On the Ability of Self-Attention Networks to Recognize Counter Languages , 2020, EMNLP.
[3] Ankur Bapna,et al. The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation , 2018, ACL.
[4] Benjamin Recht,et al. Random Features for Large-Scale Kernel Machines , 2007, NIPS.
[5] Noam Chomsky,et al. The Algebraic Theory of Context-Free Languages , 1963 .
[6] Dan Jurafsky,et al. Learning Music Helps You Read: Using Transfer to Study Linguistic Structure in Language Models , 2020, EMNLP.
[7] Pablo Barceló,et al. On the Turing Completeness of Modern Neural Network Architectures , 2019, ICLR.
[8] Lane Schwartz,et al. Unsupervised Grammar Induction with Depth-bounded PCFG , 2018, TACL.
[9] Tao Shen,et al. DiSAN: Directional Self-Attention Network for RNN/CNN-free Language Understanding , 2017, AAAI.
[10] Surya Ganguli,et al. RNNs Can Generate Bounded Hierarchical Languages with Optimal Memory , 2020, EMNLP.
[11] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[12] Xing Wang,et al. Modeling Recurrence for Transformer , 2019, NAACL.
[13] Chris Quirk,et al. Novel positional encodings to enable tree-based transformers , 2019, NeurIPS.
[14] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[15] Navin Goyal,et al. On the Computational Power of Transformers and Its Implications in Sequence Modeling , 2020, CoNLL.
[16] Yu Zhang,et al. Fast and Accurate Neural CRF Constituency Parsing , 2020, IJCAI.
[17] Geoffrey E. Hinton,et al. Layer Normalization , 2016, arXiv.
[18] Thomas Wolf,et al. HuggingFace's Transformers: State-of-the-art Natural Language Processing , 2019, arXiv.
[19] Richard Socher,et al. Pointer Sentinel Mixture Models , 2016, ICLR.
[20] Wei Zhang,et al. How Can Self-Attention Networks Recognize Dyck-n Languages? , 2020, Findings of EMNLP.
[21] Jean-Philippe Bernardy,et al. Can Recurrent Neural Networks Learn Nested Recursion? , 2018, LILT.
[22] Robert Frank,et al. Open Sesame: Getting inside BERT’s Linguistic Knowledge , 2019, BlackboxNLP@ACL.
[23] Lukasz Kaiser,et al. Universal Transformers , 2018, ICLR.
[24] Yonatan Belinkov,et al. Memory-Augmented Recurrent Neural Networks Can Learn Generalized Dyck Languages , 2019, arXiv.
[25] Christof Monz,et al. The Importance of Being Recurrent for Modeling Hierarchical Structure , 2018, EMNLP.
[26] Ashish Vaswani,et al. Self-Attention with Relative Position Representations , 2018, NAACL.
[27] Dipanjan Das,et al. BERT Rediscovers the Classical NLP Pipeline , 2019, ACL.
[28] Morten H. Christiansen,et al. How hierarchical is language use? , 2012, Proceedings of the Royal Society B: Biological Sciences.
[29] Robert C. Berwick,et al. Evaluating the Ability of LSTMs to Learn Context-Free Grammars , 2018, BlackboxNLP@EMNLP.
[30] Noah A. Smith,et al. A Formal Hierarchy of RNN Architectures , 2020, ACL.
[31] Ankit Singh Rawat,et al. Are Transformers universal approximators of sequence-to-sequence functions? , 2020, ICLR.
[32] Omer Levy,et al. Emergent linguistic structure in artificial neural networks trained by self-supervision , 2020, Proceedings of the National Academy of Sciences.
[33] Ilya Sutskever,et al. Language Models are Unsupervised Multitask Learners , 2019 .
[34] Jeffrey L. Elman,et al. Finding Structure in Time , 1990, Cogn. Sci..
[35] Noam Chomsky. Three models for the description of language , 1956, IRE Trans. Inf. Theory.
[36] Mark Steijvers,et al. A Recurrent Network that performs a Context-Sensitive Prediction Task , 1996 .
[37] Michael Hahn,et al. Theoretical Limitations of Self-Attention in Neural Sequence Models , 2019, TACL.
[38] Stanislas Dehaene,et al. Neurophysiological dynamics of phrase-structure building during sentence processing , 2017, Proceedings of the National Academy of Sciences.
[39] Stephen C. Levinson,et al. Pragmatics as the origin of recursion , 2014 .
[40] Morten H. Christiansen,et al. Hierarchical and sequential processing of language , 2018, Language, Cognition and Neuroscience.
[41] Han He,et al. Establishing Strong Baselines for the New Decade: Sequence Tagging, Syntactic and Semantic Parsing with BERT , 2019, FLAIRS.
[42] Ngoc Thang Vu,et al. Learning the Dyck Language with Attention-based Seq2Seq Models , 2019, BlackboxNLP@ACL.
[43] Zhaopeng Tu,et al. Assessing the Ability of Self-Attention Networks to Learn Word Order , 2019, ACL.
[44] Jakob Grue Simonsen,et al. Encoding word order in complex embeddings , 2019, ICLR.
[45] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[46] Tie-Yan Liu,et al. Rethinking Positional Encoding in Language Pre-training , 2020, ICLR.
[47] Samuel A. Korsky,et al. On the Computational Power of RNNs , 2019, arXiv.
[48] John T Hale,et al. Hierarchical structure guides rapid linguistic predictions during naturalistic listening , 2019, PloS one.
[48] C. Lee Giles,et al. Learning Context-free Grammars: Capabilities and Limitations of a Recurrent Neural Network with an External Stack Memory , 1992 .