What they do when in doubt: a study of inductive biases in seq2seq learners
[1] Yoshua Bengio,et al. A Closer Look at Memorization in Deep Networks , 2017, ICML.
[2] Ray J. Solomonoff,et al. A Formal Theory of Inductive Inference. Part I , 1964, Inf. Control..
[3] Yoshua Bengio,et al. Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.
[4] J. Rissanen,et al. Modeling By Shortest Data Description* , 1978, Autom..
[5] Yann Dauphin,et al. Language Modeling with Gated Convolutional Networks , 2016, ICML.
[6] Ngoc Thang Vu,et al. Learning the Dyck Language with Attention-based Seq2Seq Models , 2019, BlackboxNLP@ACL.
[7] Yann Dauphin,et al. Convolutional Sequence to Sequence Learning , 2017, ICML.
[8] Marta R. Costa-jussà,et al. Joint Source-Target Self Attention with Locality Constraints , 2019, ArXiv.
[9] David J. C. MacKay,et al. Information Theory, Inference, and Learning Algorithms , 2004, IEEE Transactions on Information Theory.
[10] Jason Weston,et al. Jump to better conclusions: SCAN both left and right , 2018, BlackboxNLP@EMNLP.
[11] Mathijs Mul,et al. Compositionality Decomposed: How do Neural Networks Generalise? , 2019, J. Artif. Intell. Res..
[12] Quoc V. Le,et al. Sequence to Sequence Learning with Neural Networks , 2014, NIPS.
[13] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[14] Nathan Srebro,et al. The Marginal Value of Adaptive Gradient Methods in Machine Learning , 2017, NIPS.
[15] Eran Yahav,et al. On the Practical Computational Power of Finite Precision RNNs for Language Recognition , 2018, ACL.
[16] J. Tenenbaum,et al. The learnability of abstract syntactic principles , 2011, Cognition.
[17] Marco Baroni,et al. CNNs found to jump around more skillfully than RNNs: Compositional Generalization in Seq2seq Convolutional Networks , 2019, ACL.
[18] Peter Grünwald,et al. A tutorial introduction to the minimum description length principle , 2004, ArXiv.
[19] Eugene Kharitonov,et al. Word-order Biases in Deep-agent Emergent Communication , 2019, ACL.
[20] Yann Dauphin,et al. Hierarchical Neural Story Generation , 2018, ACL.
[21] Yann Ollivier,et al. The Description Length of Deep Learning models , 2018, NeurIPS.
[22] Marco Baroni,et al. Human few-shot learning of compositional instructions , 2019, CogSci.
[23] Noah A. Smith,et al. A Formal Hierarchy of RNN Architectures , 2020, ACL.
[24] J. Elman,et al. Rethinking Innateness: A Connectionist Perspective on Development , 1996 .
[25] M. Raijmakers. Rethinking innateness: A connectionist perspective on development , 1997 .
[26] Lawrence D. Jackel,et al. Handwritten Digit Recognition with a Back-Propagation Network , 1989, NIPS.
[27] Ilya Sutskever,et al. Language Models are Unsupervised Multitask Learners , 2019 .
[28] Mark Chen,et al. Language Models are Few-Shot Learners , 2020, NeurIPS.
[29] Guigang Zhang,et al. Deep Learning , 2016, Int. J. Semantic Comput..
[30] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[31] Samy Bengio,et al. Identity Crisis: Memorization and Generalization under Extreme Overparameterization , 2019, ICLR.
[32] Jorge Nocedal,et al. Optimization Methods for Large-Scale Machine Learning , 2016, SIAM Rev..
[33] Ray J. Solomonoff,et al. A Formal Theory of Inductive Inference. Part II , 1964, Inf. Control..
[34] Yonatan Belinkov,et al. LSTM Networks Can Perform Dynamic Counting , 2019, Proceedings of the Workshop on Deep Learning and Formal Languages: Building Bridges.
[35] Marco Baroni,et al. Generalization without Systematicity: On the Compositional Skills of Sequence-to-Sequence Recurrent Networks , 2017, ICML.
[36] Di He,et al. Layer-Wise Coordination between Encoder and Decoder for Neural Machine Translation , 2018, NeurIPS.
[37] Brenden M. Lake,et al. Learning Inductive Biases with Simple Neural Networks , 2018, CogSci.
[38] Quoc V. Le,et al. Towards a Human-like Open-Domain Chatbot , 2020, ArXiv.
[39] Marco Baroni,et al. Rearranging the Familiar: Testing Compositional Generalization in Recurrent Networks , 2018, BlackboxNLP@EMNLP.
[40] David Lopez-Paz,et al. Permutation Equivariant Models for Compositional Generalization in Language , 2020, ICLR.
[41] Myle Ott,et al. fairseq: A Fast, Extensible Toolkit for Sequence Modeling , 2019, NAACL.
[42] Emmanuel Dupoux,et al. Cognitive science in the era of artificial intelligence: A roadmap for reverse-engineering the infant language-learner , 2016, Cognition.
[43] R. Thomas McCoy,et al. Does Syntax Need to Grow on Trees? Sources of Hierarchical Inductive Bias in Sequence-to-Sequence Networks , 2020, TACL.
[44] Samuel Ritter,et al. Cognitive Psychology for Deep Neural Networks: A Shape Bias Case Study , 2017, ICML.
[45] Noam Chomsky. Aspects of the Theory of Syntax , 1965 .
[46] Hava T. Siegelmann,et al. On the Computational Power of Neural Nets , 1995, J. Comput. Syst. Sci..
[47] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.