What they do when in doubt: a study of inductive biases in seq2seq learners
[1] Yoshua Bengio,et al. A Closer Look at Memorization in Deep Networks , 2017, ICML.
[2] Ray J. Solomonoff,et al. A Formal Theory of Inductive Inference. Part I , 1964, Inf. Control..
[3] Yoshua Bengio,et al. Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.
[4] J. Rissanen,et al. Modeling By Shortest Data Description* , 1978, Autom..
[5] Yann Dauphin,et al. Language Modeling with Gated Convolutional Networks , 2016, ICML.
[6] Ngoc Thang Vu,et al. Learning the Dyck Language with Attention-based Seq2Seq Models , 2019, BlackboxNLP@ACL.
[7] Yann Dauphin,et al. Convolutional Sequence to Sequence Learning , 2017, ICML.
[8] Marta R. Costa-jussà,et al. Joint Source-Target Self Attention with Locality Constraints , 2019, ArXiv.
[9] David J. C. MacKay,et al. Information Theory, Inference, and Learning Algorithms , 2004, IEEE Transactions on Information Theory.
[10] Jason Weston,et al. Jump to better conclusions: SCAN both left and right , 2018, BlackboxNLP@EMNLP.
[11] Mathijs Mul,et al. Compositionality Decomposed: How do Neural Networks Generalise? , 2019, J. Artif. Intell. Res..
[12] Quoc V. Le,et al. Sequence to Sequence Learning with Neural Networks , 2014, NIPS.
[13] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[14] Nathan Srebro,et al. The Marginal Value of Adaptive Gradient Methods in Machine Learning , 2017, NIPS.
[15] Eran Yahav,et al. On the Practical Computational Power of Finite Precision RNNs for Language Recognition , 2018, ACL.
[16] J. Tenenbaum,et al. The learnability of abstract syntactic principles , 2011, Cognition.
[17] Marco Baroni,et al. CNNs found to jump around more skillfully than RNNs: Compositional Generalization in Seq2seq Convolutional Networks , 2019, ACL.
[18] Peter Grünwald,et al. A tutorial introduction to the minimum description length principle , 2004, ArXiv.
[19] Eugene Kharitonov,et al. Word-order Biases in Deep-agent Emergent Communication , 2019, ACL.
[20] Yann Dauphin,et al. Hierarchical Neural Story Generation , 2018, ACL.
[21] Yann Ollivier,et al. The Description Length of Deep Learning models , 2018, NeurIPS.
[22] Marco Baroni,et al. Human few-shot learning of compositional instructions , 2019, CogSci.
[23] Noah A. Smith,et al. A Formal Hierarchy of RNN Architectures , 2020, ACL.
[24] J. Elman,et al. Rethinking Innateness: A Connectionist Perspective on Development , 1996 .
[25] M. Raijmakers. Rethinking innateness: A connectionist perspective on development , 1997 .
[26] Lawrence D. Jackel,et al. Handwritten Digit Recognition with a Back-Propagation Network , 1989, NIPS.
[27] Ilya Sutskever,et al. Language Models are Unsupervised Multitask Learners , 2019 .
[28] Mark Chen,et al. Language Models are Few-Shot Learners , 2020, NeurIPS.
[29] Guigang Zhang,et al. Deep Learning , 2016, Int. J. Semantic Comput..
[30] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[31] Samy Bengio,et al. Identity Crisis: Memorization and Generalization under Extreme Overparameterization , 2019, ICLR.
[32] Jorge Nocedal,et al. Optimization Methods for Large-Scale Machine Learning , 2016, SIAM Rev..
[33] Ray J. Solomonoff,et al. A Formal Theory of Inductive Inference. Part II , 1964, Inf. Control..
[34] Yonatan Belinkov,et al. LSTM Networks Can Perform Dynamic Counting , 2019, Proceedings of the Workshop on Deep Learning and Formal Languages: Building Bridges.
[35] Marco Baroni,et al. Generalization without Systematicity: On the Compositional Skills of Sequence-to-Sequence Recurrent Networks , 2017, ICML.
[36] Di He,et al. Layer-Wise Coordination between Encoder and Decoder for Neural Machine Translation , 2018, NeurIPS.
[37] Brenden M. Lake,et al. Learning Inductive Biases with Simple Neural Networks , 2018, CogSci.
[38] Quoc V. Le,et al. Towards a Human-like Open-Domain Chatbot , 2020, ArXiv.
[39] Marco Baroni,et al. Rearranging the Familiar: Testing Compositional Generalization in Recurrent Networks , 2018, BlackboxNLP@EMNLP.
[40] David Lopez-Paz,et al. Permutation Equivariant Models for Compositional Generalization in Language , 2020, ICLR.
[41] Myle Ott,et al. fairseq: A Fast, Extensible Toolkit for Sequence Modeling , 2019, NAACL.
[42] Emmanuel Dupoux,et al. Cognitive science in the era of artificial intelligence: A roadmap for reverse-engineering the infant language-learner , 2016, Cognition.
[43] R. Thomas McCoy,et al. Does Syntax Need to Grow on Trees? Sources of Hierarchical Inductive Bias in Sequence-to-Sequence Networks , 2020, TACL.
[44] Samuel Ritter,et al. Cognitive Psychology for Deep Neural Networks: A Shape Bias Case Study , 2017, ICML.
[45] Noam Chomsky. Aspects of the Theory of Syntax , 1965 .
[46] Hava T. Siegelmann,et al. On the Computational Power of Neural Nets , 1995, J. Comput. Syst. Sci..
[47] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.