Nasim Rahaman | Muhammad Waleed Gondal | Shruti Joshi | Peter Gehler | Yoshua Bengio | Francesco Locatello | Bernhard Schölkopf
[1] David Barber, et al. Modular Networks: Learning to Decompose Neural Computation, 2018, NeurIPS.
[2] Yoshua Bengio, et al. Inductive Biases for Deep Learning of Higher-Level Cognition, 2020, arXiv.
[3] Chuang Gan, et al. CLEVRER: CoLlision Events for Video REpresentation and Reasoning, 2020, ICLR.
[4] Marco Baroni, et al. Rearranging the Familiar: Testing Compositional Generalization in Recurrent Networks, 2018, BlackboxNLP@EMNLP.
[5] Yoshua Bengio, et al. Measuring the Tendency of CNNs to Learn Surface Statistical Regularities, 2017, arXiv.
[6] Aaron C. Courville, et al. Systematic Generalization: What Is Required and Can It Be Learned?, 2018, ICLR.
[7] Bernhard Schölkopf, et al. Recurrent Independent Mechanisms, 2021, ICLR.
[8] Bernhard Schölkopf, et al. Elements of Causal Inference: Foundations and Learning Algorithms, 2017.
[9] Mario Marchand, et al. Domain-Adversarial Training of Neural Networks, 2016, JMLR.
[10] Chrisantha Fernando, et al. PathNet: Evolution Channels Gradient Descent in Super Neural Networks, 2017, arXiv.
[11] Matthieu Cord, et al. Training Data-Efficient Image Transformers & Distillation Through Attention, 2020, ICML.
[12] Dan Klein, et al. Neural Module Networks, 2016, CVPR.
[13] Noam Shazeer, et al. Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity, 2021, arXiv.
[14] Alex Lamb, et al. Deep Learning for Classical Japanese Literature, 2018, arXiv.
[15] Charles Blundell, et al. Neural Production Systems, 2021, arXiv.
[16] Yann LeCun, et al. The Loss Surfaces of Multilayer Networks, 2014, AISTATS.
[17] Yifan Hu, et al. Repulsive Attention: Rethinking Multi-head Attention as Bayesian Inference, 2020, EMNLP.
[18] Martin Jaggi, et al. On the Relationship between Self-Attention and Convolutional Layers, 2019, ICLR.
[19] Lukasz Kaiser, et al. Attention Is All You Need, 2017, NeurIPS.
[20] Bernhard Schölkopf, et al. A Theory of Independent Mechanisms for Extrapolation in Generative Models, 2020, AAAI.
[21] Simon L. Kendal, et al. An Introduction to Knowledge Engineering, 2007.
[22] Xiao Wang, et al. Measuring Compositional Generalization: A Comprehensive Method on Realistic Data, 2019, ICLR.
[23] Ilya Sutskever, et al. Language Models are Unsupervised Multitask Learners, 2019.
[24] Lukasz Kaiser, et al. Universal Transformers, 2018, ICLR.
[25] Saining Xie, et al. An Empirical Study of Training Self-Supervised Vision Transformers, 2021, ICCV.
[26] Geoffrey E. Hinton, et al. Dynamic Routing Between Capsules, 2017, NeurIPS.
[27] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[28] Jimmy Ba, et al. The Scattering Compositional Learner: Discovering Objects, Attributes, Relationships in Analogical Reasoning, 2020, arXiv.
[29] Bernhard Schölkopf, et al. Learning Independent Causal Mechanisms, 2017, ICML.
[30] Thomas L. Griffiths, et al. Automatically Composing Representation Transformations as a Means for Generalization, 2018, ICLR.
[31] Matthew Riemer, et al. Routing Networks: Adaptive Selection of Non-linear Functions for Multi-Task Learning, 2017, ICLR.
[32] Christopher D. Manning, et al. Compositional Attention Networks for Machine Reasoning, 2018, ICLR.
[33] Tim Verbelen, et al. Improving Generalization for Abstract Reasoning Tasks Using Disentangled Feature Representations, 2018, NeurIPS.
[34] Bernhard Schölkopf, et al. On Causal and Anticausal Learning, 2012, ICML.
[35] Ignacio Cases, et al. Routing Networks and the Challenges of Modular and Compositional Computation, 2019, arXiv.
[36] Liyuan Liu, et al. On the Variance of the Adaptive Learning Rate and Beyond, 2019, ICLR.
[37] Yoshua Bengio, et al. Object Files and Schemata: Factorizing Declarative and Procedural Knowledge in Dynamical Systems, 2020, arXiv.
[38] Chuang Gan, et al. The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words, and Sentences from Natural Supervision, 2019, ICLR.
[39] Andrew Zisserman, et al. Perceiver: General Perception with Iterative Attention, 2021, ICML.
[40] Yoshua Bengio, et al. On the Spectral Bias of Neural Networks, 2018, ICML.
[41] Marco Baroni, et al. Generalization without Systematicity: On the Compositional Skills of Sequence-to-Sequence Recurrent Networks, 2017, ICML.
[42] Yoshua Bengio, et al. Transformers with Competitive Ensembles of Independent Mechanisms, 2021, arXiv.
[43] Andrew Y. Ng, et al. Reading Digits in Natural Images with Unsupervised Feature Learning, 2011.
[44] Noah D. Goodman, et al. Neural Event Semantics for Grounded Language Understanding, 2021, TACL.
[45] Victor Lempitsky, et al. Image Generators with Conditionally-Independent Pixel Synthesis, 2021, CVPR.
[46] Felix Hill, et al. Measuring Abstract Reasoning in Neural Networks, 2018, ICML.
[47] Christopher Joseph Pal, et al. A Meta-Transfer Objective for Learning to Disentangle Causal Mechanisms, 2019, ICLR.
[48] Shlomo Zilberstein, et al. Using Anytime Algorithms in Intelligent Systems, 1996, AI Magazine.
[49] Pietro Liò, et al. Abstract Diagrammatic Reasoning with Multiplex Graph Networks, 2020, ICLR.
[50] Quoc V. Le, et al. RandAugment: Practical Automated Data Augmentation with a Reduced Search Space, 2020, CVPR Workshops.
[51] Georg Heigold, et al. Object-Centric Learning with Slot Attention, 2020, NeurIPS.
[52] Yoshua Bengio, et al. Towards Causal Representation Learning, 2021, arXiv.
[53] J. Raven, et al. Manual for Raven's Progressive Matrices and Vocabulary Scales, 1962.
[54] Georg Heigold, et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, 2021, ICLR.
[55] Geoffrey E. Hinton, et al. Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer, 2017, ICLR.
[56] Marc'Aurelio Ranzato, et al. Few-shot Sequence Learning with Transformers, 2020, arXiv.
[57] Jürgen Schmidhuber, et al. Relational Neural Expectation Maximization: Unsupervised Discovery of Objects and their Interactions, 2018, ICLR.
[58] Yoram Singer, et al. Shampoo: Preconditioned Stochastic Tensor Optimization, 2018, ICML.