Ken-ichi Kawarabayashi | Stefanie Jegelka | Simon S. Du | Keyulu Xu | Jingling Li | Mozhi Zhang
[1] Yuanzhi Li,et al. A Convergence Theory for Deep Learning via Over-Parameterization , 2018, ICML.
[2] Atsushi Nitanda,et al. Optimal Rates for Averaged Stochastic Gradient Descent under Neural Tangent Kernel Regime , 2021, ICLR.
[3] Liwei Wang,et al. Gradient Descent Finds Global Minima of Deep Neural Networks , 2018, ICML.
[4] E. Fama,et al. Common risk factors in the returns on stocks and bonds , 1993.
[5] Yoram Singer,et al. Train faster, generalize better: Stability of stochastic gradient descent , 2015, ICML.
[6] Luke S. Zettlemoyer,et al. Deep Contextualized Word Representations , 2018, NAACL.
[7] Tatsunori B. Hashimoto,et al. Distributionally Robust Neural Networks , 2020, ICLR.
[8] Kaiming He,et al. Momentum Contrast for Unsupervised Visual Representation Learning , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[9] Bernhard Schölkopf,et al. Domain Generalization via Invariant Feature Representation , 2013, ICML.
[10] Vera Kurková,et al. Kolmogorov's theorem and multilayer neural networks , 1992, Neural Networks.
[11] Yuanzhi Li,et al. Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data , 2018, NeurIPS.
[12] Nathan Srebro,et al. The Implicit Bias of Gradient Descent on Separable Data , 2017, J. Mach. Learn. Res.
[13] Francis Bach,et al. A Note on Lazy Training in Supervised Differentiable Programming , 2018, ArXiv.
[14] Ruosong Wang,et al. On Exact Computation with an Infinitely Wide Neural Net , 2019, NeurIPS.
[15] Ah Chung Tsoi,et al. The Graph Neural Network Model , 2009, IEEE Transactions on Neural Networks.
[16] Melvyn Sim,et al. Distributionally Robust Optimization and Its Tractable Approximations , 2010, Oper. Res.
[17] Ken-ichi Funahashi,et al. On the approximate realization of continuous mappings by neural networks , 1989, Neural Networks.
[18] Arthur Jacot,et al. Neural Tangent Kernel: Convergence and Generalization in Neural Networks , 2018, NeurIPS.
[19] Jordan Boyd-Graber,et al. Interactive Refinement of Cross-Lingual Word Embeddings , 2020, EMNLP.
[20] Tao Xiang,et al. Domain Generalization with MixStyle , 2021, ICLR.
[21] Raia Hadsell,et al. Graph networks as learnable physics engines for inference and control , 2018, ICML.
[22] Jaehoon Lee,et al. Wide neural networks of any depth evolve as linear models under gradient descent , 2019, NeurIPS.
[23] Francis Bach,et al. On Lazy Training in Differentiable Programming , 2018, NeurIPS.
[24] Colin Wei,et al. Towards Explaining the Regularization Effect of Initial Large Learning Rate in Training Neural Networks , 2019, NeurIPS.
[25] Razvan Pascanu,et al. Interaction Networks for Learning about Objects, Relations and Physics , 2016, NIPS.
[26] Jure Leskovec,et al. Strategies for Pre-training Graph Neural Networks , 2020, ICLR.
[27] S. Levine,et al. Reasoning About Physical Interactions with Object-Centric Models , 2018.
[28] Dawn Song,et al. Pretrained Transformers Improve Out-of-Distribution Robustness , 2020, ACL.
[29] Razvan Pascanu,et al. Relational inductive biases, deep learning, and graph networks , 2018, ArXiv.
[30] Nathan Srebro,et al. Exploring Generalization in Deep Learning , 2017, NIPS.
[31] John C. Duchi,et al. Certifying Some Distributional Robustness with Principled Adversarial Training , 2017, ICLR.
[32] Chris Dyer,et al. Neural Arithmetic Logic Units , 2018, NeurIPS.
[33] Michael I. Jordan,et al. Learning Transferable Features with Deep Adaptation Networks , 2015, ICML.
[34] Stefanie Jegelka,et al. Distributionally Robust Optimization and Generalization in Kernel Methods , 2019, NeurIPS.
[35] Amit Dhurandhar,et al. Empirical or Invariant Risk Minimization? A Sample Complexity Perspective , 2020, ArXiv.
[36] Jeffrey Dean,et al. Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.
[37] Jason D. Lee,et al. On the Power of Over-parametrization in Neural Networks with Quadratic Activation , 2018, ICML.
[38] Kurt Hornik,et al. Multilayer feedforward networks are universal approximators , 1989, Neural Networks.
[39] François Laviolette,et al. Domain-Adversarial Training of Neural Networks , 2015, J. Mach. Learn. Res.
[40] Pushmeet Kohli,et al. Analysing Mathematical Reasoning Abilities of Neural Models , 2019, ICLR.
[41] Andrea Montanari,et al. Linearized two-layers neural networks in high dimension , 2019, The Annals of Statistics.
[42] Raman Arora,et al. Understanding Deep Neural Networks with Rectified Linear Units , 2016, Electron. Colloquium Comput. Complex.
[43] Ken-ichi Kawarabayashi,et al. Representation Learning on Graphs with Jumping Knowledge Networks , 2018, ICML.
[44] Geoffrey E. Hinton,et al. A Simple Framework for Contrastive Learning of Visual Representations , 2020, ICML.
[45] Andrea Montanari,et al. A mean field view of the landscape of two-layer neural networks , 2018, Proceedings of the National Academy of Sciences.
[46] Yuan Cao,et al. Generalization Bounds of Stochastic Gradient Descent for Wide and Deep Neural Networks , 2019, NeurIPS.
[47] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[48] Koby Crammer,et al. A theory of learning from different domains , 2010, Machine Learning.
[49] Samuel S. Schoenholz,et al. Neural Message Passing for Quantum Chemistry , 2017, ICML.
[50] Zachary Dulberg,et al. Learning Representations that Support Extrapolation , 2020, ICML.
[51] Guillaume Lample,et al. Deep Learning for Symbolic Mathematics , 2019, ICLR.
[52] Taiji Suzuki,et al. Generalization of Two-layer Neural Networks: An Asymptotic Viewpoint , 2020, ICLR.
[53] Jaehoon Lee,et al. Neural Tangents: Fast and Easy Infinite Neural Networks in Python , 2019, ICLR.
[54] Francis Bach,et al. On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport , 2018, NeurIPS.
[55] Chuang Gan,et al. The Neuro-Symbolic Concept Learner: Interpreting Scenes Words and Sentences from Natural Supervision , 2019, ICLR.
[56] Matthias Hein,et al. Why ReLU Networks Yield High-Confidence Predictions Far Away From the Training Data and How to Mitigate the Problem , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[57] Joan Bruna,et al. Gradient Dynamics of Shallow Univariate ReLU Networks , 2019, NeurIPS.
[58] A. Lapedes,et al. Nonlinear signal processing using neural networks: Prediction and system modelling , 1987.
[59] Felix Hill,et al. Measuring abstract reasoning in neural networks , 2018, ICML.
[60] Roi Livni,et al. On the Computational Efficiency of Training Neural Networks , 2014, NIPS.
[61] Tomohide Shibata. Understand in 5 Minutes!? Skim-Reading Famous Papers: Jacob Devlin et al.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2020.
[62] Jure Leskovec,et al. How Powerful are Graph Neural Networks? , 2018, ICLR.
[63] Razvan Pascanu,et al. Visual Interaction Networks: Learning a Physics Simulator from Video , 2017, NIPS.
[64] Nathan Srebro,et al. How do infinite width bounded norm networks look in function space? , 2019, COLT.
[65] Ruosong Wang,et al. Graph Neural Tangent Kernel: Fusing Graph Neural Networks with Graph Kernels , 2019, NeurIPS.
[66] José M. F. Moura,et al. Adversarial Multiple Source Domain Adaptation , 2018, NeurIPS.
[67] Chuang Gan,et al. Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding , 2018, NeurIPS.
[68] Ruosong Wang,et al. Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks , 2019, ICML.
[69] Vladimir N. Vapnik,et al. The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.
[70] René Alquézar,et al. Improvement of Learning in Recurrent Networks by Substituting the Sigmoid Activation Function , 1994.
[71] Jonas Peters,et al. Causal inference by using invariant prediction: identification and confidence intervals , 2015, arXiv:1501.01332.
[72] Joshua B. Tenenbaum,et al. Building machines that learn and think like people , 2016, Behavioral and Brain Sciences.
[73] Barnabás Póczos,et al. Gradient Descent Provably Optimizes Over-parameterized Neural Networks , 2018, ICLR.
[74] S. Ross. The arbitrage theory of capital asset pricing , 1976.
[75] Joan Bruna,et al. Intriguing properties of neural networks , 2013, ICLR.
[76] David Rolnick,et al. Complexity of Linear Regions in Deep Networks , 2019, ICML.
[77] Ruosong Wang,et al. Harnessing the Power of Infinitely Wide Deep Nets on Small-data Tasks , 2019, ICLR.
[78] Alexander Rosenberg Johansen,et al. Neural Arithmetic Units , 2020, ICLR.
[79] Bernhard Schölkopf,et al. Invariant Models for Causal Transfer Learning , 2015, J. Mach. Learn. Res.
[80] Julien Mairal,et al. On the Inductive Bias of Neural Tangent Kernels , 2019, NeurIPS.
[81] Koby Crammer,et al. Learning Bounds for Domain Adaptation , 2007, NIPS.
[82] L.F.A. Wessels,et al. Extrapolation and interpolation in neural network classifiers , 1992, IEEE Control Systems.
[83] Kun Zhang,et al. On Learning Invariant Representation for Domain Adaptation , 2019, ArXiv.
[84] Hongyang Zhang,et al. Algorithmic Regularization in Over-parameterized Matrix Sensing and Neural Networks with Quadratic Activations , 2017, COLT.
[85] P. J. Haley,et al. Extrapolation limitations of multilayer feedforward neural networks , 1992, Proceedings of the 1992 IJCNN International Joint Conference on Neural Networks.
[86] R. Banz,et al. The relationship between return and market value of common stocks , 1981.
[87] Mark Dredze,et al. Beto, Bentz, Becas: The Surprising Cross-Lingual Effectiveness of BERT , 2019, EMNLP.
[88] Leslie G. Valiant. A theory of the learnable , 1984, STOC '84.
[89] Yuanzhi Li,et al. Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers , 2018, NeurIPS.
[90] Ken-ichi Kawarabayashi,et al. What Can Neural Networks Reason About? , 2019, ICLR.
[91] Pietro Liò,et al. Principal Neighbourhood Aggregation for Graph Nets , 2020, NeurIPS.
[92] Li Fei-Fei,et al. Inferring and Executing Programs for Visual Reasoning , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[93] T. Lindvall. On a Routing Problem , 2004, Probability in the Engineering and Informational Sciences.
[94] Pradeep Ravikumar,et al. The Risks of Invariant Risk Minimization , 2020, ICLR.
[95] George Cybenko. Approximation by superpositions of a sigmoidal function , 1989, Math. Control. Signals Syst.
[96] Quoc V. Le,et al. Exploiting Similarities among Languages for Machine Translation , 2013, ArXiv.
[97] Yishay Mansour,et al. Domain Adaptation: Learning Bounds and Algorithms , 2009, COLT.
[98] D. B. McCaughan. On the properties of periodic perceptrons , 1997, Proceedings of International Conference on Neural Networks (ICNN'97).
[99] Samy Bengio,et al. Understanding deep learning requires rethinking generalization , 2016, ICLR.
[100] Han Zhao,et al. On Learning Invariant Representations for Domain Adaptation , 2019, ICML.
[101] R. Bellman. Dynamic Programming , 1957, Science.
[102] Raia Hadsell,et al. Neural Execution of Graph Algorithms , 2020, ICLR.
[103] Jonathon Shlens,et al. Explaining and Harnessing Adversarial Examples , 2014, ICLR.
[104] Ken-ichi Kawarabayashi,et al. Are Girls Neko or Shōjo? Cross-Lingual Alignment of Non-Isomorphic Embeddings with Iterative Normalization , 2019, ACL.
[105] Sylvain Gelly,et al. Gradient Descent Quantizes ReLU Network Features , 2018, ArXiv.
[106] David Lopez-Paz,et al. Invariant Risk Minimization , 2019, ArXiv.