Stabilizing Equilibrium Models by Jacobian Regularization