Stefano Soatto | Yann LeCun | Christian Borgs | Jennifer T. Chayes | Carlo Baldassi | Riccardo Zecchina | Anna Choromanska | Levent Sagun | Pratik Chaudhari
[2] E. Allgower,et al. Introduction to Numerical Continuation Methods , 1987 .
[3] Kurt Hornik,et al. Neural networks and principal component analysis: Learning from examples without local minima , 1989, Neural Networks.
[4] Boris Polyak,et al. Acceleration of stochastic approximation by averaging , 1992 .
[5] Beatrice Santorini,et al. Building a Large Annotated Corpus of English: The Penn Treebank , 1993, CL.
[6] Barak A. Pearlmutter. Fast Exact Multiplication by the Hessian , 1994, Neural Computation.
[7] Monasson,et al. Weight space structure and internal representations: A direct approach to learning and generalization in multilayer neural networks. , 1995, Physical review letters.
[9] Monasson,et al. Analytical and numerical study of internal representations in multilayer neural networks with binary weights. , 1996, Physical review. E, Statistical physics, plasmas, fluids, and related interdisciplinary topics.
[10] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[11] D. Haussler,et al. MUTUAL INFORMATION, METRIC ENTROPY AND CUMULATIVE RELATIVE ENTROPY RISK , 1997 .
[12] Jürgen Schmidhuber,et al. Flat Minima , 1997, Neural Computation.
[13] Yoshua Bengio,et al. Gradient-based learning applied to document recognition , 1998, Proc. IEEE.
[14] G. Roberts,et al. Langevin Diffusions and Metropolis-Hastings Algorithms , 2002 .
[15] André Elisseeff,et al. Stability and Generalization , 2002, J. Mach. Learn. Res..
[16] Larry Wasserman,et al. All of Statistics: A Concise Course in Statistical Inference , 2004 .
[17] Riccardo Zecchina,et al. Survey propagation: An algorithm for satisfiability , 2002, Random Struct. Algorithms.
[18] Martin J. Wainwright,et al. A new look at survey propagation and its generalizations , 2004, SODA '05.
[19] Federico Ricci-Tersenghi,et al. On the solution-space geometry of random constraint satisfaction problems , 2006, STOC '06.
[20] Yan V Fyodorov,et al. Replica Symmetry Breaking Condition Exposed by Random Matrix Calculation of Landscape Complexity , 2007, cond-mat/0702601.
[21] A. Bray,et al. Statistics of critical points of Gaussian fields on large-dimensional spaces. , 2006, Physical review letters.
[22] Alex Krizhevsky,et al. Learning Multiple Layers of Features from Tiny Images , 2009 .
[23] Yoram Singer,et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization , 2011, J. Mach. Learn. Res..
[24] Radford M. Neal. MCMC Using Hamiltonian Dynamics , 2011, 1206.1901.
[25] Yee Whye Teh,et al. Bayesian Learning via Stochastic Gradient Langevin Dynamics , 2011, ICML.
[26] Florent Krzakala,et al. Statistical physics-based reconstruction in compressed sensing , 2011, ArXiv.
[27] Léon Bottou,et al. Stochastic Gradient Descent Tricks , 2012, Neural Networks: Tricks of the Trade.
[28] Geoffrey E. Hinton,et al. On the importance of initialization and momentum in deep learning , 2013, ICML.
[29] Yoshua Bengio,et al. Maxout Networks , 2013, ICML.
[30] Nitish Srivastava,et al. Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..
[31] Ryan Babbush,et al. Bayesian Sampling Using Stochastic Gradient Thermostats , 2014, NIPS.
[32] Hossein Mobahi,et al. On the Link between Gaussian Homotopy Continuation and Convex Envelopes , 2015, EMMCVPR.
[33] Surya Ganguli,et al. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization , 2014, NIPS.
[34] Surya Ganguli,et al. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks , 2013, ICLR.
[35] Tianqi Chen,et al. Stochastic Gradient Hamiltonian Monte Carlo , 2014, ICML.
[36] Wojciech Zaremba,et al. Recurrent Neural Network Regularization , 2014, ArXiv.
[37] Thomas Brox,et al. Striving for Simplicity: The All Convolutional Net , 2014, ICLR.
[38] Fei-Fei Li,et al. Visualizing and Understanding Recurrent Networks , 2015, ArXiv.
[39] René Vidal,et al. Global Optimality in Tensor Factorization, Deep Learning, and Beyond , 2015, ArXiv.
[40] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.
[41] Oriol Vinyals,et al. Qualitatively characterizing neural network optimization problems , 2014, ICLR.
[42] Tianqi Chen,et al. A Complete Recipe for Stochastic Gradient MCMC , 2015, NIPS.
[43] Carlo Baldassi,et al. Subdominant Dense Clusters Allow for Simple Learning and High Computational Performance in Neural Networks with Discrete Synapses. , 2015, Physical review letters.
[44] Vivek Rathod,et al. Bayesian dark knowledge , 2015, NIPS.
[45] Yann LeCun,et al. Open Problem: The landscape of the loss surfaces of multilayer networks , 2015, COLT.
[46] Carlo Baldassi,et al. Local entropy as a measure for sampling solutions in Constraint Satisfaction Problems , 2015 .
[47] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[48] Furong Huang,et al. Escaping From Saddle Points - Online Stochastic Gradient for Tensor Decomposition , 2015, COLT.
[49] Stefano Soatto,et al. On the energy landscape of deep networks , 2015, 1511.06485.
[50] Yann LeCun,et al. The Loss Surfaces of Multilayer Networks , 2014, AISTATS.
[52] P. Chaudhari,et al. The Effect of Gradient Noise on the Energy Landscape of Deep Networks , 2015 .
[53] Yann LeCun,et al. Deep learning with Elastic Averaging SGD , 2014, NIPS.
[54] A. Bovier. Metastability: A Potential-Theoretic Approach , 2016 .
[55] Kenji Kawaguchi,et al. Deep Learning without Poor Local Minima , 2016, NIPS.
[56] Yoram Singer,et al. Train faster, generalize better: Stability of stochastic gradient descent , 2015, ICML.
[57] Christian Borgs,et al. Unreasonable effectiveness of learning neural networks: From accessible states and robust ensembles to basic algorithmic schemes , 2016, Proceedings of the National Academy of Sciences.
[58] Nikos Komodakis,et al. Wide Residual Networks , 2016, BMVC.
[59] Hossein Mobahi,et al. Training Recurrent Neural Networks by Diffusion , 2016, ArXiv.
[60] David M. Blei,et al. A Variational Analysis of Stochastic Gradient Algorithms , 2016, ICML.
[61] Tim Salimans,et al. Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks , 2016, NIPS.
[62] Yann LeCun,et al. Singularity of the Hessian in Deep Learning , 2016, ArXiv.
[63] David M. Blei,et al. Variational Inference: A Review for Statisticians , 2016, ArXiv.
[64] Anima Anandkumar,et al. Efficient approaches for escaping higher order saddle points in non-convex optimization , 2016, COLT.
[65] Daniel Soudry,et al. No bad local minima: Data independent training error guarantees for multilayer neural networks , 2016, ArXiv.
[66] Zhe Gan,et al. Bridging the Gap between Stochastic Gradient MCMC and Stochastic Optimization , 2015, AISTATS.
[67] Shai Shalev-Shwartz,et al. On Graduated Optimization for Stochastic Non-Convex Problems , 2015, ICML.
[68] Carlo Baldassi,et al. Learning may need only a few bits of synaptic precision. , 2016, Physical review. E.
[69] Sepp Hochreiter,et al. Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs) , 2015, ICLR.
[70] Samy Bengio,et al. Understanding deep learning requires rethinking generalization , 2016, ICLR.
[71] Anima Anandkumar,et al. Beating the Perils of Non-Convexity: Guaranteed Training of Neural Networks using Tensor Methods , 2017 .
[72] Jorge Nocedal,et al. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima , 2016, ICLR.
[73] Aaron C. Courville,et al. Recurrent Batch Normalization , 2016, ICLR.
[74] Yoshua Bengio,et al. Mollifying Networks , 2016, ICLR.
[75] Zhe Gan,et al. Scalable Bayesian Learning of Recurrent Neural Networks for Language Modeling , 2016, ACL.
[76] C. Papadimitriou,et al. Introduction to the Theory of Computation , 2018 .
[77] Omer Levy,et al. Simulating Action Dynamics with Neural Process Networks , 2018, ICLR.