The Loss Surface of XOR Artificial Neural Networks
Edgar A. Bernal | Dhagash Mehta | Xiaojun Zhao | David J. Wales
[1] Tamiki Komatsuzaki, et al. How many dimensions are required to approximate the potential energy landscape of a model protein?, 2005, The Journal of Chemical Physics.
[2] David J. Wales, et al. Potential energy and free energy landscapes, 2006, The Journal of Physical Chemistry B.
[3] Joan Bruna, et al. Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation, 2014, NIPS.
[4] Oriol Vinyals, et al. Qualitatively characterizing neural network optimization problems, 2014, ICLR.
[5] Surya Ganguli, et al. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization, 2014, NIPS.
[6] M. A. Virasoro, et al. Barriers and metastable states as saddle points in the replica approach, 1993.
[7] Robert Hecht-Nielsen, et al. On the Geometry of Feedforward Neural Network Error Surfaces, 1993, Neural Computation.
[8] Christopher M. Bishop. Pattern Recognition and Machine Learning, 2006, Springer.
[9] Guigang Zhang, et al. Deep Learning, 2016, Int. J. Semantic Comput.
[10] Kewei Tu, et al. Mapping the Energy Landscape of Non-convex Optimization Problems, 2014, EMMCVPR.
[11] Daniel Soudry, et al. No bad local minima: Data independent training error guarantees for multilayer neural networks, 2016, ArXiv.
[12] Razvan Pascanu, et al. Sharp Minima Can Generalize For Deep Nets, 2017, ICML.
[13] Song Han, et al. Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding, 2015, ICLR.
[14] H. Scheraga, et al. Global optimization of clusters, crystals, and biomolecules, 1999, Science.
[15] Leonard G. C. Hamey, et al. XOR has no local minima: A case study in neural network error surface analysis, 1998, Neural Networks.
[16] J. Doye, et al. Thermodynamics and the Global Optimization of Lennard-Jones Clusters, 1998, cond-mat/9806020.
[17] Vladimir Vapnik, et al. Statistical learning theory, 1998.
[18] Héctor J. Sussmann, et al. Uniqueness of the weights for minimal feedforward nets with a given input-output map, 1992, Neural Networks.
[19] Kenji Kawaguchi, et al. Deep Learning without Poor Local Minima, 2016, NIPS.
[20] Ohad Shamir, et al. On the Quality of the Initial Basin in Overspecified Neural Networks, 2015, ICML.
[21] Jonathon Shlens, et al. Explaining and Harnessing Adversarial Examples, 2014, ICLR.
[22] Dhagash Mehta, et al. Finding all the stationary points of a potential-energy landscape via numerical polynomial-homotopy-continuation method, 2011, Physical Review E.
[23] Michael Page, et al. On evaluating the reaction path Hamiltonian, 1988.
[24] Ida G. Sprinkhuizen-Kuyper, et al. The Error Surface of the Simplest XOR Network Has Only Global Minima, 1996, Neural Computation.
[25] F. Noé, et al. Transition networks for modeling the kinetics of conformational change in macromolecules, 2008, Current Opinion in Structural Biology.
[26] David J. Wales, et al. Some further applications of discrete path sampling to cluster isomerization, 2004.
[27] Mark A. Miller, et al. Archetypal energy landscapes, 1998, Nature.
[28] Peter Auer, et al. Exponentially many local minima for single neurons, 1995, NIPS.
[29] J. Doye, et al. The double-funnel energy landscape of the 38-atom Lennard-Jones cluster, 1998, cond-mat/9808265.
[30] A. Cavagna, et al. Spin-glass theory for pedestrians, 2005, cond-mat/0505032.
[31] Ida G. Sprinkhuizen-Kuyper, et al. The error surface of the 2-2-1 XOR network: The finite stationary points, 1998, Neural Networks.
[32] Antonio Auffinger, et al. Complexity of random smooth functions on the high-dimensional sphere, 2011, arXiv:1110.5872.
[33] David J. Wales, et al. Exploring biomolecular energy landscapes, 2017, Chemical Communications.
[34] Matthias Hein, et al. The Loss Surface of Deep and Wide Neural Networks, 2017, ICML.
[35] D. Mehta, et al. Energy landscape of the finite-size mean-field 2-spin spherical model and topology trivialization, 2014, Physical Review E.
[36] David J. Wales, et al. Finding pathways between distant local minima, 2005, The Journal of Chemical Physics.
[37] Marcus Gallagher, et al. Multi-layer Perceptron Error Surfaces: Visualization, Structure and Modelling, 2000.
[38] Xiao-Hu Yu, et al. Can backpropagation error surface not have local minima, 1992, IEEE Trans. Neural Networks.
[39] K. Laidler, et al. Symmetries of activated complexes, 1968.
[40] Surya Ganguli, et al. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks, 2013, ICLR.
[41] Kurt Hornik, et al. Neural networks and principal component analysis: Learning from examples without local minima, 1989, Neural Networks.
[42] D. Mehta, et al. Potential energy landscape of the two-dimensional XY model: higher-index stationary points, 2014, The Journal of Chemical Physics.
[43] G. Henkelman, et al. Improved tangent estimate in the nudged elastic band method for finding minimum energy paths and saddle points, 2000.
[44] A. Crisanti, et al. The spherical p-spin interaction spin glass model: the statics, 1992.
[45] David J. Wales, et al. Energy landscapes: some new horizons, 2010, Current Opinion in Structural Biology.
[46] Razvan Pascanu, et al. Local minima in training of neural networks, 2016, arXiv:1611.06310.
[47] Vineeth N. Balasubramanian, et al. Are Saddles Good Enough for Deep Learning?, 2017, ArXiv.
[48] Yann LeCun, et al. Explorations on high dimensional landscapes, 2014, ICLR.
[49] Nitish Srivastava, et al. Improving neural networks by preventing co-adaptation of feature detectors, 2012, ArXiv.
[50] F. Rao, et al. The protein folding network, 2004, Journal of Molecular Biology.
[51] Graeme Henkelman, et al. Unification of algorithms for minimum mode optimization, 2014, The Journal of Chemical Physics.
[52] David J. Wales, et al. Transition states and rearrangement mechanisms from hybrid eigenvector-following and density functional theory. Application to C10H10 and defect migration in crystalline silicon, 2001.
[53] Raúl Rojas, et al. Neural Networks - A Systematic Introduction, 1996.
[54] Eunhyeok Park, et al. Compression of Deep Convolutional Neural Networks for Fast and Low Power Mobile Applications, 2015, ICLR.
[55] Jonathan P. K. Doye, et al. Stationary points and dynamics in high-dimensional systems, 2003.
[56] Hao Li, et al. Visualizing the Loss Landscape of Neural Nets, 2017, NeurIPS.
[57] Leslie Pack Kaelbling, et al. Generalization in Deep Learning, 2017, ArXiv.
[58] David J. Wales, et al. Energy landscapes for a machine learning application to series data, 2016, The Journal of Chemical Physics.
[59] Ida G. Sprinkhuizen-Kuyper, et al. A local minimum for the 2-3-1 XOR network, 1999, IEEE Trans. Neural Networks.
[60] Jonathan Tompson, et al. Efficient object localization using Convolutional Networks, 2015, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[61] Léon Bottou. Online Learning and Stochastic Approximations, 1998.
[62] Klaus-Robert Müller, et al. Efficient BackProp, 2012, Neural Networks: Tricks of the Trade.
[63] P. Le Doussal, et al. Topology Trivialization and Large Deviations for the Minimum in the Simplest Random Optimization, 2013, arXiv:1304.0024.
[64] Ronald L. Rivest, et al. Training a 3-node neural network is NP-complete, 1988, COLT '88.
[65] Jorge Nocedal, et al. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima, 2016, ICLR.
[66] J. Doye, et al. Saddle Points and Dynamics of Lennard-Jones Clusters, Solids and Supercooled Liquids, 2001, cond-mat/0108310.
[67] C. G. Broyden. The Convergence of a Class of Double-rank Minimization Algorithms 1. General Considerations, 1970.
[68] Yann LeCun, et al. The MNIST database of handwritten digits, 2005.
[69] Russell Reed, et al. Pruning algorithms: a survey, 1993, IEEE Trans. Neural Networks.
[70] D. Wales. A Microscopic Basis for the Global Appearance of Energy Landscapes, 2001, Science.
[71] Lindsey J. Munro, et al. Defect migration in crystalline silicon, 1999.
[72] Virginia L. Stonick, et al. 488 Solutions to the XOR Problem, 1996, NIPS.
[73] Y. Fyodorov. High-Dimensional Random Fields and Random Matrix Theory, 2013, arXiv:1307.2379.
[74] R. Fletcher, et al. A New Approach to Variable Metric Algorithms, 1970, Comput. J.
[75] Roberto Cipolla, et al. Symmetry-invariant optimization in deep networks, 2015, ArXiv.
[77] Stefano Soatto, et al. Trivializing The Energy Landscape Of Deep Networks, 2015, ArXiv.
[78] Dhagash Mehta, et al. Stationary point analysis of the one-dimensional lattice Landau gauge fixing functional, aka random phase XY Hamiltonian, 2010, arXiv:1010.5335.
[79] J. Doye, et al. Global Optimization by Basin-Hopping and the Lowest Energy Structures of Lennard-Jones Clusters Containing up to 110 Atoms, 1997, cond-mat/9803344.
[80] David J. Wales, et al. Thermodynamics and kinetics of aggregation for the GNNQQNY peptide, 2007, Journal of the American Chemical Society.
[81] David J. Wales, et al. Free energy landscapes of model peptides and proteins, 2003.
[82] René Vidal, et al. Global Optimality in Tensor Factorization, Deep Learning, and Beyond, 2015, ArXiv.
[83] D. Shanno. Conditioning of Quasi-Newton Methods for Function Minimization, 1970.
[84] C. Lee Giles, et al. What Size Neural Network Gives Optimal Generalization? Convergence Properties of Backpropagation, 1998.
[85] D. Wales, et al. A doubly nudged elastic band method for finding transition states, 2004, The Journal of Chemical Physics.
[86] D. Mehta, et al. Energy-landscape analysis of the two-dimensional nearest-neighbor φ⁴ model, 2012, Physical Review E.
[87] Max Tegmark, et al. Why Does Deep and Cheap Learning Work So Well?, 2016, Journal of Statistical Physics.
[88] Dhagash Mehta, et al. Phase transitions detached from stationary points of the energy landscape, 2011, Physical Review Letters.
[89] Dhagash Mehta, et al. Energy landscape of the finite-size spherical three-spin glass model, 2013, Physical Review E.
[90] H. Scheraga, et al. Monte Carlo-minimization approach to the multiple-minima problem in protein folding, 1987, Proceedings of the National Academy of Sciences of the United States of America.
[91] Diego Prada-Gracia, et al. Exploring the Free Energy Landscape: From Dynamics to Networks and Back, 2009, PLoS Comput. Biol.
[92] M. Karplus, et al. The topology of multidimensional potential energy surfaces: Theory and application to peptide structure and kinetics, 1997.
[93] D. Goldfarb. A family of variable-metric methods derived by variational means, 1970.
[94] Ida G. Sprinkhuizen-Kuyper, et al. The local minima of the error surface of the 2-2-1 XOR network, 2004, Annals of Mathematics and Artificial Intelligence.
[95] G. Henkelman, et al. A climbing image nudged elastic band method for finding saddle points and minimum energy paths, 2000.
[96] Antonio Auffinger, et al. Random Matrices and Complexity of Spin Glasses, 2010, arXiv:1003.1129.
[97] Dhagash Mehta, et al. Statistics of stationary points of random finite polynomial potentials, 2015, arXiv:1504.02786.
[98] Anima Anandkumar, et al. Efficient approaches for escaping higher order saddle points in non-convex optimization, 2016, COLT.
[99] Yann LeCun, et al. The Loss Surfaces of Multilayer Networks, 2014, AISTATS.
[100] D. Wales. Discrete path sampling, 2002.
[101] David J. Wales, et al. New results for phase transitions from catastrophe theory, 2004, The Journal of Chemical Physics.
[102] G. Henkelman, et al. A dimer method for finding saddle points on high dimensional potential surfaces using only first derivatives, 1999.
[103] Sergei V. Krivov, et al. Free energy disconnectivity graphs: Application to peptide models, 2002.