Escaping From Saddle Points - Online Stochastic Gradient for Tensor Decomposition
Furong Huang | Yang Yuan | Rong Ge | Chi Jin
[1] O. Mangasarian. Pseudo-convex functions , 1965 .
[2] Kazuoki Azuma. Weighted sums of certain dependent random variables , 1967 .
[3] Richard A. Harshman,et al. Foundations of the PARAFAC procedure: Models and conditions for an "explanatory" multi-model factor analysis , 1970 .
[4] Mihalis Yannakakis,et al. How easy is local search? , 1985, 26th Annual Symposium on Foundations of Computer Science (sfcs 1985).
[5] Geoffrey E. Hinton,et al. Learning representations by back-propagating errors , 1986, Nature.
[6] Jean-Francois Cardoso,et al. Source separation using higher order moments , 1989, International Conference on Acoustics, Speech, and Signal Processing.
[7] Saad,et al. On-line learning in soft committee machines. , 1995, Physical review. E, Statistical physics, plasmas, fluids, and related interdisciplinary topics.
[8] Alan M. Frieze,et al. Learning linear transformations , 1996, Proceedings of 37th Conference on Foundations of Computer Science.
[9] David J. Field,et al. Sparse coding with an overcomplete basis set: A strategy employed by V1? , 1997, Vision Research.
[10] Magnus Rattray,et al. Natural gradient descent for on-line learning , 1998 .
[11] M. A. Hanson. Invexity and the Kuhn–Tucker Theorem , 1999 .
[12] Aapo Hyvärinen. Fast ICA for noisy data using Gaussian moments , 1999, ISCAS.
[13] Krzysztof C. Kiwiel,et al. Convergence and efficiency of subgradient methods for quasiconvex minimization , 2001, Math. Program.
[14] Tamara G. Kolda,et al. Orthogonal Tensor Decompositions , 2000, SIAM J. Matrix Anal. Appl.
[15] Hyeyoung Park,et al. On-Line Learning Theory of Soft Committee Machines with Correlated Hidden Units : Steepest Gradient Descent and Natural Gradient Descent , 2002, cond-mat/0212006.
[16] Yoshua Bengio,et al. Learning Deep Architectures for AI , 2007, Found. Trends Mach. Learn.
[17] P. Comon,et al. Tensor decompositions, alternating least squares and other tales , 2009 .
[18] Ohad Shamir,et al. Stochastic Convex Optimization , 2009, COLT.
[19] Martin J. Wainwright,et al. Fast global convergence rates of gradient methods for high-dimensional statistical recovery , 2010, NIPS.
[20] Seungjin Choi,et al. Independent Component Analysis , 2009, Handbook of Natural Computing.
[21] Ohad Shamir,et al. Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization , 2011, ICML.
[22] Anima Anandkumar,et al. Fast Detection of Overlapping Communities via Online Tensor Methods on GPUs , 2013, ArXiv.
[23] Ryan P. Adams,et al. Contrastive Learning Using Spectral Methods , 2013, NIPS.
[24] Prateek Jain,et al. Low-rank matrix completion using alternating minimization , 2012, STOC '13.
[25] Yann LeCun,et al. The Loss Surface of Multilayer Networks , 2014, ArXiv.
[26] Surya Ganguli,et al. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization , 2014, NIPS.
[27] Surya Ganguli,et al. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks , 2013, ICLR.
[28] Anima Anandkumar,et al. Tensor decompositions for learning latent variable models , 2012, J. Mach. Learn. Res.
[29] Sanjeev Arora,et al. Provable ICA with Unknown Gaussian Noise, and Implications for Gaussian Mixtures and Autoencoders , 2012, Algorithmica.
[30] Yann LeCun,et al. The Loss Surfaces of Multilayer Networks , 2014, AISTATS.