暂无分享,去创建一个
[1] R. Courant,et al. What Is Mathematics , 1943 .
[2] R. F.,et al. Mathematical Statistics , 1944, Nature.
[3] K. Chung. On a Stochastic Approximation Method , 1954 .
[4] R. E. Kalman,et al. A New Approach to Linear Filtering and Prediction Problems , 2002 .
[5] S. Friedman. On Stochastic Approximations , 1963 .
[6] W. Hoeffding. Probability Inequalities for sums of Bounded Random Variables , 1963 .
[7] Albert B Novikoff,et al. ON CONVERGENCE PROOFS FOR PERCEPTRONS , 1963 .
[8] Boris Polyak. Some methods of speeding up the convergence of iteration methods , 1964 .
[9] E. G. Gladyshev. On Stochastic Approximation , 1965 .
[10] Shun-ichi Amari,et al. A Theory of Adaptive Pattern Classifiers , 1967, IEEE Trans. Electron. Comput..
[11] Shun-ichi Amari,et al. A Theory of Pattern Recognition , 1968 .
[12] C. G. Broyden. The Convergence of a Class of Double-rank Minimization Algorithms 1. General Considerations , 1970 .
[13] R. Fletcher,et al. A New Approach to Variable Metric Algorithms , 1970, Comput. J..
[14] D. Shanno. Conditioning of Quasi-Newton Methods for Function Minimization , 1970 .
[15] D. Goldfarb. A family of variable-metric methods derived by variational means , 1970 .
[16] James M. Ortega,et al. Iterative solution of nonlinear equations in several variables , 2014, Computer science and applied mathematics.
[17] Vladimir Vapnik,et al. Chervonenkis: On the uniform convergence of relative frequencies of events to their probabilities , 1971 .
[18] M. J. D. Powell,et al. On search directions for minimization algorithms , 1973, Math. Program..
[19] J. J. Moré,et al. A Characterization of Superlinear Convergence and its Application to Quasi-Newton Methods , 1973 .
[20] R. Glowinski,et al. Sur l'approximation, par éléments finis d'ordre un, et la résolution, par pénalisation-dualité d'une classe de problèmes de Dirichlet non linéaires , 1975 .
[21] B. Mercier,et al. A dual algorithm for the solution of nonlinear variational problems via finite element approximation , 1976 .
[22] Larry Nazareth,et al. A family of variable metric updates , 1977, Math. Program..
[23] D. Rubin,et al. Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper , 1977 .
[24] J. Nocedal. Updating Quasi-Newton Matrices With Limited Storage , 1980 .
[25] Louis B. Rall,et al. Automatic differentiation , 1981 .
[26] R. Dembo,et al. INEXACT NEWTON METHODS , 1982 .
[27] T. Steihaug. The Conjugate Gradient Method and Trust Regions in Large Scale Optimization , 1983 .
[28] John Darzentas,et al. Problem Complexity and Method Efficiency in Optimization , 1983 .
[29] Y. Nesterov. A method for solving the convex programming problem with convergence rate O(1/k^2) , 1983 .
[30] New York Dover,et al. ON THE CONVERGENCE PROPERTIES OF THE EM ALGORITHM , 1983 .
[31] Gene H. Golub,et al. Matrix computations , 1983 .
[32] John E. Dennis,et al. Numerical methods for unconstrained optimization and nonlinear equations , 1983, Prentice Hall series in computational mathematics.
[33] H. Robbins,et al. A Convergence Theorem for Non Negative Almost Supermartingales and Some Applications , 1985 .
[34] Geoffrey E. Hinton,et al. Learning representations by back-propagating errors , 1986, Nature.
[35] Geoffrey E. Hinton,et al. Learning internal representations by error propagation , 1986 .
[36] Henryk Wozniakowski,et al. Information-based complexity , 1987, Nature.
[37] D. Ruppert,et al. Efficient Estimations from a Slowly Convergent Robbins-Monro Process , 1988 .
[38] Griewank,et al. On automatic differentiation , 1988 .
[39] Jorge Nocedal,et al. On the limited memory BFGS method for large scale optimization , 1989, Math. Program..
[40] Yann LeCun,et al. Improving the convergence of back-propagation learning with second-order methods , 1989 .
[41] John N. Tsitsiklis,et al. Parallel and distributed computation , 1989 .
[42] Lawrence D. Jackel,et al. Handwritten Digit Recognition with a Back-Propagation Network , 1989, NIPS.
[43] David S. Touretzky,et al. Advances in neural information processing systems 2 , 1989 .
[44] Pierre Priouret,et al. Adaptive Algorithms and Stochastic Approximations , 1990, Applications of Mathematics.
[45] Shirley Dex,et al. JR 旅客販売総合システム(マルス)における運用及び管理について , 1991 .
[46] L. Bottou. Stochastic Gradient Learning in Neural Networks , 1991 .
[47] Boris Polyak,et al. Acceleration of stochastic approximation by averaging , 1992 .
[48] M. R. Osborne. Fisher's Method of Scoring , 1992 .
[49] Dimitri P. Bertsekas,et al. On the Douglas—Rachford splitting method and the proximal point algorithm for maximal monotone operators , 1992, Math. Program..
[50] Todd K. Leen,et al. Optimal Stochastic Search and Adaptive Momentum , 1993, NIPS.
[51] R. D. Murphy,et al. Iterative solution of nonlinear equations , 1994 .
[52] Barak A. Pearlmutter. Fast Exact Multiplication by the Hessian , 1994, Neural Computation.
[53] Dimitri P. Bertsekas,et al. Nonlinear Programming , 1997 .
[54] David L. Donoho,et al. De-noising by soft-thresholding , 1995, IEEE Trans. Inf. Theory.
[55] Dimitri P. Bertsekas,et al. Incremental Least Squares Methods and the Extended Kalman Filter , 1996, SIAM J. Optim..
[56] T. Cover. Universal Portfolios , 1996 .
[57] R. Tibshirani. Regression Shrinkage and Selection via the Lasso , 1996 .
[58] Jorge Nocedal,et al. Algorithm 778: L-BFGS-B: Fortran subroutines for large-scale bound-constrained optimization , 1997, TOMS.
[59] Susan T. Dumais,et al. Inductive learning algorithms and representations for text categorization , 1998, CIKM '98.
[60] Yoshua Bengio,et al. Gradient-based learning applied to document recognition , 1998, Proc. IEEE.
[61] Vladimir Vapnik,et al. Statistical learning theory , 1998 .
[62] Thorsten Joachims,et al. Text Categorization with Support Vector Machines: Learning with Many Relevant Features , 1998, ECML.
[63] Jean-Francois Cardoso,et al. Blind signal separation: statistical principles , 1998, Proc. IEEE.
[64] Shun-ichi Amari,et al. Natural Gradient Works Efficiently in Learning , 1998, Neural Computation.
[65] Peter L. Bartlett,et al. The Importance of Convexity in Learning with Squared Loss , 1998, IEEE Trans. Inf. Theory.
[66] Alexander Shapiro,et al. A simulation-based approach to two-stage stochastic programming with recourse , 1998, Math. Program..
[67] Noboru Murata,et al. A Statistical Study on On-line Learning , 1999 .
[68] R. Dudley,et al. Uniform Central Limit Theorems: Notation Index , 2014 .
[69] Stephen J. Wright,et al. Numerical Optimization , 2018, Fundamental Statistical Inference.
[70] Chih-Jen Lin,et al. Newton's Method for Large Bound-Constrained Optimization Problems , 1999, SIAM J. Optim..
[71] Kenji Fukumizu,et al. Adaptive natural gradient learning algorithms for various stochastic models , 2000, Neural Networks.
[72] Shun-ichi Amari,et al. Methods of information geometry , 2000 .
[73] P. Massart. Some applications of concentration inequalities to statistics , 2000 .
[74] Nicholas I. M. Gould,et al. Trust Region Methods , 2000, MOS-SIAM Series on Optimization.
[75] E. Berger. UNIFORM CENTRAL LIMIT THEOREMS (Cambridge Studies in Advanced Mathematics 63) By R. M. D UDLEY : 436pp., £55.00, ISBN 0-521-46102-2 (Cambridge University Press, 1999). , 2001 .
[76] Nicol N. Schraudolph. Fast Curvature Matrix-Vector Products , 2001, ICANN.
[77] G. Shafer,et al. Probability and Finance: It's Only a Game! , 2001 .
[78] Brian D. Fisher,et al. University of British Columbia , 2002, INTR.
[79] O. Bousquet. Concentration Inequalities and Empirical Processes Theory Applied to the Analysis of Learning Algorithms , 2002 .
[80] J. van Leeuwen,et al. Neural Networks: Tricks of the Trade , 2002, Lecture Notes in Computer Science.
[81] Michael C. Fu,et al. Optimization for Simulation: Theory vs. Practice , 2002 .
[82] I. Daubechies,et al. An iterative thresholding algorithm for linear inverse problems with a sparsity constraint , 2003, math/0307152.
[83] A. Tsybakov,et al. Optimal aggregation of classifiers in statistical learning , 2003 .
[84] Martin Zinkevich,et al. Online Convex Programming and Generalized Infinitesimal Gradient Ascent , 2003, ICML.
[85] Yiming Yang,et al. RCV1: A New Benchmark Collection for Text Categorization Research , 2004, J. Mach. Learn. Res..
[86] Corinna Cortes,et al. Support-Vector Networks , 1995, Machine Learning.
[87] Claudio Gentile,et al. On the generalization ability of on-line learning algorithms , 2001, IEEE Transactions on Information Theory.
[88] Ronald,et al. Learning representations by backpropagating errors , 2004 .
[89] Yurii Nesterov,et al. Introductory Lectures on Convex Optimization - A Basic Course , 2014, Applied Optimization.
[90] Léon Bottou,et al. On-line learning for very large data sets , 2005 .
[91] V. Vapnik. Estimation of Dependences Based on Empirical Data , 2006 .
[92] David Gleicher. A Statistical Study , 2006 .
[93] H. Parthasarathy,et al. NemaFootPrinter: a web based software for the identification of conserved non-coding genome sequence regions between C. elegans and C. briggsae , 1981, Nature Immunology.
[94] Alfred O. Hero,et al. A Convergent Incremental Gradient Method with a Constant Step Size , 2007, SIAM J. Optim..
[95] H. Robbins. A Stochastic Approximation Method , 1951 .
[96] Mário A. T. Figueiredo,et al. Gradient Projection for Sparse Reconstruction: Application to Compressed Sensing and Other Inverse Problems , 2007, IEEE Journal of Selected Topics in Signal Processing.
[97] Léon Bottou,et al. The Tradeoffs of Large Scale Learning , 2007, NIPS.
[98] Jianfeng Gao,et al. Scalable training of L1-regularized log-linear models , 2007, ICML '07.
[99] Simon Günter,et al. A Stochastic Quasi-Newton Method for Online Convex Optimization , 2007, AISTATS.
[100] Chih-Jen Lin,et al. LIBLINEAR: A Library for Large Linear Classification , 2008, J. Mach. Learn. Res..
[101] Emmanuel J. Candès,et al. Exact Matrix Completion via Convex Optimization , 2009, Found. Comput. Math..
[102] Marc Teboulle,et al. A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems , 2009, SIAM J. Imaging Sci..
[103] Stephen J. Wright,et al. Sparse reconstruction by separable approximation , 2009, IEEE Trans. Signal Process..
[104] Yurii Nesterov,et al. Primal-dual subgradient methods for convex problems , 2005, Math. Program..
[105] Alexander Shapiro,et al. Stochastic Approximation approach to Stochastic Programming , 2013 .
[106] Patrick Gallinari,et al. SGD-QN: Careful Quasi-Newton Stochastic Gradient Descent , 2009, J. Mach. Learn. Res..
[107] Paul Tseng,et al. A coordinate gradient descent method for nonsmooth separable minimization , 2008, Math. Program..
[108] Adrian S. Lewis,et al. Randomized Methods for Linear Constraints: Convergence Rates and Conditioning , 2008, Math. Oper. Res..
[109] Yoram Singer,et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization , 2011, J. Mach. Learn. Res..
[110] Patrick Gallinari,et al. Erratum: SGDQN is Less Careful than Expected , 2010, J. Mach. Learn. Res..
[111] Mark W. Schmidt,et al. Graphical model structure learning using L₁-regularization , 2010 .
[112] Léon Bottou,et al. Large-Scale Machine Learning with Stochastic Gradient Descent , 2010, COMPSTAT.
[113] Ilya Sutskever,et al. Learning Recurrent Neural Networks with Hessian-Free Optimization , 2011, ICML.
[114] Emmanuel J. Candès,et al. Templates for convex cone problems with applications to sparse signal recovery , 2010, Math. Program. Comput..
[115] Stephen J. Wright,et al. Hogwild: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent , 2011, NIPS.
[116] Jorge Nocedal,et al. On the Use of Stochastic Hessian Information in Optimization Methods for Machine Learning , 2011, SIAM J. Optim..
[117] Pradeep Ravikumar,et al. Sparse inverse covariance matrix estimation using quadratic approximation , 2011, MLSLP.
[118] Wei Xu,et al. Towards Optimal One Pass Large Scale Learning with Averaged Stochastic Gradient Descent , 2011, ArXiv.
[119] F. Bach,et al. Optimization with Sparsity-Inducing Penalties (Foundations and Trends(R) in Machine Learning) , 2011 .
[120] Léon Bottou,et al. Batch and online learning algorithms for nonconvex neyman-pearson classification , 2011, TIST.
[121] Maxim Raginsky,et al. Information-Based Complexity, Feedback and Dynamics in Convex Programming , 2010, IEEE Transactions on Information Theory.
[122] Yoram Singer,et al. Pegasos: primal estimated sub-gradient solver for SVM , 2011, Math. Program..
[123] Mark W. Schmidt,et al. Convergence Rates of Inexact Proximal-Gradient Methods for Convex Optimization , 2011, NIPS.
[124] Yurii Nesterov,et al. Efficiency of Coordinate Descent Methods on Huge-Scale Optimization Problems , 2012, SIAM J. Optim..
[125] Emmanuel J. Candès,et al. Exact Matrix Completion via Convex Optimization , 2008, Found. Comput. Math..
[126] Julien Mairal,et al. Optimization with Sparsity-Inducing Penalties , 2011, Found. Trends Mach. Learn..
[127] Shay B. Cohen,et al. Advances in Neural Information Processing Systems 25 , 2012, NIPS 2012.
[128] Marc'Aurelio Ranzato,et al. Large Scale Distributed Deep Networks , 2012, NIPS.
[129] Mark W. Schmidt,et al. A Stochastic Gradient Method with an Exponential Convergence Rate for Finite Training Sets , 2012, NIPS.
[130] Matthew D. Zeiler. ADADELTA: An Adaptive Learning Rate Method , 2012, ArXiv.
[131] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[132] Klaus-Robert Müller,et al. Efficient BackProp , 2012, Neural Networks: Tricks of the Trade.
[133] Mark W. Schmidt,et al. Hybrid Deterministic-Stochastic Methods for Data Fitting , 2011, SIAM J. Sci. Comput..
[134] Jorge Nocedal,et al. Sample size selection in optimization methods for machine learning , 2012, Math. Program..
[135] Martin J. Wainwright,et al. Information-Theoretic Lower Bounds on the Oracle Complexity of Stochastic Convex Optimization , 2010, IEEE Transactions on Information Theory.
[136] J. Nocedal,et al. An inexact successive quadratic approximation method for L-1 regularized optimization , 2013, Mathematical Programming.
[137] Shai Shalev-Shwartz,et al. Stochastic dual coordinate ascent methods for regularized loss , 2012, J. Mach. Learn. Res..
[138] John Langford,et al. Normalized Online Learning , 2013, UAI.
[139] Tong Zhang,et al. Accelerating Stochastic Gradient Descent using Predictive Variance Reduction , 2013, NIPS.
[140] Pradeep Ravikumar,et al. BIG & QUIC: Sparse Inverse Covariance Estimation for a Million Variables , 2013, NIPS.
[141] Geoffrey E. Hinton,et al. On the importance of initialization and momentum in deep learning , 2013, ICML.
[142] Chong Wang,et al. Stochastic variational inference , 2012, J. Mach. Learn. Res..
[143] Stephen J. Wright,et al. Optimization for Machine Learning , 2013 .
[144] Brian Kingsbury,et al. New types of deep neural network learning for speech recognition and related applications: an overview , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[145] Haipeng Luo,et al. Accelerated Parallel Optimization Methods for Large Scale Machine Learning , 2014, ArXiv.
[146] Francis Bach,et al. SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives , 2014, NIPS.
[147] Stefan Carlsson,et al. CNN Features Off-the-Shelf: An Astounding Baseline for Recognition , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops.
[148] Raghu Pasupathy,et al. On adaptive sampling rules for stochastic recursions , 2014, Proceedings of the Winter Simulation Conference 2014.
[149] É. Moulines,et al. On stochastic proximal gradient algorithms , 2014 .
[150] Samy Bengio,et al. Large-Scale Object Classification Using Label Relation Graphs , 2014, ECCV.
[151] Surya Ganguli,et al. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization , 2014, NIPS.
[152] Aryan Mokhtari,et al. RES: Regularized Stochastic BFGS Algorithm , 2014, IEEE Transactions on Signal Processing.
[153] Harm de Vries,et al. RMSProp and equilibrated adaptive learning rates for non-convex optimization. , 2015 .
[154] Yoshua Bengio,et al. Equilibrated adaptive learning rates for non-convex optimization , 2015, NIPS.
[155] Stephen J. Wright,et al. An asynchronous parallel stochastic coordinate descent algorithm , 2013, J. Mach. Learn. Res..
[156] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.
[157] Oriol Vinyals,et al. Qualitatively characterizing neural network optimization problems , 2014, ICLR.
[158] Mark W. Schmidt,et al. Coordinate Descent Converges Faster with the Gauss-Southwell Rule Than Random Selection , 2015, ICML.
[159] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[160] Yann LeCun,et al. The Loss Surfaces of Multilayer Networks , 2014, AISTATS.
[161] A. Ozdaglar,et al. Convergence Rate of Incremental Gradient and Newton Methods , 2015 .
[162] Zaïd Harchaoui,et al. A Universal Catalyst for First-Order Optimization , 2015, NIPS.
[163] Michael S. Bernstein,et al. ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.
[164] Dimitri P. Bertsekas,et al. Convex Optimization Algorithms , 2015 .
[165] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[166] P. Glynn,et al. ON SAMPLING RATES IN STOCHASTIC RECURSIONS , 2016 .
[167] Jorge Nocedal,et al. A family of second-order methods for convex $$\ell _1$$ℓ1-regularized optimization , 2016, Math. Program..
[168] Zeyuan Allen Zhu,et al. Optimal Black-Box Reductions Between Optimization Objectives , 2016, NIPS.
[169] Michael W. Mahoney,et al. Sub-Sampled Newton Methods II: Local Convergence Rates , 2016, ArXiv.
[170] Katya Scheinberg,et al. Practical inexact proximal quasi-Newton method with global complexity analysis , 2013, Mathematical Programming.
[171] Yann Ollivier,et al. Practical Riemannian Neural Networks , 2016, ArXiv.
[172] Naman Agarwal,et al. Second Order Stochastic Optimization in Linear Time , 2016, ArXiv.
[173] Mark W. Schmidt,et al. Minimizing finite sums with the stochastic average gradient , 2013, Mathematical Programming.
[174] Katya Scheinberg,et al. Optimization Methods for Supervised Machine Learning: From Linear Models to Deep Learning , 2017, ArXiv.
[175] Asuman E. Ozdaglar,et al. On the Convergence Rate of Incremental Aggregated Gradient Algorithms , 2015, SIAM J. Optim..
[176] Gersende Fort,et al. On Perturbed Proximal Gradient Algorithms , 2014, J. Mach. Learn. Res..
[177] Martin J. Wainwright,et al. Newton Sketch: A Near Linear-Time Optimization Algorithm with Linear-Quadratic Convergence , 2015, SIAM J. Optim..
[178] Naman Agarwal,et al. Second-Order Stochastic Optimization for Machine Learning in Linear Time , 2016, J. Mach. Learn. Res..
[179] Friedrich Haslinger. ANNALES DE LA FACULTÉ DES SCIENCES DE TOULOUSE , 2019 .
[180] James Martens,et al. New Insights and Perspectives on the Natural Gradient Method , 2014, J. Mach. Learn. Res..
[181] F. F. Soulié. Experiments with Time Delay Networks and Dynamic Time Warping for speaker independent isolated digits recognition , 2022 .