Practical optimization methods for machine learning models
[1] Mark W. Schmidt,et al. Convergence Rates of Inexact Proximal-Gradient Methods for Convex Optimization , 2011, NIPS.
[2] U. Ascher,et al. Adaptive and stochastic algorithms for EIT and DC resistivity problems with piecewise constant solutions and many measurements , 2011 .
[3] Razvan Pascanu,et al. Revisiting Natural Gradient for Deep Networks , 2013, ICLR.
[4] Mark W. Schmidt,et al. A Stochastic Gradient Method with an Exponential Convergence Rate for Strongly-Convex Optimization with Finite Training Sets , 2012, ArXiv.
[5] Eric P. Xing,et al. Conditional Topic Random Fields , 2010, ICML.
[6] Andrew McCallum,et al. Information extraction from research papers using conditional random fields , 2006, Inf. Process. Manag..
[7] Zeyuan Allen Zhu,et al. Improved SVRG for Non-Strongly-Convex or Sum-of-Non-Convex Objectives , 2015, ICML.
[8] Mark W. Schmidt,et al. Faster Stochastic Variational Inference using Proximal-Gradient Methods with General Divergence Functions , 2015, UAI.
[9] Yurii Nesterov,et al. Efficiency of Coordinate Descent Methods on Huge-Scale Optimization Problems , 2012, SIAM J. Optim..
[10] Alexander J. Smola,et al. Variance Reduction in Stochastic Gradient Langevin Dynamics , 2016, NIPS.
[11] Shai Shalev-Shwartz,et al. Accelerated Mini-Batch Stochastic Dual Coordinate Ascent , 2013, NIPS.
[12] Alexander J. Smola,et al. Stochastic Variance Reduction for Nonconvex Optimization , 2016, ICML.
[13] Kevin P. Murphy,et al. Machine learning - a probabilistic perspective , 2012, Adaptive computation and machine learning series.
[14] Karol Gregor,et al. Neural Variational Inference and Learning in Belief Networks , 2014, ICML.
[15] Rong Jin,et al. MixedGrad: An O(1/T) Convergence Rate Algorithm for Stochastic Smooth Optimization , 2013, ArXiv.
[16] Pascal Fua,et al. Kullback-Leibler Proximal Variational Inference , 2015, NIPS.
[17] Matthew D. Hoffman,et al. A trust-region method for stochastic variational inference with applications to streaming data , 2015, ICML.
[18] Phil Blunsom,et al. Semantic Role Labelling with Tree Conditional Random Fields , 2005, CoNLL.
[19] Nuno Vasconcelos,et al. Spatiotemporal Saliency in Dynamic Scenes , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[20] Max Welling,et al. Auto-Encoding Variational Bayes , 2013, ICLR.
[21] Tara N. Sainath,et al. Deep Neural Networks for Acoustic Modeling in Speech Recognition , 2012 .
[22] Peter L. Bartlett,et al. Exponentiated Gradient Algorithms for Conditional Random Fields and Max-Margin Markov Networks , 2008, J. Mach. Learn. Res..
[23] M E J Newman,et al. Modularity and community structure in networks. , 2006, Proceedings of the National Academy of Sciences of the United States of America.
[24] Léon Bottou,et al. On the Ineffectiveness of Variance Reduced Optimization for Deep Learning , 2018, NeurIPS.
[25] Mark W. Schmidt,et al. Fast and Faster Convergence of SGD for Over-Parameterized Models and an Accelerated Perceptron , 2018, AISTATS.
[26] Mark W. Schmidt,et al. Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Łojasiewicz Condition , 2016, ECML/PKDD.
[27] Silvere Bonnabel,et al. Stochastic Gradient Descent on Riemannian Manifolds , 2011, IEEE Transactions on Automatic Control.
[28] Rong Jin,et al. Linear Convergence with Condition Number Independent Access of Full Gradients , 2013, NIPS.
[29] Ambuj Tewari,et al. Composite objective mirror descent , 2010, COLT 2010.
[30] Suvrit Sra,et al. Matrix Manifold Optimization for Gaussian Mixtures , 2015, NIPS.
[31] Martin J. Wainwright,et al. Message-passing for Graph-structured Linear Programs: Proximal Methods and Rounding Schemes , 2010, J. Mach. Learn. Res..
[32] Demis Hassabis,et al. Mastering the game of Go without human knowledge , 2017, Nature.
[33] Léon Bottou,et al. A Lower Bound for the Optimization of Finite Sums , 2014, ICML.
[34] Suvrit Sra,et al. First-order Methods for Geodesically Convex Optimization , 2016, COLT.
[35] Mark W. Schmidt,et al. Stop Wasting My Gradients: Practical SVRG , 2015, NIPS.
[36] Hiroyuki Kasai,et al. Riemannian stochastic variance reduced gradient on Grassmann manifold , 2016, ArXiv.
[37] Juha Karhunen,et al. Approximate Riemannian Conjugate Gradient Learning for Fixed-Form Variational Bayes , 2010, J. Mach. Learn. Res..
[38] Francis Bach,et al. SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives , 2014, NIPS.
[39] Mark W. Schmidt,et al. Hybrid Deterministic-Stochastic Methods for Data Fitting , 2011, SIAM J. Sci. Comput..
[40] Deanna Needell,et al. Stochastic gradient descent, weighted sampling, and the randomized Kaczmarz algorithm , 2013, Mathematical Programming.
[41] Chong Wang,et al. Stochastic variational inference , 2012, J. Mach. Learn. Res..
[42] Yoram Singer,et al. Pegasos: primal estimated sub-gradient solver for SVM , 2011, Math. Program..
[43] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[44] Mark W. Schmidt,et al. Accelerated training of conditional random fields with stochastic gradient methods , 2006, ICML.
[45] Yoshua Bengio,et al. Gradient-based learning applied to document recognition , 1998, Proc. IEEE.
[46] John Wright,et al. Complete dictionary recovery over the sphere , 2015, 2015 International Conference on Sampling Theory and Applications (SampTA).
[47] Alexander J. Smola,et al. A Generic Approach for Escaping Saddle points , 2017, AISTATS.
[48] W. Ziller. Riemannian Manifolds with Positive Sectional Curvature , 2012, 1210.4102.
[49] Ben Taskar,et al. Max-Margin Markov Networks , 2003, NIPS.
[50] D. Bertsekas,et al. Convergence Rate of Incremental Subgradient Algorithms , 2000 .
[51] Tong Zhang,et al. Stochastic Optimization with Importance Sampling for Regularized Loss Minimization , 2014, ICML.
[52] Y. Nesterov. A method for unconstrained convex minimization problem with the rate of convergence O(1/k^2) , 1983 .
[53] Jie Liu,et al. SARAH: A Novel Method for Machine Learning Problems Using Stochastic Recursive Gradient , 2017, ICML.
[54] Shakir Mohamed,et al. Variational Inference with Normalizing Flows , 2015, ICML.
[55] Ben Jeuris,et al. A survey and comparison of contemporary algorithms for computing the matrix geometric mean , 2012 .
[56] S. Rosset,et al. Piecewise linear regularized solution paths , 2007, 0708.2197.
[57] Mark W. Schmidt,et al. MASAGA: A Linearly-Convergent Stochastic First-Order Method for Optimization on Manifolds , 2018, ECML/PKDD.
[58] S. Sathiya Keerthi,et al. A Modified Finite Newton Method for Fast Solution of Large Scale Linear SVMs , 2005, J. Mach. Learn. Res..
[59] Hiroyuki Kasai,et al. Riemannian stochastic variance reduced gradient , 2016, SIAM J. Optim..
[60] Sean Gerrish,et al. Black Box Variational Inference , 2013, AISTATS.
[61] Yurii Nesterov,et al. Introductory Lectures on Convex Optimization - A Basic Course , 2014, Applied Optimization.
[62] Taher H. Haveliwala,et al. Adaptive methods for the computation of PageRank , 2004 .
[63] Suvrit Sra,et al. Fast stochastic optimization on Riemannian manifolds , 2016, ArXiv.
[64] Thorsten Joachims,et al. Making large scale SVM learning practical , 1998 .
[65] Jan Peters,et al. Reinforcement learning in robotics: A survey , 2013, Int. J. Robotics Res..
[66] Saeed Ghadimi,et al. Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization , 2013, Mathematical Programming.
[67] Christopher D. Manning,et al. Efficient, Feature-based, Conditional Random Field Parsing , 2008, ACL.
[68] R. Vershynin,et al. A Randomized Kaczmarz Algorithm with Exponential Convergence , 2007, math/0702226.
[69] Boris Polyak,et al. Acceleration of stochastic approximation by averaging , 1992 .
[70] Julien Mairal,et al. Stochastic Optimization with Variance Reduction for Infinite Datasets with Finite Sum Structure , 2016, NIPS.
[71] Sophia Ananiadou,et al. Stochastic Gradient Descent Training for L1-regularized Log-linear Models with Cumulative Penalty , 2009, ACL.
[72] Teng Zhang,et al. Robust Principal Component Analysis by Manifold Optimization , 2017 .
[73] C. Udriste,et al. Convex Functions and Optimization Methods on Riemannian Manifolds , 1994 .
[74] Eric R. Ziegel,et al. Generalized Linear Models , 2002, Technometrics.
[75] H. Robbins. A Stochastic Approximation Method , 1951 .
[76] Frank Nielsen,et al. Statistical exponential families: A digest with flash cards , 2009, ArXiv.
[77] Ulrich Paquet. On the Convergence of Stochastic Variational Inference in Bayesian Networks , 2014 .
[78] Lin Xiao,et al. A Proximal Stochastic Gradient Method with Progressive Variance Reduction , 2014, SIAM J. Optim..
[79] Thorsten Joachims,et al. KDD-Cup 2004: results and analysis , 2004, SKDD.
[80] Jorge Nocedal,et al. Sample size selection in optimization methods for machine learning , 2012, Math. Program..
[81] Mark W. Schmidt,et al. Non-Uniform Stochastic Average Gradient Method for Training Conditional Random Fields , 2015, AISTATS.
[82] Shai Shalev-Shwartz,et al. Stochastic dual coordinate ascent methods for regularized loss , 2012, J. Mach. Learn. Res..
[83] Rajeev Motwani,et al. The PageRank Citation Ranking : Bringing Order to the Web , 1999, WWW 1999.
[84] Saeed Ghadimi,et al. Optimal Stochastic Approximation Algorithms for Strongly Convex Stochastic Composite Optimization I: A Generic Algorithmic Framework , 2012, SIAM J. Optim..
[85] Mark W. Schmidt,et al. Minimizing finite sums with the stochastic average gradient , 2013, Mathematical Programming.
[86] Eric Moulines,et al. Non-Asymptotic Analysis of Stochastic Approximation Algorithms for Machine Learning , 2011, NIPS.
[87] Shai Ben-David,et al. Understanding Machine Learning: From Theory to Algorithms , 2014 .
[88] Tim Salimans,et al. Fixed-Form Variational Posterior Approximation through Stochastic Linear Regression , 2012, ArXiv.
[89] Jason D. M. Rennie,et al. Loss Functions for Preference Levels: Regression with Discrete Ordered Labels , 2005 .
[90] Fernando Pereira,et al. Shallow Parsing with Conditional Random Fields , 2003, NAACL.
[91] Ami Wiesel,et al. Geodesic Convexity and Covariance Estimation , 2012, IEEE Transactions on Signal Processing.
[92] François Yvon,et al. Practical Very Large Scale CRFs , 2010, ACL.
[93] Mark W. Schmidt,et al. Block-Coordinate Frank-Wolfe Optimization for Structural SVMs , 2012, ICML.
[94] Michael Collins,et al. Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms , 2002, EMNLP.
[95] Marc Teboulle,et al. A fast Iterative Shrinkage-Thresholding Algorithm with application to wavelet-based image deblurring , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.
[96] Jie Liu,et al. Mini-Batch Semi-Stochastic Gradient Descent in the Proximal Setting , 2015, IEEE Journal of Selected Topics in Signal Processing.
[97] Léon Bottou,et al. The Tradeoffs of Large Scale Learning , 2007, NIPS.
[98] Andrew McCallum,et al. Dynamic Conditional Random Fields for Jointly Labeling Multiple Sequences , 2003 .
[99] Levent Tunçel,et al. Optimization algorithms on matrix manifolds , 2009, Math. Comput..
[100] Xuanjing Huang,et al. A Fast Accurate Two-stage Training Algorithm for L1-regularized CRFs with Heuristic Line Search Strategy , 2011, IJCNLP.
[101] Burr Settles,et al. Biomedical Named Entity Recognition using Conditional Random Fields and Rich Feature Sets , 2004, NLPBA/BioNLP.
[102] Tong Zhang,et al. Accelerating Stochastic Gradient Descent using Predictive Variance Reduction , 2013, NIPS.
[103] Carl E. Rasmussen,et al. Assessing Approximate Inference for Binary Gaussian Process Classification , 2005, J. Mach. Learn. Res..
[104] Felix J. Herrmann,et al. Robust inversion, dimensionality reduction, and randomized sampling , 2012, Math. Program..
[105] Peter Richtárik,et al. Semi-Stochastic Gradient Descent Methods , 2013, Front. Appl. Math. Stat..
[106] Suvrit Sra,et al. Geometric Optimization in Machine Learning , 2016 .
[107] Wei Xu,et al. Towards Optimal One Pass Large Scale Learning with Averaged Stochastic Gradient Descent , 2011, ArXiv.
[108] Shun-ichi Amari,et al. Natural Gradient Works Efficiently in Learning , 1998, Neural Computation.
[109] Alexander Shapiro,et al. Stochastic Approximation approach to Stochastic Programming , 2013 .
[110] Hanna M. Wallach,et al. Efficient Training of Conditional Random Fields , 2002 .
[111] Miguel Lázaro-Gredilla,et al. Doubly Stochastic Variational Bayes for non-Conjugate Inference , 2014, ICML.
[112] Sebastian Nowozin,et al. Structured Learning and Prediction in Computer Vision , 2011, Found. Trends Comput. Graph. Vis..
[113] Yoram Singer,et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization , 2011, J. Mach. Learn. Res..
[114] Charles Guyon,et al. Robust Principal Component Analysis for Background Subtraction: Systematic Evaluation and Comparative Analysis , 2012 .
[115] Mark W. Schmidt,et al. Let's Make Block Coordinate Descent Go Fast: Faster Greedy Rules, Message-Passing, Active-Set Complexity, and Superlinear Convergence , 2017 .
[116] Julien Mairal,et al. Optimization with First-Order Surrogate Functions , 2013, ICML.
[117] Peter Carbonetto,et al. New probabilistic inference algorithms that harness the strengths of variational and Monte Carlo methods , 2009 .
[118] Yiming Yang,et al. RCV1: A New Benchmark Collection for Text Categorization Research , 2004, J. Mach. Learn. Res..
[119] Antoine Bordes,et al. Guarantees for Approximate Incremental SVMs , 2010, AISTATS.
[120] Marc Teboulle,et al. Mirror descent and nonlinear projected subgradient methods for convex optimization , 2003, Oper. Res. Lett..
[121] Gordon V. Cormack,et al. Spam Corpus Creation for TREC , 2005, CEAS.