Scalable Kernel Methods via Doubly Stochastic Gradients
Bo Dai | Bo Xie | Niao He | Yingyu Liang | Anant Raj | Maria-Florina Balcan | Le Song
[1] Gunnar Rätsch, et al. Predicting Time Series with Support Vector Machines, 1997, ICANN.
[2] Yoshua Bengio, et al. Gradient-based learning applied to document recognition, 1998, Proc. IEEE.
[3] J. Platt. Sequential Minimal Optimization: A Fast Algorithm for Training Support Vector Machines, 1998.
[4] Thorsten Joachims, et al. Making large scale SVM learning practical, 1998.
[5] Alexander J. Smola, et al. Learning with kernels, 1998.
[6] Bernhard Schölkopf, et al. Sparse Greedy Matrix Approximation for Machine Learning, 2000, ICML.
[7] Christopher K. I. Williams, et al. Using the Nyström Method to Speed Up Kernel Machines, 2000, NIPS.
[8] Katya Scheinberg, et al. Efficient SVM Training Using Low-Rank Kernel Representations, 2002, J. Mach. Learn. Res..
[9] Bernhard Schölkopf, et al. Estimating the Support of a High-Dimensional Distribution, 2001, Neural Computation.
[10] Alexander J. Smola, et al. Online learning with kernels, 2001, IEEE Transactions on Signal Processing.
[11] O. Bousquet, et al. Kernels, Associated Structures and Generalizations, 2004.
[12] Petros Drineas, et al. On the Nyström Method for Approximating a Gram Matrix for Improved Kernel-Based Learning, 2005, J. Mach. Learn. Res..
[13] L. Bottou, et al. Training Invariant Support Vector Machines using Selective Sampling, 2005.
[14] S. Sathiya Keerthi, et al. A Modified Finite Newton Method for Fast Solution of Large Scale Linear SVMs, 2005, J. Mach. Learn. Res..
[15] Benjamin Recht, et al. Random Features for Large-Scale Kernel Machines, 2007, NIPS.
[16] Martin J. Wainwright, et al. Estimating divergence functionals and the likelihood ratio by penalized convex risk minimization, 2007, NIPS.
[17] J. Andrew Bagnell, et al. Kernel Conjugate Gradient for Fast Kernel Machines, 2007, IJCAI.
[18] Ali Rahimi, Benjamin Recht. Weighted Sums of Random Kitchen Sinks: Replacing minimization with randomization in learning, 2008, NIPS.
[19] Le Song, et al. Relative Novelty Detection, 2009, AISTATS.
[20] Alex Krizhevsky, et al. Learning Multiple Layers of Features from Tiny Images, 2009.
[21] Lawrence K. Saul, et al. Kernel Methods for Deep Learning, 2009, NIPS.
[22] Carl E. Rasmussen, et al. Gaussian processes for machine learning, 2005, Adaptive Computation and Machine Learning.
[23] Alexander Shapiro, et al. Stochastic Approximation Approach to Stochastic Programming, 2013.
[24] Andrew Zisserman, et al. Efficient additive kernels via explicit feature maps, 2010, CVPR.
[25] Ameet Talwalkar, et al. On the Impact of Kernel Approximation on Learning Accuracy, 2010, AISTATS.
[26] Yoram Singer, et al. Pegasos: primal estimated sub-gradient solver for SVM, 2011, Math. Program..
[27] Yurii Nesterov, et al. Efficiency of Coordinate Descent Methods on Huge-Scale Optimization Problems, 2012, SIAM J. Optim..
[28] Harish Karnick, et al. Random Feature Maps for Dot Product Kernels, 2012, AISTATS.
[29] Andreas Ziehe, et al. Learning Invariant Representations of Molecules for Atomization Energy Prediction, 2012, NIPS.
[30] Ohad Shamir, et al. Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization, 2011, ICML.
[31] Nathan Srebro, et al. Learning Optimally Sparse Support Vector Machines, 2013, ICML.
[32] Shai Shalev-Shwartz, et al. Stochastic dual coordinate ascent methods for regularized loss, 2012, J. Mach. Learn. Res..
[33] Tong Zhang, et al. Accelerating Stochastic Gradient Descent using Predictive Variance Reduction, 2013, NIPS.
[34] Alexander J. Smola, et al. Fastfood: Computing Hilbert Space Expansions in Loglinear Time, 2013, ICML.
[35] Rasmus Pagh, et al. Fast and scalable polynomial kernels via explicit feature maps, 2013, KDD.
[36] Quanfu Fan, et al. Random Laplace Feature Maps for Semigroup Kernels on Histograms, 2014, CVPR.
[37] Bernhard Schölkopf, et al. Randomized Nonlinear Component Analysis, 2014, ICML.
[38] Le Song, et al. Least Squares Revisited: Scalable Approaches for Multi-class Prediction, 2013, ICML.
[39] Ambedkar Dukkipati, et al. Learning by Stretching Deep Networks, 2014, ICML.
[40] Francis R. Bach, et al. On the Equivalence between Quadrature Rules and Random Features, 2015, arXiv.
[41] Le Song, et al. Deep Fried Convnets, 2015, ICCV.
[42] Tianbao Yang, et al. On Data Preconditioning for Regularized Loss Minimization, 2014, Machine Learning.
[43] Guanghui Lan, et al. Stochastic Block Mirror Descent Methods for Nonsmooth and Stochastic Optimization, 2013, SIAM J. Optim..
[44] Vikas Sindhwani, et al. Quasi-Monte Carlo Feature Maps for Shift-Invariant Kernels, 2014, J. Mach. Learn. Res..
[45] M. Urner. Scattered Data Approximation, 2016.