Nonparametric Compositional Stochastic Optimization for Risk-Sensitive Kernel Learning
Ketan Rajawat | Alec Koppel | Amrit Singh Bedi | Panchajanya Sanyal
[1] 中嶋 博. Convex Programming の新しい方法 (開学記念号) , 1966 .
[2] A. Ruszczynski,et al. Optimization of Risk Measures , 2006 .
[3] Shie Mannor,et al. The kernel recursive least-squares algorithm , 2004, IEEE Transactions on Signal Processing.
[4] Jeff G. Schneider,et al. On the Error of Random Fourier Features , 2015, UAI.
[5] Mengdi Wang,et al. Finite-sum Composition Optimization via Variance Reduced Gradient Descent , 2016, AISTATS.
[6] Shabbir Ahmed,et al. Convexity and decomposition of mean-risk stochastic programs , 2006, Math. Program.
[7] Zhu Li,et al. Towards a Unified Analysis of Random Fourier Features , 2018, ICML.
[8] Sean R Eddy,et al. What is dynamic programming? , 2004, Nature Biotechnology.
[9] Y. Ermoliev. Stochastic quasigradient methods and their application to system optimization , 1983 .
[10] Mengdi Wang,et al. Stochastic compositional gradient descent: algorithms for minimizing compositions of expected-value functions , 2014, Mathematical Programming.
[11] Zoltán Szabó,et al. Optimal Rates for Random Fourier Features , 2015, NIPS.
[12] S. Brendle,et al. Calculus of Variations , 1927, Nature.
[13] Francesco Orabona,et al. Momentum-Based Variance Reduction in Non-Convex SGD , 2019, NeurIPS.
[14] Slobodan Vucetic,et al. Online Passive-Aggressive Algorithms on a Budget , 2010, AISTATS.
[15] Brian M. Sadler,et al. Optimally Compressed Nonparametric Online Learning: Tradeoffs between memory and consistency , 2020, IEEE Signal Processing Magazine.
[16] Gesualdo Scutari,et al. Distributed nonconvex constrained optimization over time-varying digraphs , 2018, Mathematical Programming.
[17] Koby Crammer,et al. Breaking the curse of kernelization: budgeted stochastic gradient descent for large-scale SVM training , 2012, J. Mach. Learn. Res.
[18] Annette ten Teije,et al. Subseries of Lecture Notes in Computer Science , 2016 .
[19] Recursive Optimization of Convex Risk Measures: Mean-Semideviation Models , 2018, 1804.00636.
[20] Richard S. Sutton,et al. A Convergent O(n) Temporal-difference Algorithm for Off-policy Learning with Linear Function Approximation , 2008, NIPS.
[21] David Ruppert,et al. Semiparametric regression during 2003-2007 , 2009, Electronic journal of statistics.
[22] Antonin Chambolle,et al. On Representer Theorems and Convex Regularization , 2018, SIAM J. Optim.
[23] R. Olfati-Saber,et al. Consensus Filters for Sensor Networks and Distributed Sensor Fusion , 2005, Proceedings of the 44th IEEE Conference on Decision and Control.
[24] A. Ruszczynski,et al. Statistical estimation of composite risk functionals and risk optimization problems , 2015, 1504.02658.
[25] Ketan Rajawat,et al. Controlling the Bias-Variance Tradeoff via Coherent Risk for Robust Learning with Kernels , 2019, 2019 American Control Conference (ACC).
[26] T. Poggio,et al. The Mathematics of Learning: Dealing with Data , 2005, 2005 International Conference on Neural Networks and Brain.
[27] C. D. Bailey. Hamilton's principle and the calculus of variations , 1982 .
[28] S. Haykin,et al. Neural Networks: A Comprehensive Foundation , 1994 .
[29] H. Robbins. A Stochastic Approximation Method , 1951 .
[30] Stan Uryasev,et al. Conditional value-at-risk: optimization algorithms and applications , 2000, Proceedings of the IEEE/IAFE/INFORMS 2000 Conference on Computational Intelligence for Financial Engineering (CIFEr) (Cat. No.00TH8520).
[31] Trevor Hastie,et al. The Elements of Statistical Learning , 2001 .
[32] Gesualdo Scutari,et al. NEXT: In-Network Nonconvex Optimization , 2016, IEEE Transactions on Signal and Information Processing over Networks.
[33] Le Song,et al. Learning from Conditional Distributions via Dual Embeddings , 2016, AISTATS.
[34] Alejandro Ribeiro,et al. Nonparametric Stochastic Compositional Gradient Descent for Q-Learning in Continuous Markov Decision Problems , 2018, 2018 Annual American Control Conference (ACC).
[35] Randy A. Freeman,et al. Distributed Cooperative Active Sensing Using Consensus Filters , 2007, Proceedings 2007 IEEE International Conference on Robotics and Automation.
[36] Na Li,et al. Harnessing smoothness to accelerate distributed optimization , 2016, 2016 IEEE 55th Conference on Decision and Control (CDC).
[37] Yuesheng Xu,et al. Universal Kernels , 2006, J. Mach. Learn. Res.
[38] Peter L. Bartlett,et al. Neural Network Learning - Theoretical Foundations , 1999 .
[39] Sonia Martínez,et al. Discrete-time dynamic average consensus , 2010, Autom.
[40] Alejandro Ribeiro,et al. Parsimonious Online Learning with Kernels via sparse projections in function space , 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[41] Trung Le,et al. Nonparametric Budgeted Stochastic Gradient Descent , 2016, AISTATS.
[42] Peter Stone,et al. Policy Evaluation in Continuous MDPs With Efficient Kernelized Gradient Temporal Difference , 2017, IEEE Transactions on Automatic Control.
[43] Ohad Shamir,et al. Spurious Local Minima are Common in Two-Layer ReLU Neural Networks , 2017, ICML.
[44] Angelia Nedic,et al. Distributed stochastic gradient tracking methods , 2018, Mathematical Programming.
[45] Alexander Shapiro,et al. Stochastic Approximation approach to Stochastic Programming , 2013 .
[46] Nello Cristianini,et al. Kernel Methods for Pattern Analysis , 2003, ICTAI.
[47] W. Marsden. I and J , 2012 .
[48] Le Song,et al. Scalable Kernel Methods via Doubly Stochastic Gradients , 2014, NIPS.
[49] Y. C. Pati,et al. Orthogonal matching pursuit: recursive function approximation with applications to wavelet decomposition , 1993, Proceedings of 27th Asilomar Conference on Signals, Systems and Computers.
[50] Alexander J. Smola,et al. Online learning with kernels , 2001, IEEE Transactions on Signal Processing.
[51] Pascal Vincent,et al. Kernel Matching Pursuit , 2002, Machine Learning.
[52] Alexander Shapiro,et al. Lectures on Stochastic Programming: Modeling and Theory , 2009 .
[53] Benjamin Recht,et al. Random Features for Large-Scale Kernel Machines , 2007, NIPS.
[54] Eric R. Ziegel,et al. The Elements of Statistical Learning , 2003, Technometrics.
[55] Bernhard Schölkopf,et al. A Generalized Representer Theorem , 2001, COLT/EuroCOLT.
[56] Saeed Ghadimi,et al. A Single Timescale Stochastic Approximation Method for Nested Stochastic Optimization , 2018, SIAM J. Optim.
[57] J. Tsitsiklis,et al. Convergence rate of linear two-time-scale stochastic approximation , 2004, math/0405287.
[58] Andrew Packard,et al. Control Applications of Sum of Squares Programming , 2005 .
[59] A. Zygmund,et al. Measure and integral : an introduction to real analysis , 1977 .
[60] G. Wahba,et al. Some results on Tchebycheffian spline functions , 1971 .
[61] Steven C. H. Hoi,et al. Fast Bounded Online Gradient Descent Algorithms for Scalable Kernel-Based Online Learning , 2012, ICML.
[62] Carl E. Rasmussen,et al. Gaussian processes for machine learning , 2005, Adaptive computation and machine learning.
[63] J. Mark. Introduction to radial basis function networks , 1996 .
[64] R. Durrett. Probability: Theory and Examples , 1993 .
[65] P. Stone,et al. Breaking Bellman's Curse of Dimensionality: Efficient Kernel Gradient Temporal Difference , 2017, 1709.04221.