Bo Liu | Ji Liu | Sridhar Mahadevan | Philip S. Thomas | Ian Gemp | Stephen Giguere | William Dabney | Nicholas Jacek
[1] B. McCarl,et al. Economics , 1870, The Indian medical gazette.
[2] H. H. Rachford,et al. On the numerical solution of heat conduction problems in two and three space variables , 1956 .
[3] J. Moreau. Fonctions convexes duales et points proximaux dans un espace hilbertien , 1962 .
[4] G. Stampacchia,et al. On some non-linear elliptic differential-functional equations , 1966 .
[5] L. Bregman. The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming , 1967 .
[6] R. Rockafellar. Monotone Operators and the Proximal Point Algorithm , 1976 .
[7] G. M. Korpelevich. The extragradient method for finding saddle points and other problems , 1976 .
[8] Toru Maruyama. Convex Analysisの二,三の進展について [On a few developments in convex analysis] , 1977 .
[9] Stella Dafermos,et al. Traffic Equilibrium and Variational Inequalities , 1980 .
[10] John Darzentas,et al. Problem Complexity and Method Efficiency in Optimization , 1983 .
[11] N. Littlestone. Learning Quickly When Irrelevant Attributes Abound: A New Linear-Threshold Algorithm , 1987, 28th Annual Symposium on Foundations of Computer Science (sfcs 1987).
[12] Katta G. Murty,et al. Linear complementarity, linear and nonlinear programming , 1988 .
[13] E. Khobotov. Modification of the extra-gradient method for solving variational inequalities and certain optimization problems , 1989 .
[14] John N. Tsitsiklis,et al. Parallel and distributed computation , 1989 .
[15] A. Nagurney. Migration equilibrium and variational inequalities. , 1989, Economics letters.
[16] P. Marcotte. APPLICATION OF KHOBOTOV'S ALGORITHM TO VARIATIONAL INEQUALITIES AND NETWORK EQUILIBRIUM PROBLEMS , 1991 .
[17] A. Nagurney. Network Economics: A Variational Inequality Approach , 1992 .
[18] Boris Polyak,et al. Acceleration of stochastic approximation by averaging , 1992 .
[19] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[20] A. Nagurney,et al. Projected Dynamical Systems and Variational Inequalities with Applications , 1995 .
[21] Ben J. A. Kröse,et al. Learning from delayed rewards , 1995, Robotics Auton. Syst..
[22] Karl Johan Åström,et al. PID Controllers: Theory, Design, and Tuning , 1995 .
[23] Leemon C. Baird,et al. Residual Algorithms: Reinforcement Learning with Function Approximation , 1995, ICML.
[24] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[25] R. Tibshirani. Regression Shrinkage and Selection via the Lasso , 1996 .
[26] Judy A. Franklin,et al. Biped dynamic walking using reinforcement learning , 1997, Robotics Auton. Syst..
[27] A. Iusem,et al. A variant of Korpelevich’s method for variational inequalities with a new search strategy , 1997 .
[28] Manfred K. Warmuth,et al. Exponentiated Gradient Versus Gradient Descent for Linear Predictors , 1997, Inf. Comput..
[29] Doina Precup,et al. Exponentiated Gradient Methods for Reinforcement Learning , 1997, ICML.
[30] Shun-ichi Amari,et al. Natural Gradient Works Efficiently in Learning , 1998, Neural Computation.
[31] Shun-ichi Amari,et al. Why natural gradient? , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).
[32] D. Fudenberg,et al. The Theory of Learning in Games , 1998 .
[33] Claudio Gentile,et al. The Robustness of the p-Norm Algorithms , 1999, COLT '99.
[34] M. Solodov,et al. A New Projection Method for Variational Inequality Problems , 1999 .
[35] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[36] Yishay Mansour,et al. Nash Convergence of Gradient Dynamics in General-Sum Games , 2000, UAI.
[37] Jennie Si,et al. Online learning control by association and reinforcement. , 2001, IEEE transactions on neural networks.
[38] Arkadi Nemirovski,et al. The Ordered Subsets Mirror Descent Optimization Method with Applications to Tomography , 2001, SIAM J. Optim..
[39] Sham M. Kakade,et al. A Natural Policy Gradient , 2001, NIPS.
[40] D K Smith,et al. Numerical Optimization , 2001, J. Oper. Res. Soc..
[41] Manuela M. Veloso,et al. Multiagent learning using a variable learning rate , 2002, Artif. Intell..
[42] Andrew G. Barto,et al. Lyapunov Design for Safe Reinforcement Learning , 2003, J. Mach. Learn. Res..
[43] William H. Press,et al. Numerical recipes in C , 2002 .
[44] Marc Teboulle,et al. Mirror descent and nonlinear projected subgradient methods for convex optimization , 2003, Oper. Res. Lett..
[45] Neil Munro,et al. Fast calculation of stabilizing PID controllers , 2003, Autom..
[46] Dimitri P. Bertsekas,et al. Least Squares Policy Evaluation Algorithms with Linear Function Approximation , 2003, Discret. Event Dyn. Syst..
[47] Michail G. Lagoudakis,et al. Least-Squares Policy Iteration , 2003, J. Mach. Learn. Res..
[48] F. Facchinei,et al. Finite-Dimensional Variational Inequalities and Complementarity Problems , 2003 .
[49] Steven J. Bradtke,et al. Linear Least-Squares algorithms for temporal difference learning , 2004, Machine Learning.
[50] Arkadi Nemirovski,et al. Prox-Method with Rate of Convergence O(1/t) for Variational Inequalities with Lipschitz Continuous Monotone Operators and Smooth Convex-Concave Saddle Point Problems , 2004, SIAM J. Optim..
[51] Manfred K. Warmuth,et al. On the Worst-Case Analysis of Temporal-Difference Learning Algorithms , 2005, Machine Learning.
[52] Yurii Nesterov,et al. Introductory Lectures on Convex Optimization - A Basic Course , 2014, Applied Optimization.
[53] Sridhar Mahadevan,et al. Representation Policy Iteration , 2005, UAI.
[54] Fritz Wysotzki,et al. Risk-Sensitive Reinforcement Learning Applied to Control under Constraints , 2005, J. Artif. Intell. Res..
[55] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[56] Richard S. Sutton,et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.
[57] Arkadi Nemirovski,et al. Non-euclidean restricted memory level method for large-scale convex optimization , 2005, Math. Program..
[58] Ari Arapostathis,et al. Control of Markov chains with safety bounds , 2005, IEEE Transactions on Automation Science and Engineering.
[59] Stefan Schaal,et al. Natural Actor-Critic , 2003, Neurocomputing.
[60] M. Nowak. Evolutionary Dynamics: Exploring the Equations of Life , 2006 .
[61] Stephen P. Boyd,et al. Convex Optimization , 2004, Algorithms and Theory of Computation Handbook.
[62] Ben Taskar,et al. Structured Prediction, Dual Extragradient and Bregman Projections , 2006, J. Mach. Learn. Res..
[63] Yurii Nesterov,et al. Dual extrapolation and its applications to solving variational inequalities and related problems , 2003, Math. Program..
[64] Y. Nesterov. Gradient methods for minimizing composite objective function , 2007 .
[65] Bogert Aj. A Proportional Derivative FES Controller for Planar Arm Movement , 2007 .
[66] Sridhar Mahadevan,et al. Proto-value Functions: A Laplacian Framework for Learning Representation and Control in Markov Decision Processes , 2007, J. Mach. Learn. Res..
[67] R. Sutton,et al. A convergent O(n) algorithm for off-policy temporal-difference learning with linear function approximation , 2008, NIPS 2008.
[68] C. Lynch,et al. Functional Electrical Stimulation , 2017, IEEE Control Systems.
[69] Jian-Wen Peng,et al. A NEW HYBRID-EXTRAGRADIENT METHOD FOR GENERALIZED MIXED EQUILIBRIUM PROBLEMS, FIXED POINT PROBLEMS AND VARIATIONAL INEQUALITY PROBLEMS , 2008 .
[70] Dimitri P. Bertsekas,et al. New error bounds for approximations from projected linear equations , 2008, 2008 46th Annual Allerton Conference on Communication, Control, and Computing.
[71] V. Borkar. Stochastic Approximation: A Dynamical Systems Viewpoint , 2008 .
[72] A. Juditsky,et al. Solving variational inequalities with Stochastic Mirror-Prox algorithm , 2008, 0809.0815.
[73] John Langford,et al. Sparse Online Learning via Truncated Gradient , 2008, NIPS.
[74] Antonie J. van den Bogert,et al. A Real-Time, 3-D Musculoskeletal Model for Dynamic Simulation of Arm Movements , 2009, IEEE Transactions on Biomedical Engineering.
[75] Lin Xiao,et al. Dual Averaging Methods for Regularized Stochastic Learning and Online Optimization , 2009, J. Mach. Learn. Res..
[76] Shalabh Bhatnagar,et al. Natural actor-critic algorithms , 2009, Autom..
[77] Gavin Taylor,et al. Kernelized value function approximation for reinforcement learning , 2009, ICML '09.
[78] Sridhar Mahadevan,et al. Learning Representation and Control in Markov Decision Processes: New Frontiers , 2009, Found. Trends Mach. Learn..
[79] Andrew Y. Ng,et al. Regularization and feature selection in least-squares temporal difference learning , 2009, ICML '09.
[80] Ambuj Tewari,et al. Stochastic methods for l1 regularized loss minimization , 2009, ICML '09.
[81] Alexander Shapiro,et al. Stochastic Approximation approach to Stochastic Programming , 2013 .
[82] Robert F. Kirsch,et al. Combined feedforward and feedback control of a redundant, nonlinear, dynamic musculoskeletal system , 2009, Medical & Biological Engineering & Computing.
[83] Shalabh Bhatnagar,et al. Fast gradient-descent methods for temporal-difference learning with linear function approximation , 2009, ICML '09.
[84] D. Bertsekas. Projected Equations, Variational Inequalities, and Temporal Difference Methods , 2009 .
[85] Philip S. Thomas,et al. Application of the Actor-Critic Architecture to Functional Electrical Stimulation Control of a Human Arm , 2009, IAAI.
[86] Angelia Nedic,et al. Subgradient Methods for Saddle-Point Problems , 2009, J. Optimization Theory and Applications.
[87] Yoram Singer,et al. Efficient Learning using Forward-Backward Splitting , 2009, NIPS.
[88] Scott Kuindersma,et al. Dexterous mobility with the uBot-5 mobile manipulator , 2009, 2009 International Conference on Advanced Robotics.
[89] Haesun Park,et al. Fast Active-set-type Algorithms for L1-regularized Linear Regression , 2010, AISTATS.
[90] Ambuj Tewari,et al. Composite objective mirror descent , 2010, COLT 2010.
[91] Tony F. Chan,et al. A General Framework for a Class of First Order Primal-Dual Algorithms for Convex Optimization in Imaging Science , 2010, SIAM J. Imaging Sci..
[92] Tim Roughgarden,et al. Algorithmic Game Theory , 2007 .
[93] Ronald Parr,et al. Linear Complementarity for Regularized Policy Evaluation and Improvement , 2010, NIPS.
[94] Yoram Singer,et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization , 2011, J. Mach. Learn. Res..
[95] Antonin Chambolle,et al. A First-Order Primal-Dual Algorithm for Convex Problems with Applications to Imaging , 2011, Journal of Mathematical Imaging and Vision.
[96] A. Juditsky,et al. First-Order Methods for Nonsmooth Convex Large-Scale Optimization, I: General Purpose Methods , 2010 .
[97] José M. Bioucas-Dias,et al. Alternating direction algorithms for constrained sparse regression: Application to hyperspectral unmixing , 2010, 2010 2nd Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing.
[98] Richard S. Sutton,et al. GQ(lambda): A general gradient algorithm for temporal-difference prediction learning with eligibility traces , 2010, Artificial General Intelligence.
[99] Roderic A. Grupen,et al. Whole-body strategies for mobility and manipulation , 2010 .
[100] U. Rieder,et al. Markov Decision Processes , 2010 .
[101] E. David,et al. Networks, Crowds, and Markets: Reasoning about a Highly Connected World , 2010 .
[102] Alessandro Lazaric,et al. LSTD with Random Projections , 2010, NIPS.
[103] Csaba Szepesvári,et al. Algorithms for Reinforcement Learning , 2010, Synthesis Lectures on Artificial Intelligence and Machine Learning.
[104] R. Sutton,et al. GQ(λ): A general gradient algorithm for temporal-difference prediction learning with eligibility traces , 2010 .
[105] Matthew W. Hoffman,et al. Finite-Sample Analysis of Lasso-TD , 2011, ICML.
[106] J. Zico Kolter,et al. The Fixed Points of Off-Policy TD , 2011, NIPS.
[107] H. Brendan McMahan,et al. Follow-the-Regularized-Leader and Mirror Descent: Equivalence Theorems and L1 Regularization , 2011, AISTATS.
[108] Heinz H. Bauschke,et al. Convex Analysis and Monotone Operator Theory in Hilbert Spaces , 2011, CMS Books in Mathematics.
[109] Patrick L. Combettes,et al. Proximal Splitting Methods in Signal Processing , 2009, Fixed-Point Algorithms for Inverse Problems in Science and Engineering.
[110] George Konidaris,et al. Value Function Approximation in Reinforcement Learning Using the Fourier Basis , 2011, AAAI.
[111] R. Sutton,et al. Gradient temporal-difference learning algorithms , 2011 .
[112] Matthieu Geist,et al. A Dantzig Selector Approach to Temporal Difference Learning , 2012, ICML.
[113] Guanghui Lan,et al. An optimal method for stochastic composite optimization , 2011, Mathematical Programming.
[114] Tobias Scheffer,et al. Static prediction games for adversarial learning problems , 2012, J. Mach. Learn. Res..
[115] Bo Liu,et al. Regularized Off-Policy TD-Learning , 2012, NIPS.
[116] Martha White,et al. Linear Off-Policy Actor-Critic , 2012, ICML.
[117] Nuno C. Martins,et al. Control Design for Markov Chains under Safety Constraints: A Convex Approach , 2012, ArXiv.
[118] Patrick M. Pilarski,et al. Model-Free reinforcement learning with continuous action in practice , 2012, 2012 American Control Conference (ACC).
[119] Andrew G. Barto,et al. Motor primitive discovery , 2012, 2012 IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL).
[120] Chris Arney,et al. Networks, Crowds, and Markets: Reasoning about a Highly Connected World (Easley, D. and Kleinberg, J.; 2010) [Book Review] , 2013, IEEE Technology and Society Magazine.
[121] Geoffrey J. Gordon. Galerkin Methods for Complementarity Problems and Variational Inequalities , 2013, ArXiv.
[122] R. Washington. A Voted Regularized Dual Averaging Method for Large-Scale Discriminative Training in Natural Language Processing , 2013 .
[123] Yunmei Chen,et al. Optimal Primal-Dual Methods for a Class of Saddle Point Problems , 2013, SIAM J. Optim..
[124] Zhiwei Qin,et al. Sparse Reinforcement Learning via Convex Optimization , 2014, ICML.
[125] Philip Thomas,et al. Bias in Natural Actor-Critic Algorithms , 2014, ICML.
[126] Jan Peters,et al. Policy evaluation with temporal differences: a survey and comparison , 2015, J. Mach. Learn. Res..
[127] Sébastien Bubeck,et al. Theory of Convex Optimization for Machine Learning , 2014, ArXiv.
[128] Stephen P. Boyd,et al. Proximal Algorithms , 2013, Found. Trends Optim..
[129] André da Motta Salles Barreto,et al. Practical Kernel-Based Reinforcement Learning , 2014, J. Mach. Learn. Res..