Distributed Policy Evaluation Under Multiple Behavior Strategies
Sergio Valcarcel Macua | Jianshu Chen | Santiago Zazo | Ali H. Sayed
[1] Charles R. Johnson,et al. Matrix analysis , 1985, Statistical Inference for Engineers and Data Scientists.
[2] Ali H. Sayed,et al. Performance Limits for Distributed Estimation Over LMS Adaptive Networks , 2012, IEEE Transactions on Signal Processing.
[3] Byron Boots,et al. Predictive State Temporal Difference Learning , 2010, NIPS.
[4] Richard S. Sutton,et al. Temporal-difference search in computer Go , 2012, Machine Learning.
[5] Elizabeth L. Wilmer,et al. Markov Chains and Mixing Times , 2008 .
[6] Richard S. Sutton,et al. GQ(λ): A general gradient algorithm for temporal-difference prediction learning with eligibility traces , 2010, Artificial General Intelligence.
[7] Ali H. Sayed,et al. Distributed Pareto Optimization via Diffusion Strategies , 2012, IEEE Journal of Selected Topics in Signal Processing.
[8] Lihong Li,et al. An analysis of linear models, linear value-function approximation, and feature selection for reinforcement learning , 2008, ICML '08.
[9] Soummya Kar,et al. Convergence Rate Analysis of Distributed Gossip (Linear Parameter) Estimation: Fundamental Limits and Tradeoffs , 2010, IEEE Journal of Selected Topics in Signal Processing.
[10] Shalabh Bhatnagar,et al. Fast gradient-descent methods for temporal-difference learning with linear function approximation , 2009, ICML '09.
[11] Andrew Y. Ng,et al. Regularization and feature selection in least-squares temporal difference learning , 2009, ICML '09.
[12] V. Borkar. Stochastic Approximation: A Dynamical Systems Viewpoint , 2008 .
[13] Ali H. Sayed,et al. Diffusion Least-Mean Squares Over Adaptive Networks: Formulation and Performance Analysis , 2008, IEEE Transactions on Signal Processing.
[14] Ali H. Sayed,et al. Diffusion Adaptation Strategies for Distributed Optimization and Learning Over Networks , 2011, IEEE Transactions on Signal Processing.
[15] J.N. Tsitsiklis,et al. Convergence in Multiagent Coordination, Consensus, and Flocking , 2005, Proceedings of the 44th IEEE Conference on Decision and Control.
[16] Sean P. Meyn,et al. The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning , 2000, SIAM J. Control. Optim..
[17] E. Seneta. Non-negative Matrices and Markov Chains , 2008 .
[19] Santiago Zazo,et al. Diffusion gradient temporal difference for cooperative reinforcement learning with linear function approximation , 2012, 2012 3rd International Workshop on Cognitive Information Processing (CIP).
[20] O. Nelles,et al. An Introduction to Optimization , 1996, IEEE Antennas and Propagation Magazine.
[21] Sridhar Mahadevan,et al. Learning Representation and Control in Markov Decision Processes: New Frontiers , 2009, Found. Trends Mach. Learn..
[22] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[23] Csaba Szepesvári,et al. Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path , 2006, Machine Learning.
[24] Patrick M. Pilarski,et al. Horde: a scalable real-time architecture for learning knowledge from unsupervised sensorimotor interaction , 2011, AAMAS.
[25] Thomas Degris,et al. Scaling-up Knowledge for a Cognizant Robot , 2012, AAAI Spring Symposium: Designing Intelligent Robots.
[26] Ali H. Sayed,et al. Diffusion LMS Strategies for Distributed Estimation , 2010, IEEE Transactions on Signal Processing.
[27] Bruno Scherrer,et al. Should one compute the Temporal Difference fix point or minimize the Bellman Residual? The unified oblique projection view , 2010, ICML.
[28] Leemon C. Baird,et al. Residual Algorithms: Reinforcement Learning with Function Approximation , 1995, ICML.
[29] Dirk P. Kroese,et al. Handbook of Monte Carlo Methods , 2011 .
[30] John N. Tsitsiklis,et al. Analysis of temporal-difference learning with function approximation , 1996, NIPS.
[31] Andrew W. Moore,et al. Distributed Value Functions , 1999, ICML.
[32] Ali H. Sayed,et al. Diffusion strategies for adaptation and learning over networks: an examination of distributed strategies and network behavior , 2013, IEEE Signal Processing Magazine.
[34] Ali H. Sayed,et al. Cooperative off-policy prediction of Markov decision processes in adaptive networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[35] Shie Mannor,et al. Basis Function Adaptation in Temporal Difference Reinforcement Learning , 2005, Ann. Oper. Res..
[36] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[37] Shalabh Bhatnagar,et al. Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation , 2009, NIPS.
[38] Gene H. Golub,et al. Numerical solution of saddle point problems , 2005, Acta Numerica.
[39] Shalabh Bhatnagar,et al. The Borkar-Meyn theorem for asynchronous stochastic approximations , 2011, Syst. Control. Lett..
[40] B. V. Dean,et al. Studies in Linear and Non-Linear Programming. , 1959 .
[41] Richard M. Murray,et al. Consensus problems in networks of agents with switching topology and time-delays , 2004, IEEE Transactions on Automatic Control.
[42] Leslie Pack Kaelbling,et al. Efficient Distributed Reinforcement Learning through Agreement , 2008, DARS.
[43] Asuman E. Ozdaglar,et al. Distributed Subgradient Methods for Multi-Agent Optimization , 2009, IEEE Transactions on Automatic Control.
[44] Ali H. Sayed,et al. On the limiting behavior of distributed optimization strategies , 2012, 2012 50th Annual Allerton Conference on Communication, Control, and Computing (Allerton).
[45] Matthieu Geist,et al. Parametric value function approximation: A unified view , 2011, 2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL).
[46] Dimitri P. Bertsekas,et al. Convergence Results for Some Temporal Difference Methods Based on Least Squares , 2009, IEEE Transactions on Automatic Control.
[47] Ali H. Sayed,et al. Asynchronous Adaptation and Learning Over Networks—Part II: Performance Analysis , 2013, IEEE Transactions on Signal Processing.
[48] John N. Tsitsiklis,et al. Distributed Asynchronous Deterministic and Stochastic Gradient Optimization Algorithms , 1984, 1984 American Control Conference.
[49] Srdjan S. Stankovic,et al. Decentralized Parameter Estimation by Consensus Based Stochastic Approximation , 2007, IEEE Transactions on Automatic Control.
[50] V. Climenhaga. Markov chains and mixing times , 2013 .
[51] Tareq Y. Al-Naffouri,et al. Transient analysis of data-normalized adaptive filters , 2003, IEEE Trans. Signal Process..
[52] Ali H. Sayed,et al. On the Learning Behavior of Adaptive Networks—Part I: Transient Analysis , 2013, IEEE Transactions on Information Theory.
[53] Ali H. Sayed,et al. Adaptive Networks , 2014, Proceedings of the IEEE.
[55] Dimitri P. Bertsekas,et al. Basis function adaptation methods for cost approximation in MDP , 2009, 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning.
[56] Brian D. Ripley,et al. Stochastic Simulation , 2005 .
[58] Matthew W. Hoffman,et al. Regularized Least Squares Temporal Difference Learning with Nested ℓ2 and ℓ1 Penalization , 2011, EWRL.
[59] Ali H. Sayed,et al. Diffusion Adaptation over Networks , 2012, ArXiv.
[60] Richard S. Sutton,et al. A Convergent O(n) Temporal-difference Algorithm for Off-policy Learning with Linear Function Approximation , 2008, NIPS.
[61] W. K. Hastings,et al. Monte Carlo Sampling Methods Using Markov Chains and Their Applications , 1970 .
[62] Marc G. Bellemare,et al. Sketch-Based Linear Value Function Approximation , 2012, NIPS.
[63] H. Vincent Poor,et al. QD-Learning: A Collaborative Distributed Strategy for Multi-Agent Reinforcement Learning Through Consensus + Innovations , 2012, IEEE Trans. Signal Process..
[64] Stephen P. Boyd,et al. Convex Optimization , 2004, Algorithms and Theory of Computation Handbook.
[65] Ali H. Sayed,et al. Diffusion Strategies Outperform Consensus Strategies for Distributed Estimation Over Adaptive Networks , 2012, IEEE Transactions on Signal Processing.
[66] S. Haykin. Adaptive Filters , 2007 .
[67] Dimitri P. Bertsekas,et al. Dynamic Programming and Optimal Control, Two Volume Set , 1995 .
[68] Richard S. Sutton,et al. Multi-timescale nexting in a reinforcement learning robot , 2011, Adapt. Behav..
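The paper develops a diffusion (adapt-then-combine) strategy in which networked agents, each sampling from its own behavior policy, cooperatively evaluate a common target policy. As a rough illustration of that idea, the sketch below runs importance-weighted tabular TD(0) per agent followed by a diffusion combination step. The toy MDP, the policies, the combination matrix, and the use of plain TD(0) in place of the paper's gradient-TD recursions are all assumptions made for illustration, not the authors' algorithm.

```python
import numpy as np

# Toy MDP: 2 states, 2 actions; taking action a moves to state a w.p. 0.9.
# Each agent k draws actions from its own behavior policy phi_k, while all
# agents estimate the value of the same target policy pi (state-independent).
rng = np.random.default_rng(0)
n_states, n_actions, gamma = 2, 2, 0.9
P = np.zeros((n_states, n_actions, n_states))
for a in range(n_actions):
    P[:, a, a] = 0.9
    P[:, a, 1 - a] = 0.1
R = np.array([0.0, 1.0])                  # reward collected at the next state
pi = np.array([0.2, 0.8])                 # target policy
behaviors = [np.array([0.8, 0.2]),        # one behavior policy per agent
             np.array([0.5, 0.5]),
             np.array([0.3, 0.7])]

# Ground truth: with identity features the TD fixed point equals V_pi.
P_pi = np.einsum('a,saz->sz', pi, P)      # state-transition matrix under pi
V_true = np.linalg.solve(np.eye(n_states) - gamma * P_pi, P_pi @ R)

# Diffusion evaluation: each agent adapts locally with an importance ratio,
# then combines neighbor estimates (fully connected, uniform weights here).
N = len(behaviors)
A = np.full((N, N), 1.0 / N)              # doubly stochastic combination matrix
w = np.zeros((N, n_states))               # one weight vector per agent
s = np.zeros(N, dtype=int)
alpha = 0.05
for t in range(20000):
    psi = w.copy()                        # intermediate (adapted) estimates
    for k, phi in enumerate(behaviors):
        a = rng.choice(n_actions, p=phi)
        s_next = rng.choice(n_states, p=P[s[k], a])
        rho = pi[a] / phi[a]              # off-policy importance ratio
        delta = R[s_next] + gamma * w[k, s_next] - w[k, s[k]]
        psi[k, s[k]] += alpha * rho * delta   # local adaptation step
        s[k] = s_next
    w = A @ psi                           # diffusion combination step

print(np.round(w[0], 2), np.round(V_true, 2))
```

With cooperation, every agent's estimate tracks V_pi even though no single agent samples under the target policy; the combination step also averages out gradient noise across the network, which is the qualitative benefit the diffusion literature above establishes.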