A convergent actor-critic-based FRL algorithm with application to power management of wireless transmitters
[1] A. G. Barto, et al. Toward a modern theory of adaptive networks: expectation and prediction, 1981, Psychological Review.
[2] Richard S. Sutton, et al. Neuronlike adaptive elements that can solve difficult learning control problems, 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[3] Michio Sugeno, et al. Fuzzy identification of systems and its applications to modeling and control, 1985, IEEE Transactions on Systems, Man, and Cybernetics.
[4] M. Sugeno, et al. Structure identification of fuzzy model, 1988.
[5] B. Kosko. Fuzzy systems as universal approximators, 1992, [1992 Proceedings] IEEE International Conference on Fuzzy Systems.
[6] Hamid R. Berenji, et al. Learning and tuning fuzzy logic controllers through reinforcements, 1992, IEEE Trans. Neural Networks.
[7] Gerald Tesauro, et al. Practical Issues in Temporal Difference Learning, 1992, Mach. Learn.
[8] Hamid R. Berenji, et al. A reinforcement learning-based architecture for fuzzy logic control, 1992, Int. J. Approx. Reason.
[9] L. Wang, et al. Fuzzy systems are universal approximators, 1992, [1992 Proceedings] IEEE International Conference on Fuzzy Systems.
[10] Michael I. Jordan, et al. Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems, 1994, NIPS.
[11] Ben J. A. Kröse, et al. Learning from delayed rewards, 1995, Robotics Auton. Syst.
[12] Wei Zhang, et al. A Reinforcement Learning Approach to job-shop Scheduling, 1995, IJCAI.
[13] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[14] John N. Tsitsiklis, et al. Analysis of temporal-difference learning with function approximation, 1996, NIPS 1996.
[15] Andrew W. Moore, et al. Gradient Descent for General Reinforcement Learning, 1998, NIPS.
[16] John N. Tsitsiklis, et al. Actor-Critic Algorithms, 1999, NIPS.
[17] David Tse, et al. Power control and capacity of spread spectrum wireless networks, 1999, Autom.
[18] H.R. Berenji, et al. Cooperation and coordination between fuzzy reinforcement learning agents in continuous state partially observable Markov decision processes, 1999, FUZZ-IEEE'99. 1999 IEEE International Fuzzy Systems. Conference Proceedings (Cat. No.99CH36315).
[19] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[20] Vivek S. Borkar, et al. Actor-Critic-Type Learning Algorithms for Markov Decision Processes, 1999, SIAM J. Control. Optim.
[21] D. Vengerov, et al. An Empirical Model of Factor Adjustment Dynamics, 2006.
[22] Nicholas Bambos, et al. Power controlled multiple access (PCMA) in wireless communication networks, 2000, Proceedings IEEE INFOCOM 2000. Conference on Computer Communications. Nineteenth Annual Joint Conference of the IEEE Computer and Communications Societies (Cat. No.00CH37064).
[23] John N. Tsitsiklis, et al. Call admission control and routing in integrated services networks using neuro-dynamic programming, 2000, IEEE Journal on Selected Areas in Communications.
[24] Peter L. Bartlett, et al. Reinforcement Learning in POMDP's via Direct Gradient Ascent, 2000, ICML.
[25] Ron Sun, et al. From implicit skills to explicit knowledge: a bottom-up model of skill learning, 2001, Cogn. Sci.