Reinforcement Learning in Neural Networks: A Survey
Vaghei Yasaman | Ghanbari Ahmad | Sayyed Noorani Sayyed Mohammad Reza
[1] Jean-Pascal Pfister,et al. Sequence learning with hidden units in spiking neural networks , 2011, NIPS.
[2] Chi-Sing Leung. Optimum learning for bidirectional associative memory in the sense of capacity , 1994 .
[3] Shalabh Bhatnagar,et al. An actor-critic algorithm with function approximation for discounted cost constrained Markov decision processes , 2010, Syst. Control. Lett..
[4] Jennie Si,et al. Handbook of Learning and Approximate Dynamic Programming (IEEE Press Series on Computational Intelligence) , 2004 .
[5] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[6] Mu-Chun Su,et al. Neural-network-based fuzzy model and its application to transient stability prediction in power systems , 1999, IEEE Trans. Syst. Man Cybern. Part C.
[7] Gábor Balázs,et al. Cascade-Correlation Neural Networks : A Survey , 2010 .
[8] Xin Zhang,et al. Data-Driven Robust Approximate Optimal Tracking Control for Unknown General Nonlinear Systems Using Adaptive Dynamic Programming Method , 2011, IEEE Transactions on Neural Networks.
[9] B. Bakker,et al. Reinforcement learning by backpropagation through an LSTM model/critic , 2007, 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning.
[10] Ronald J. Williams,et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.
[11] Dimitri P. Bertsekas,et al. Dynamic Programming and Optimal Control, Two Volume Set , 1995 .
[12] Pierre Geurts,et al. Tree-Based Batch Mode Reinforcement Learning , 2005, J. Mach. Learn. Res..
[13] Paul J. Werbos,et al. 2009 Special Issue: Intelligence in the brain: A theory of how it works and how to build it , 2009 .
[14] Marc Toussaint,et al. Learning model-free robot control by a Monte Carlo EM algorithm , 2009, Auton. Robots.
[15] Bart De Schutter,et al. Reinforcement Learning and Dynamic Programming Using Function Approximators , 2010 .
[16] Jyh-Shing Roger Jang,et al. ANFIS: adaptive-network-based fuzzy inference system , 1993, IEEE Trans. Syst. Man Cybern..
[17] Samir Kouro,et al. Unidimensional Modulation Technique for Cascaded Multilevel Converters , 2009, IEEE Transactions on Industrial Electronics.
[18] Warren B. Powell,et al. Reinforcement Learning and Its Relationship to Supervised Learning , 2004 .
[19] Raúl Rojas,et al. Neural Networks - A Systematic Introduction , 1996 .
[20] Shuzhi Sam Ge,et al. Robust adaptive control of uncertain force/motion constrained nonholonomic mobile manipulators , 2008, Autom..
[21] Peter Dayan,et al. Q-learning , 1992, Machine Learning.
[22] Sander M. Bohte,et al. Error-backpropagation in temporally encoded networks of spiking neurons , 2000, Neurocomputing.
[23] Christian Lebiere,et al. The Cascade-Correlation Learning Architecture , 1989, NIPS.
[24] C. Christodoulou,et al. Spiking neural networks with different reinforcement learning (RL) schemes in a multiagent setting. , 2010, The Chinese journal of physiology.
[25] Mark Ring. Two methods for hierarchy learning in reinforcement environments , 1993 .
[26] Jürgen Schmidhuber,et al. Training Recurrent Networks by Evolino , 2007, Neural Computation.
[27] Madan Gopal,et al. A REINFORCEMENT LEARNING ALGORITHM WITH EVOLVING FUZZY NEURAL NETWORKS , 2014 .
[28] André da Motta Salles Barreto,et al. Reinforcement Learning using Kernel-Based Stochastic Factorization , 2011, NIPS.
[29] Ruya Samli. STOCHASTIC NEURAL NETWORKS AND THEIR SOLUTIONS TO OPTIMISATION PROBLEMS , 2012 .
[30] Zeng-ou Wang. A Bidirectional Associative Memory Based on Optimal Linear Associative Memory , 1996, IEEE Trans. Computers.
[31] Shaocheng Tong,et al. A DSC Approach to Robust Adaptive NN Tracking Control for Strict-Feedback Nonlinear Systems , 2008, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).
[32] Hamid R. Berenji,et al. A convergent actor-critic-based FRL algorithm with application to power management of wireless transmitters , 2003, IEEE Trans. Fuzzy Syst..
[33] V. Borkar. Stochastic approximation with two time scales , 1997 .
[34] Shuzhi Sam Ge,et al. Adaptive Robust Output-Feedback Motion/Force Control of Electrically Driven Nonholonomic Mobile Manipulators , 2007, IEEE Transactions on Control Systems Technology.
[35] Shuzhi Sam Ge,et al. Adaptive tracking control of uncertain MIMO nonlinear systems with input constraints , 2011, Autom..
[36] Geoffrey J. Gordon. Stable Function Approximation in Dynamic Programming , 1995, ICML.
[37] Mu-Chun Su. Identification of singleton fuzzy models via fuzzy hyperrectangular composite NN , 1997 .
[38] Andres El-Fakdi,et al. Semi-online neural-Q_learning for real-time robot learning , 2003, Proceedings 2003 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2003) (Cat. No.03CH37453).
[39] Frank L. Lewis,et al. Online actor critic algorithm to solve the continuous-time infinite horizon optimal control problem , 2009, 2009 International Joint Conference on Neural Networks.
[40] Richard S. Sutton,et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.
[41] David E. Goldberg,et al. Genetic Algorithms in Search Optimization and Machine Learning , 1988 .
[42] Gerald Tesauro,et al. Temporal Difference Learning and TD-Gammon , 1995, J. Int. Comput. Games Assoc..
[43] Gianluca Baldassarre,et al. A modular neural-network model of the basal ganglia’s role in learning and selecting motor behaviours , 2002, Cognitive Systems Research.
[44] P. Lanzi,et al. Adaptive Agents with Reinforcement Learning and Internal Memory , 2000 .
[45] Stuart J. Russell,et al. Reinforcement Learning with Hierarchies of Machines , 1997, NIPS.
[46] Jose B. Cruz,et al. Two coding strategies for bidirectional associative memory , 1990, IEEE Trans. Neural Networks.
[47] Zidong Wang,et al. Exponential stability of delayed recurrent neural networks with Markovian jumping parameters , 2006 .
[48] Loredana Zollo,et al. Hierarchical reinforcement learning and central pattern generators for modeling the development of rhythmic manipulation skills , 2011, 2011 IEEE International Conference on Development and Learning (ICDL).
[49] Wulfram Gerstner,et al. Reinforcement Learning Using a Continuous Time Actor-Critic Framework with Spiking Neurons , 2013, PLoS Comput. Biol..
[50] Steven J. Bradtke,et al. Linear Least-Squares algorithms for temporal difference learning , 2004, Machine Learning.
[51] Li Tang,et al. Adaptive neural network control of robot manipulator using reinforcement learning , 2014 .
[52] Jennie Si,et al. Helicopter trimming and tracking control using direct neural dynamic programming , 2003, IEEE Trans. Neural Networks.
[53] Joy Bose,et al. An associative memory for the on-line recognition and prediction of temporal sequences , 2005, Proceedings. 2005 IEEE International Joint Conference on Neural Networks, 2005..
[54] Sean P. Meyn,et al. An analysis of reinforcement learning with function approximation , 2008, ICML '08.
[55] Leslie Pack Kaelbling,et al. Practical Reinforcement Learning in Continuous Spaces , 2000, ICML.
[56] Shalabh Bhatnagar,et al. Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation , 2009, NIPS.
[57] Li Li,et al. Neuro-Fuzzy Dynamic-Inversion-Based Adaptive Control for Robotic Manipulators—Discrete Time Case , 2007, IEEE Transactions on Industrial Electronics.
[58] Fuchun Sun,et al. Stable neural-network-based adaptive control for sampled-data nonlinear systems , 1998, IEEE Trans. Neural Networks.
[59] Shigeo Abe,et al. A reinforcement learning algorithm for neural networks with incremental learning ability , 2002, Proceedings of the 9th International Conference on Neural Information Processing, 2002. ICONIP '02..
[60] Andrew G. Barto,et al. Reinforcement learning , 1998 .
[61] John N. Tsitsiklis,et al. Actor-Critic Algorithms , 1999, NIPS.
[62] Anne Nagel. Neural Networks And Fuzzy Systems A Dynamical Systems Approach To Machine Intelligence , 2016 .
[63] Jerry M. Mendel,et al. Back-propagation fuzzy system as nonlinear dynamic system identifiers , 1992, [1992 Proceedings] IEEE International Conference on Fuzzy Systems.
[64] Xinghui Zhang,et al. Sensitivity to noise in bidirectional associative memory (BAM) , 2005, IEEE Transactions on Neural Networks.
[65] Kurt Binder,et al. Monte Carlo Simulation in Statistical Physics , 1992, Graduate Texts in Physics.
[66] Jose B. Cruz,et al. Encoding strategy for maximum noise tolerance bidirectional associative memory , 2005, IEEE Transactions on Neural Networks.
[67] Halbert White,et al. Learning in Artificial Neural Networks: A Statistical Perspective , 1989, Neural Computation.
[68] Yasuo Kuniyoshi,et al. Robust central pattern generators for embodied hierarchical reinforcement learning , 2011, 2011 IEEE International Conference on Development and Learning (ICDL).
[69] M. Georgiopoulos,et al. Feed-forward neural networks , 1994, IEEE Potentials.
[70] Warren E. Dixon,et al. Asymptotic tracking by a reinforcement learning-based adaptive critic controller , 2011 .
[71] Jürgen Schmidhuber,et al. Optimal Ordered Problem Solver , 2002, Machine Learning.
[72] Michael Aichinger,et al. Monte Carlo Simulation , 2013 .
[73] Mahmood Amiri,et al. BAM Learning of Nonlinearly Separable Tasks by Using an Asymmetrical Output Function and Reinforcement Learning , 2009, IEEE Transactions on Neural Networks.
[74] Xue Jinlin,et al. Neurofuzzy velocity tracking control with reinforcement learning , 2009, 2009 9th International Conference on Electronic Measurement & Instruments.
[75] Z. Ibrahim,et al. Mobile phone customers churn prediction using elman and Jordan Recurrent Neural Network , 2012, 2012 7th International Conference on Computing and Convergence Technology (ICCCT).
[76] Justin A. Boyan,et al. Technical Update: Least-Squares Temporal Difference Learning , 2002, Machine Learning.
[77] R.J. Williams,et al. Reinforcement learning is direct adaptive optimal control , 1991, IEEE Control Systems.
[78] Andrzej J. Kasinski,et al. Supervised Learning in Spiking Neural Networks with ReSuMe: Sequence Learning, Classification, and Spike Shifting , 2010, Neural Computation.
[79] Dan Simon,et al. Computational Modeling and Simulation of Intellect: Current State and Future Perspectives , 2011 .
[80] Bart Kosko,et al. Neural networks and fuzzy systems: a dynamical systems approach to machine intelligence , 1991 .
[81] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[82] Alexandros Giagkos,et al. From Animals to Animats 14 , 2016, Lecture Notes in Computer Science.
[83] Ila R Fiete,et al. Gradient learning in spiking neural networks by dynamic perturbation of conductances. , 2006, Physical review letters.
[84] Sungchul Kang,et al. Impedance Learning for Robotic Contact Tasks Using Natural Actor-Critic Algorithm , 2010, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).
[85] G. Rizzolatti,et al. The Organization of the Frontal Motor Cortex. , 2000, News in physiological sciences : an international journal of physiology produced jointly by the International Union of Physiological Sciences and the American Physiological Society.
[86] Sridhar Mahadevan,et al. Recent Advances in Hierarchical Reinforcement Learning , 2003, Discret. Event Dyn. Syst..
[87] Domenico Parisi,et al. A Bioinspired Hierarchical Reinforcement Learning Architecture for Modeling Learning of Multiple Skills with Continuous States and Actions , 2010, EpiRob.
[88] Alin Albu-Schäffer,et al. Human-Like Adaptation of Force and Impedance in Stable and Unstable Interactions , 2011, IEEE Transactions on Robotics.
[89] Frank L. Lewis,et al. Online actor-critic algorithm to solve the continuous-time infinite horizon optimal control problem , 2010, Autom..
[90] Markus Diesmann,et al. A Spiking Neural Network Model of an Actor-Critic Learning Agent , 2009, Neural Computation.
[91] Razvan V. Florian,et al. Reinforcement Learning Through Modulation of Spike-Timing-Dependent Synaptic Plasticity , 2007, Neural Computation.
[92] Razvan V. Florian. A reinforcement learning algorithm for spiking neural networks , 2005, Seventh International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC'05).
[93] Jagannathan Sarangapani,et al. Neural Network Control of Nonlinear Discrete-Time Systems , 2018 .
[94] André Grüning,et al. Elman Backpropagation as Reinforcement for Simple Recurrent Networks , 2007, Neural Computation.
[95] Chin-Teng Lin,et al. Neural-Network-Based Fuzzy Logic Control and Decision System , 1991, IEEE Trans. Computers.
[96] Bart Kosko,et al. Bidirectional associative memories , 1988, IEEE Trans. Syst. Man Cybern..
[97] Mahesan Niranjan,et al. On-line Q-learning using connectionist systems , 1994 .
[98] Steven J. Bradtke,et al. Incremental dynamic programming for on-line adaptive optimal control , 1995 .
[99] Pawel Wawrzynski,et al. Learning population of spiking neural networks with perturbation of conductances , 2013, The 2013 International Joint Conference on Neural Networks (IJCNN).
[100] Shin Ishii,et al. Reinforcement learning for a biped robot based on a CPG-actor-critic method , 2007, Neural Networks.
[101] Mark B. Ring. Learning Sequential Tasks by Incrementally Adding Higher Orders , 1992, NIPS.
[102] Stefan Schaal,et al. Natural Actor-Critic , 2003, Neurocomputing.
[103] Zoran Miljkovic,et al. Neural network Reinforcement Learning for visual control of robot manipulators , 2013, Expert Syst. Appl..
[104] Csaba Szepesvári,et al. Algorithms for Reinforcement Learning , 2010, Synthesis Lectures on Artificial Intelligence and Machine Learning.
[105] Andrew McCallum,et al. Learning to Use Selective Attention and Short-Term Memory in Sequential Tasks , 1996 .
[106] John K. Williams,et al. Reinforcement Learning of Optimal Controls , 2009 .
[107] Ann Maria Bell,et al. Reinforcement Learning Rules in a Repeated Game , 2001 .
[108] F.L. Lewis,et al. Reinforcement learning and adaptive dynamic programming for feedback control , 2009, IEEE Circuits and Systems Magazine.
[109] Mounir Boukadoum,et al. A bidirectional heteroassociative memory for binary and grey-level patterns , 2006, IEEE Transactions on Neural Networks.
[110] Lyle Noakes,et al. Continuous-Time Adaptive Critics , 2007, IEEE Transactions on Neural Networks.
[111] Frank L. Lewis,et al. Adaptive dynamic programming applied to a 6DoF quadrotor , 2011 .
[112] Igor Farkas,et al. Grounding the Meanings in Sensorimotor Behavior using Reinforcement Learning , 2012, Front. Neurorobot..
[113] Maja J. Matarić,et al. Learning to Use Selective Attention and Short-Term Memory in Sequential Tasks , 1996 .
[114] Thomas G. Dietterich. Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition , 1999, J. Artif. Intell. Res..
[115] Farid U. Dowla,et al. Backpropagation Learning for Multilayer Feed-Forward Neural Networks Using the Conjugate Gradient Method , 1991, Int. J. Neural Syst..
[116] V. Borkar. Stochastic Approximation: A Dynamical Systems Viewpoint , 2008 .