A Spiking Neural Network Model of an Actor-Critic Learning Agent