Adaptation , 1926 .
 G. Finch,et al. Higher Order Conditioning with Constant Motivation , 1934 .
 D. Thistlethwaite. A critical review of latent learning and related experiments. , 1951, Psychological bulletin.
 Some aspects of the sequential design of experiments , 1952 .
 James L Olds,et al. Positive reinforcement produced by electrical stimulation of septal area and other regions of rat brain. , 1954, Journal of comparative and physiological psychology.
 W. A. Clark,et al. Simulation of self-organizing systems by digital computer , 1954, Trans. IRE Prof. Group Inf. Theory.
 B. G. Farley,et al. Generalization of pattern recognition in a self-organizing system , 1955, AFIPS '55 (Western).
 M. D. Egger,et al. Secondary reinforcement in rats as a function of information value and reliability of the stimulus. , 1962, Journal of experimental psychology.
 Edward O. Thorp,et al. Beat the Dealer: A Winning Strategy for the Game of Twenty-One , 1965 .
 K. Fu,et al. A heuristic approach to reinforcement learning control systems , 1965 .
 Lawrence J. Fogel,et al. Artificial Intelligence through Simulated Evolution , 1966 .
 L. Kamin. Predictability, surprise, attention, and conditioning , 1967 .
 L. Kamin. Attention-like processes in classical conditioning , 1967 .
 D. Shepard. A two-dimensional interpolation function for irregularly-spaced data , 1968, ACM National Conference.
 A. L. Samuel,et al. Some Studies in Machine Learning Using the Game of Checkers , 1967, IBM J. Res. Dev..
 A. H. Klopf,et al. Brain Function and Adaptive Systems: A Heterostatic Theory , 1972 .
 Bernard Widrow,et al. Punish/Reward: Learning with a Critic in Adaptive Threshold Systems , 1973, IEEE Trans. Syst. Man Cybern..
 M. L. Tsetlin,et al. Automaton theory and modeling of biological systems , 1973 .
 Arnold K. Griffith. A Comparison and Evaluation of Three Machine Learning Procedures as Applied to the Game of Checkers , 1974, Artif. Intell..
 E Harth,et al. Alopex: a stochastic method for determining visual receptive fields. , 1974, Vision research.
 A. Harry Klopf,et al. A comparison of natural and artificial intelligence , 1975, SGAR.
 S. Grossberg. A neural model of attention, reinforcement and discrimination learning. , 1975, International review of neurobiology.
 Jon Louis Bentley,et al. Multidimensional binary search trees used for associative searching , 1975, CACM.
 Ian H. Witten,et al. An Adaptive Optimal Controller for Discrete-Time Markov Environments , 1977, Inf. Control..
 Carl V. Page,et al. Heuristics for Signature Table Analysis as a Pattern Recognition Technique , 1977, IEEE Transactions on Systems, Man, and Cybernetics.
 Jon Louis Bentley,et al. An Algorithm for Finding Best Matches in Logarithmic Expected Time , 1976, TOMS.
 M. Puterman,et al. Modified Policy Iteration Algorithms for Discounted Markov Decision Problems , 1978 .
 J. Pearce,et al. A model for Pavlovian learning: variations in the effectiveness of conditioned but not of unconditioned stimuli. , 1980, Psychological review.
 Christopher D. Adams,et al. Instrumental Responding following Reinforcer Devaluation , 1981 .
 A G Barto,et al. Toward a modern theory of adaptive networks: expectation and prediction. , 1981, Psychological review.
 Richard S. Sutton,et al. Goal Seeking Components for Adaptive Intelligence: An Initial Assessment. , 1981 .
 Lashon B. Booker,et al. Intelligent behavior as an adaptation to the task environment ; Part II. , 1982 .
 R. Sutton,et al. Simulation of anticipatory responses in classical conditioning by a neuron-like adaptive element , 1982, Behavioural Brain Research.
 Paul J. Werbos,et al. Applications of advances in nonlinear sensitivity analysis , 1982 .
 W. Levy,et al. Temporal contiguity requirements for long-term associative potentiation/depression in the hippocampus , 1983, Neuroscience.
 John S. Edwards,et al. The Hedonistic Neuron: A Theory of Memory, Learning and Intelligence , 1983 .
 Richard S. Sutton,et al. Neuronlike adaptive elements that can solve difficult learning control problems , 1983, IEEE Transactions on Systems, Man, and Cybernetics.
 Dimitri P. Bertsekas,et al. Distributed asynchronous computation of fixed points , 1983, Math. Program..
 Kumpati S. Narendra,et al. An N-player sequential stochastic game with identical payoffs , 1983, IEEE Transactions on Systems, Man, and Cybernetics.
 E. Kandel,et al. Is there a cell-biological alphabet for simple forms of learning? , 1984 .
 Richard S. Sutton,et al. Temporal credit assignment in reinforcement learning , 1984 .
 Judea Pearl,et al. Heuristics - intelligent search strategies for computer problem solving , 1984, Addison-Wesley series in artificial intelligence.
 Mark Derthick,et al. Variations on the Boltzmann Machine Learning Algorithm , 1984 .
 A G Barto,et al. Learning by statistical cooperation of self-interested neuron-like computing elements. , 1985, Human neurobiology.
 M. A. L. Thathachar,et al. A new approach to the design of reinforcement schemes for learning automata , 1985, IEEE Transactions on Systems, Man, and Cybernetics.
 Patchigolla Kiran Kumar,et al. A Survey of Some Results in Stochastic Adaptive Control , 1985 .
 P. Schweitzer,et al. Generalized polynomial approximations in Markovian decision processes , 1985 .
 Geoffrey E. Hinton,et al. Learning internal representations by error propagation , 1986 .
 Pravin Varaiya,et al. Stochastic Systems: Estimation, Identification, and Adaptive Control , 1986 .
 Richard E. Korf,et al. A Unified Theory of Heuristic Evaluation Functions and its Application to Learning , 1986, AAAI.
 R. Sutton,et al. Simulation of the classically conditioned nictitating membrane response by a neuron-like adaptive element: Response topography, neuronal firing, and interstimulus intervals , 1986, Behavioural Brain Research.
 Andrew G. Barto,et al. Game-theoretic cooperativity in networks of self-interested units , 1987 .
 Dimitri P. Bertsekas,et al. Dynamic Programming: Deterministic and Stochastic Models , 1987 .
 Paul J. Werbos,et al. Building and Understanding Adaptive Systems: A Statistical/Numerical Approach to Factory Automation and Brain Research , 1987, IEEE Transactions on Systems, Man, and Cybernetics.
 Ronald L. Rivest,et al. Diversity-Based Inference of Finite Automata (Extended Abstract) , 1987, FOCS.
 Charles W. Anderson,et al. Strategy Learning with Multilayer Connectionist Representations , 1987 .
 M. J. D. Powell,et al. Radial basis functions for multivariable interpolation: a review , 1987 .
 D. J. White,et al. Further Real Applications of Markov Decision Processes , 1988 .
 Paul J. Werbos,et al. Generalization of backpropagation with application to a recurrent gas market model , 1988, Neural Networks.
 D. Ruppert,et al. Efficient Estimations from a Slowly Convergent Robbins-Monro Process , 1988 .
 David E. Goldberg,et al. Genetic Algorithms in Search Optimization and Machine Learning , 1988 .
 Robert A. Jacobs,et al. Increased rates of convergence through learning rate adaptation , 1987, Neural Networks.
 David S. Broomhead,et al. Multivariable Functional Interpolation and Adaptive Networks , 1988, Complex Syst..
 R. J. Williams,et al. On the use of backpropagation in associative reinforcement learning , 1988, IEEE 1988 International Conference on Neural Networks.
 Paul J. Werbos,et al. Neural networks for control and system identification , 1989, Proceedings of the 28th IEEE Conference on Decision and Control.
 Stephen Grossberg,et al. Neural dynamics of adaptive timing and temporal discrimination during associative learning , 1989, Neural Networks.
 John N. Tsitsiklis,et al. Parallel and Distributed Computation: Numerical Methods , 1989 .
 C.W. Anderson,et al. Learning to control an inverted pendulum using neural networks , 1989, IEEE Control Systems Magazine.
 George Cybenko,et al. Approximation by superpositions of a sigmoidal function , 1989, Math. Control. Signals Syst..
 Ming Zhang,et al. Comparisons of channel assignment strategies in cellular mobile telephone systems , 1989, IEEE International Conference on Communications, World Prosperity Through Communications.
 W S McCulloch,et al. A logical calculus of the ideas immanent in nervous activity , 1990, The Philosophy of Artificial Intelligence.
 Paul E. Utgoff,et al. Explaining Temporal Differences to Create Useful Concepts for Evaluating States , 1990, AAAI.
 Lyle H. Ungar,et al. A bioreactor benchmark for adaptive network-based process control , 1990 .
 T Poggio,et al. Regularization Algorithms for Learning That Are Equivalent to Multilayer Networks , 1990, Science.
 W. Schultz,et al. Dopamine neurons of the monkey midbrain: contingencies of responses to active touch during self-initiated arm movements. , 1990, Journal of neurophysiology.
 R Ratcliff,et al. Connectionist models of recognition memory: constraints imposed by learning and forgetting functions. , 1990, Psychological review.
 Tomaso A. Poggio,et al. Extensions of a Theory of Networks for Approximation and Learning , 1989, NIPS.
 W. Schultz,et al. Dopamine neurons of the monkey midbrain: contingencies of responses to stimuli eliciting immediate behavioral reactions. , 1990, Journal of neurophysiology.
 Paul J. Werbos,et al. Consistency of HDP applied to a simple reinforcement learning problem , 1990, Neural Networks.
 Thomas Dean,et al. Toward learning time-varying functions with high input dimensionality , 1990, Proceedings. 5th IEEE International Symposium on Intelligent Control 1990.
 Geoffrey E. Hinton,et al. A time-delay neural network architecture for isolated word recognition , 1990, Neural Networks.
 Steven Minton,et al. Quantitative Results Concerning the Utility of Explanation-based Learning , 1988, Artif. Intell..
 Bruce Abramson,et al. Expected-Outcome: A General Model of Static Evaluation , 1990, IEEE Trans. Pattern Anal. Mach. Intell..
 Andrew G. Barto,et al. On the Computational Economics of Reinforcement Learning , 1991 .
 Richard S. Sutton,et al. Dyna, an integrated architecture for learning, planning, and reacting , 1990, SGAR.
 J. Tsitsiklis,et al. An optimal one-way multigrid algorithm for discrete-time stochastic control , 1991 .
 I. Gormezano,et al. Second-order conditioning of the rabbit’s nictitating membrane response , 1991, Integrative physiological and behavioral science : the official journal of the Pavlovian Society.
 D.A. Handelman,et al. Theory and development of higher-order CMAC neural networks , 1992, IEEE Control Systems.
 Terrence J. Sejnowski,et al. Using Aperiodic Reinforcement for Directed Self-Organization During Development , 1992, NIPS.
 W. Schultz,et al. Responses of monkey dopamine neurons during learning of behavioral reactions. , 1992, Journal of neurophysiology.
 Boris Polyak,et al. Acceleration of stochastic approximation by averaging , 1992 .
 P Dayan,et al. Expectation learning in the brain using diffuse ascending projections , 1992 .
 Sridhar Mahadevan,et al. Automatic Programming of Behavior-Based Robots Using Reinforcement Learning , 1991, Artif. Intell..
 Lonnie Chrisman,et al. Reinforcement Learning with Perceptual Aliasing: The Perceptual Distinctions Approach , 1992, AAAI.
 Steven J. Bradtke,et al. Reinforcement Learning Applied to Linear Quadratic Regulation , 1992, NIPS.
 Satinder P. Singh. Reinforcement Learning with a Hierarchy of Abstract Models , 1992, AAAI.
 Richard S. Sutton,et al. Adapting Bias by Gradient Descent: An Incremental Version of Delta-Bar-Delta , 1992, AAAI.
 Andrew G. Barto,et al. Shaping as a method for accelerating reinforcement learning , 1992, Proceedings of the 1992 IEEE International Symposium on Intelligent Control.
 C. Atkeson,et al. Prioritized Sweeping: Reinforcement Learning with Less Data and Less Real Time , 1993 .
 John R. Koza,et al. Genetic programming - on the programming of computers by means of natural selection , 1993, Complex adaptive systems.
 Andrew G. Barto,et al. Monte Carlo Matrix Inversion and Reinforcement Learning , 1993, NIPS.
 W. Schultz,et al. Responses of monkey dopamine neurons to reward and conditioned stimuli during successive steps of learning a delayed response task , 1993, The Journal of neuroscience : the official journal of the Society for Neuroscience.
 Anton Schwartz,et al. A Reinforcement Learning Method for Maximizing Undiscounted Rewards , 1993, ICML.
 Andrew McCallum,et al. Overcoming Incomplete Perception with Utile Distinction Memory , 1993, ICML.
 D. J. White,et al. A Survey of Applications of Markov Decision Processes , 1993 .
 Jing Peng,et al. Efficient Learning and Planning Within the Dyna Framework , 1993, Adapt. Behav..
 Leslie Pack Kaelbling,et al. Hierarchical Learning in Stochastic Domains: Preliminary Results , 1993, ICML.
 Leemon C Baird,et al. Reinforcement Learning With High-Dimensional, Continuous Actions , 1993 .
 Andrew W. Moore,et al. Generalization in Reinforcement Learning: Safely Approximating the Value Function , 1994, NIPS.
 Gerald Tesauro,et al. TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play , 1994, Neural Computation.
 Michael I. Jordan,et al. On the Convergence of Stochastic Iterative Dynamic Programming Algorithms , 1993, Neural Computation.
 Mark W. Spong,et al. Swinging up the Acrobot: an example of intelligent control , 1994, Proceedings of 1994 American Control Conference - ACC '94.
 Jude W. Shavlik,et al. Incorporating Advice into Agents that Learn from Reinforcements , 1994, AAAI.
 Judea Pearl,et al. Counterfactual Probabilities: Computational Methods, Bounds and Applications , 1994, UAI.
 K. P. Unnikrishnan,et al. Alopex: A Correlation-Based Learning Algorithm for Feedforward and Recurrent Neural Networks , 1994, Neural Computation.
 Terrence J. Sejnowski,et al. A Novel Reinforcement Model of Birdsong Vocalization Learning , 1994, NIPS.
 Chen-Khong Tham,et al. Modular on-line function approximation for scaling up reinforcement learning , 1994 .
 Marco Colombetti,et al. Robot Shaping: Developing Autonomous Agents Through Learning , 1994, Artif. Intell..
 Michael L. Littman,et al. Markov Games as a Framework for Multi-Agent Reinforcement Learning , 1994, ICML.
 S. Schaal,et al. Robot juggling: implementation of memory-based learning , 1994, IEEE Control Systems.
 Karl J. Friston,et al. Value-dependent selection in the brain: Simulation in a synthetic neural model , 1994, Neuroscience.
 Michael O. Duff,et al. Reinforcement Learning Methods for Continuous-Time Markov Decision Problems , 1994, NIPS.
 T. Sejnowski,et al. The predictive brain: temporal coincidence and temporal order in synaptic learning mechanisms. , 1994, Learning & memory.
 Michael I. Jordan,et al. Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems , 1994, NIPS.
 Michael I. Jordan,et al. Learning Without State-Estimation in Partially Observable Markovian Decision Processes , 1994, ICML.
 Andrew G. Barto,et al. Adaptive linear quadratic control using policy iteration , 1994, Proceedings of 1994 American Control Conference - ACC '94.
 Jerry M. Mendel,et al. Reinforcement-learning control and pattern recognition systems , 1994 .
 Michael I. Jordan,et al. Reinforcement Learning with Soft State Aggregation , 1994, NIPS.
 Gary Cziko,et al. Without Miracles: Universal Selection Theory and the Second Darwinian Revolution , 1995 .
 Geoffrey J. Gordon. Stable Function Approximation in Dynamic Programming , 1995, ICML.
 S. Hochreiter,et al. Reinforcement Driven Information Acquisition in Nondeterministic Environments , 1995 .
 Kenji Doya,et al. Temporal Difference Learning in Continuous Time and Space , 1995, NIPS.
 Richard S. Sutton,et al. TD Models: Modeling the World at a Mixture of Time Scales , 1995, ICML.
 Thomas Dean,et al. Decomposition Techniques for Planning in Stochastic Domains , 1995, IJCAI.
 Pawel Cichosz,et al. Truncating Temporal Differences: On the Efficient Implementation of TD(lambda) for Reinforcement Learning , 1994, J. Artif. Intell. Res..
 Stuart J. Russell,et al. Approximating Optimal Policies for Partially Observable Stochastic Domains , 1995, IJCAI.
 Leslie Pack Kaelbling,et al. On the Complexity of Solving Markov Decision Problems , 1995, UAI.
 Mandayam A. L. Thathachar,et al. Local and Global Optimization Algorithms for Generalized Learning Automata , 1995, Neural Computation.
 R. Agrawal. Sample mean based index policies by O(log n) regret for the multi-armed bandit problem , 1995, Advances in Applied Probability.
 Thomas G. Dietterich,et al. High-Performance Job-Shop Scheduling With A Time-Delay TD(λ) Network , 1995, NIPS 1995.
 Leslie Pack Kaelbling,et al. Learning Policies for Partially Observable Environments: Scaling Up , 1997, ICML.
 Andrew G. Barto,et al. Improving Elevator Performance Using Reinforcement Learning , 1995, NIPS.
 Steven J. Bradtke,et al. Incremental dynamic programming for on-line adaptive optimal control , 1995 .
 Wei Zhang,et al. A Reinforcement Learning Approach to job-shop Scheduling , 1995, IJCAI.
 Learning and memory in the honeybee. , 1995, The Journal of neuroscience : the official journal of the Society for Neuroscience.
 Richard S. Sutton,et al. Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding , 1995, NIPS.
 Peter Dayan,et al. Bee foraging in uncertain environments using predictive hebbian learning , 1995, Nature.
 Andrew G. Barto,et al. Learning to Act Using Real-Time Dynamic Programming , 1995, Artif. Intell..
 Leemon C. Baird. Residual Algorithms: Reinforcement Learning with Function Approximation , 1995, ICML.
 A. Dickinson,et al. Reward-related signals carried by dopamine neurons. , 1995 .
 P. Dayan,et al. A framework for mesencephalic dopamine systems based on predictive Hebbian learning , 1996, The Journal of neuroscience : the official journal of the Society for Neuroscience.
 Richard S. Sutton,et al. Model-Based Reinforcement Learning with an Approximate, Learned Model , 1996 .
 Gerald Tesauro,et al. On-line Policy Improvement using Monte-Carlo Search , 1996, NIPS.
 Richard S. Sutton,et al. Reinforcement Learning with Replacing Eligibility Traces , 1996, Machine Learning.
 Andrew McCallum,et al. Reinforcement learning with selective perception and hidden state , 1996 .
 Andrew G. Barto,et al. Large-scale dynamic optimization using teams of reinforcement learning agents , 1996 .
 John N. Tsitsiklis,et al. Analysis of Temporal-Difference Learning with Function Approximation , 1996, NIPS.
 Dimitri P. Bertsekas,et al. Reinforcement Learning for Dynamic Channel Allocation in Cellular Telephone Systems , 1996, NIPS.
 H. Markram,et al. Regulation of Synaptic Efficacy by Coincidence of Postsynaptic APs and EPSPs , 1997, Science.
 David S. Touretzky,et al. Shaping robot behavior using principles from instrumental conditioning , 1997, Robotics Auton. Syst..
 M. Hammer. The neural basis of associative reward learning in honeybees , 1997, Trends in Neurosciences.
 J.N. Tsitsiklis,et al. A neuro-dynamic programming approach to retailer inventory management , 1997, Proceedings of the 36th IEEE Conference on Decision and Control.
 Xi-Ren Cao,et al. Perturbation realization, potentials, and sensitivity analysis of Markov processes , 1997, IEEE Trans. Autom. Control..
 Andrew W. Moore,et al. Efficient Locally Weighted Polynomial Regression Predictions , 1997, ICML.
 Milos Hauskrecht,et al. Hierarchical Solution of Markov Decision Processes using Macro-actions , 1998, UAI.
 Yoshua Bengio,et al. Gradient-based learning applied to document recognition , 1998, Proc. IEEE.
 Ronald E. Parr,et al. Hierarchical control and learning for Markov decision processes , 1998 .
 W. Schultz,et al. Learning of sequential movements by neural network model with dopamine-like reinforcement signal , 1998, Experimental Brain Research.
 J. Hollerman,et al. Dopamine neurons report an error in the temporal prediction of reward during learning , 1998, Nature Neuroscience.
 R. Clark,et al. Classical conditioning and brain systems: the role of awareness. , 1998, Science.
 Preben Alstrøm,et al. Learning to Drive a Bicycle Using Reinforcement Learning and Shaping , 1998, ICML.
 Andrew W. Moore,et al. Gradient Descent for General Reinforcement Learning , 1998, NIPS.
 K. Berridge,et al. What is the role of dopamine in reward: hedonic impact, reward learning, or incentive salience? , 1998, Brain Research Reviews.
 Doina Precup,et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning , 1999, Artif. Intell..
 W. Schultz,et al. A neural network model with dopamine-like reinforcement signal that learns a spatial delayed response task , 1999, Neuroscience.
 S. Grossberg,et al. How the Basal Ganglia Use Parallel Excitatory and Inhibitory Learning Pathways to Selectively Respond to Unexpected Rewarding Cues , 1999, The Journal of Neuroscience.
 Geoffrey J. Gordon,et al. Approximate solutions to Markov decision processes , 1999 .
 Andrew Y. Ng,et al. Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping , 1999, ICML.
 Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
 Simon Haykin,et al. A dynamic channel assignment policy through Q-learning , 1999, IEEE Trans. Neural Networks.
 C. Buhusi,et al. Timing in simple conditioning and occasion setting: a neural network approach , 1999, Behavioural Processes.
 Arthur L. Samuel,et al. Some studies in machine learning using the game of checkers , 2000, IBM J. Res. Dev..
 Herbert Jaeger,et al. Observable Operator Models for Discrete Stochastic Time Series , 2000, Neural Computation.
 Doina Precup,et al. Eligibility Traces for Off-Policy Policy Evaluation , 2000, ICML.
 Ryan,et al. Intrinsic and Extrinsic Motivations: Classic Definitions and New Directions. , 2000, Contemporary educational psychology.
 P. Montague,et al. Predictability Modulates Human Brain Response to Reward , 2001, The Journal of Neuroscience.
 Peter Redgrave,et al. A computational model of action selection in the basal ganglia. II. Analysis and simulation of behaviour , 2001, Biological Cybernetics.
 D. Kahneman,et al. Functional Imaging of Neural Responses to Expectancy and Experience of Monetary Gains and Losses , 2001, Neuron.
 John N. Tsitsiklis,et al. Simulation-based optimization of Markov reward processes , 2001, IEEE Trans. Autom. Control..
 Rajesh P. N. Rao,et al. Spike-Timing-Dependent Hebbian Plasticity as Temporal Difference Learning , 2001, Neural Computation.
 M. Arbib,et al. Modeling functions of striatal dopamine modulation in learning and planning , 2001, Neuroscience.
 Peter L. Bartlett,et al. Experiments with Infinite-Horizon, Policy-Gradient Estimation , 2001, J. Artif. Intell. Res..
 Xin Wang,et al. Batch Value Function Approximation via Support Vectors , 2001, NIPS.
 Sanjoy Dasgupta,et al. Off-Policy Temporal Difference Learning with Function Approximation , 2001, ICML.
 Christian R. Shelton,et al. Importance sampling for reinforcement learning with multiple objectives , 2001 .
 Gerald Tesauro,et al. Programming backgammon using self-teaching neural nets , 2002, Artif. Intell..
 Eytan Ruppin,et al. Actor-critic models of the basal ganglia: new anatomical and computational perspectives , 2002, Neural Networks.
 David S. Touretzky,et al. Timing and Partial Observability in the Dopamine System , 2002, NIPS.
 P. Montague,et al. Activity in human ventral striatum locked to errors of reward prediction , 2002, Nature Neuroscience.
 John N. J. Reynolds,et al. Dopamine-dependent plasticity of corticostriatal synapses , 2002, Neural Networks.
 Ronen I. Brafman,et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning , 2001, J. Mach. Learn. Res..
 Theodore J. Perkins,et al. On the Existence of Fixed Points for Q-Learning and Sarsa in Partially Observable Domains , 2002, ICML.
 Nicol N. Schraudolph,et al. Fast Curvature Matrix-Vector Products for Second-Order Gradient Descent , 2002, Neural Computation.
 Andrew Y. Ng,et al. Shaping and policy search in reinforcement learning , 2003 .
 Sridhar Mahadevan,et al. Recent Advances in Hierarchical Reinforcement Learning , 2003, Discret. Event Dyn. Syst..
 Sham M. Kakade,et al. On the sample complexity of reinforcement learning. , 2003 .
 Dimitri P. Bertsekas,et al. Least Squares Policy Evaluation Algorithms with Linear Function Approximation , 2003, Discret. Event Dyn. Syst..
 Karl J. Friston,et al. Temporal Difference Models and Reward-Related Learning in the Human Brain , 2003, Neuron.
 W. Schultz,et al. Discrete Coding of Reward Probability and Uncertainty by Dopamine Neurons , 2003, Science.
 M. Thathachar,et al. Networks of Learning Automata: Techniques for Online Stochastic Optimization , 2003 .
 H. Seung. Learning in Spiking Neural Networks by Reinforcement of Stochastic Synaptic Transmission , 2003, Neuron.
 Peter Norvig,et al. Artificial intelligence - a modern approach, 2nd Edition , 2003, Prentice Hall series in artificial intelligence.
 Eric Wiewiora,et al. Potential-Based Shaping and Q-Value Initialization are Equivalent , 2003, J. Artif. Intell. Res..
 Xiaohui Xie,et al. Learning in neural networks by reinforcement of irregular spiking. , 2004, Physical review. E, Statistical, nonlinear, and soft matter physics.
 Thomas G. Dietterich,et al. Explanation-Based Learning and Reinforcement Learning: A Unified View , 1997, Machine Learning.
 Peter L. Bartlett,et al. Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning , 2001, J. Mach. Learn. Res..
 Peter Auer,et al. Finite-time Analysis of the Multiarmed Bandit Problem , 2004, Machine Learning.
 G. Peterson. A day of great illumination: B. F. Skinner's discovery of shaping. , 2004, Journal of the experimental analysis of behavior.
 Richard S. Sutton,et al. Associative search network: A reinforcement learning associative memory , 1981, Biological Cybernetics.
 A. Barto,et al. Linear Least-Squares algorithms for temporal difference learning , 2004, Machine Learning.
 Karl J. Friston,et al. Dissociable Roles of Ventral and Dorsal Striatum in Instrumental Conditioning , 2004, Science.
 Ronald J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning , 2004, Machine Learning.
 José Luis Contreras-Vidal,et al. A Predictive Reinforcement Model of Dopamine Neurons for Learning Approach Behavior , 1999, Journal of Computational Neuroscience.
 John N. Tsitsiklis,et al. Asynchronous Stochastic Approximation and Q-Learning , 1994, Machine Learning.
 Tommi S. Jaakkola,et al. Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms , 2000, Machine Learning.
 C. Glymour,et al. A theory of causal learning in children: causal maps and Bayes nets. , 2004, Psychological review.
 Nuttapong Chentanez,et al. Intrinsically Motivated Learning of Hierarchical Collections of Skills , 2004 .
 Long Ji Lin,et al. Self-improving reactive agents based on reinforcement learning, planning and teaching , 1992, Machine Learning.
 Sridhar Mahadevan,et al. Average reward reinforcement learning: Foundations, algorithms, and empirical results , 2004, Machine Learning.
 Andrew W. Moore,et al. The parti-game algorithm for variable resolution reinforcement learning in multidimensional state-spaces , 2004, Machine Learning.
 R. Sutton,et al. Synthesis of nonlinear control surfaces by a layered associative search network , 2004, Biological Cybernetics.
 John N. Tsitsiklis,et al. Feature-based methods for large scale dynamic programming , 2004, Machine Learning.
 Justin A. Boyan,et al. Technical Update: Least-Squares Temporal Difference Learning , 2002, Machine Learning.
 Michael Kearns,et al. Near-Optimal Reinforcement Learning in Polynomial Time , 1998, Machine Learning.
 Richard S. Sutton,et al. Landmark learning: An illustration of associative search , 1981, Biological Cybernetics.
 Pieter Abbeel,et al. Apprenticeship learning via inverse reinforcement learning , 2004, ICML.
 Allen Newell,et al. The problem of expensive chunks and its solution by restricting expressiveness , 1993, Machine Learning.
 W. Pan,et al. Dopamine Cells Respond to Predicted Events during Classical Conditioning: Evidence for Eligibility Traces in the Reward-Learning Network , 2005, The Journal of Neuroscience.
 Nicol N. Schraudolph,et al. Fast Online Policy Gradient Learning with SMD Gain Vector Adaptation , 2005, NIPS.
 W. Schultz,et al. Adaptive Coding of Reward Value by Dopamine Neurons , 2005, Science.
 Chrystopher L. Nehaniv,et al. Empowerment: a universal agent-centric measure of control , 2005, 2005 IEEE Congress on Evolutionary Computation.
 P. Dayan,et al. Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control , 2005, Nature Neuroscience.
 Richard S. Sutton,et al. Learning to Predict by the Methods of Temporal Differences , 1988, Machine Learning.
 Peter Dayan,et al. How fast to work: Response vigor, motivation and tonic dopamine , 2005, NIPS.
 Jongho Kim,et al. An RLS-Based Natural Actor-Critic Algorithm for Locomotion of a Two-Linked Robot Arm , 2005, CIS.
 C. Padoa-Schioppa,et al. Neurons in the orbitofrontal cortex encode economic value , 2006, Nature.
 Warren B. Powell,et al. Handbook of Learning and Approximate Dynamic Programming , 2006, IEEE Transactions on Automatic Control.
 The short-latency dopamine signal: a role in discovering novel actions? , 2006, Nature Reviews Neuroscience.
 Michael J. Frank,et al. Making Working Memory Work: A Computational Model of Learning in the Prefrontal Cortex and Basal Ganglia , 2006, Neural Computation.
 Peter Dayan,et al. The misbehavior of value and the discipline of the will , 2006, Neural Networks.
 Rémi Coulom,et al. Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search , 2006, Computers and Games.
 David S. Touretzky,et al. Representation and Timing in Theories of the Dopamine System , 2006, Neural Computation.
 P. Dayan,et al. Tonic dopamine: opportunity costs and the control of response vigor , 2007, Psychopharmacology.
 Aaron C. Courville,et al. Bayesian theories of conditioning in a changing world , 2006, Trends in Cognitive Sciences.
 Razvan V. Florian,et al. Reinforcement Learning Through Modulation of Spike-Timing-Dependent Synaptic Plasticity , 2007, Neural Computation.
 Xi-Ren Cao,et al. Stochastic Learning and Optimization - A Sensitivity-Based Approach , 2007 .
 PVLV: the primary value and learned value Pavlovian learning algorithm. , 2007, Behavioral neuroscience.
 Paolo Calabresi,et al. Dopamine-mediated regulation of corticostriatal synaptic plasticity , 2007, Trends in Neurosciences.
 Robert A. Legenstein,et al. Theoretical Analysis of Learning with Reward-Modulated Spike-Timing-Dependent Plasticity , 2007, NIPS.
 M. Roesch,et al. Dopamine neurons encode the better option in rats deciding between differently delayed or sized rewards , 2007, Nature Neuroscience.
 J. O'Doherty,et al. Determining the Neural Substrates of Goal-Directed Learning in the Human Brain , 2007, The Journal of Neuroscience.
 E. Izhikevich. Solving the distal reward problem through linkage of STDP and dopamine signaling , 2007, BMC Neuroscience.
 Olle Gällmo,et al. Reinforcement Learning by Construction of Hypothetical Targets , 2007 .
 Adam Johnson,et al. Neural Ensembles in CA3 Transiently Encode Paths Forward of the Animal at a Decision Point , 2007, The Journal of Neuroscience.
 Pierre-Yves Oudeyer,et al. Intrinsic Motivation Systems for Autonomous Mental Development , 2007, IEEE Transactions on Evolutionary Computation.
 Pierre-Yves Oudeyer,et al. What is Intrinsic Motivation? A Typology of Computational Approaches , 2007, Frontiers Neurorobotics.
 H. Seo,et al. Dynamic signals related to choices and outcomes in the dorsolateral prefrontal cortex. , 2007, Cerebral cortex.
 Ron Meir,et al. Reinforcement Learning, Spike-Time-Dependent Plasticity, and the BCM Rule , 2007, Neural Computation.
 M. Farries,et al. Reinforcement learning with modulated spike timing dependent synaptic plasticity. , 2007, Journal of neurophysiology.