Reinforcement Learning: An Introduction
[1] J. Stevens,et al. Animal Intelligence , 1883, Nature.
[2] E. Thorndike. “Animal Intelligence” , 1898, Nature.
[3] Adaptation , 1926 .
[4] H. Blodgett,et al. The effect of the introduction of reward upon the maze performance of rats , 1929 .
[5] E. Tolman. Purposive behavior in animals and men , 1932 .
[6] C. L. Hull. The goal-gradient hypothesis and maze learning. , 1932 .
[7] W. R. Thompson. On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples , 1933 .
[8] G. Finch,et al. Higher Order Conditioning with Constant Motivation , 1934 .
[9] W. R. Thompson. On the Theory of Apportionment , 1935 .
[10] R. M. Elliott,et al. Behavior of Organisms , 1991 .
[11] R. Thouless. Experimental Psychology , 1939, Nature.
[12] K. J. Craik,et al. The nature of explanation , 1944 .
[13] K. Spence. The role of secondary reinforcement in delayed reward learning. , 1947 .
[14] E. Tolman. Cognitive maps in rats and men. , 1948, Psychological review.
[15] Claude E. Shannon,et al. Programming a computer for playing chess , 1950 .
[16] C. Shannon. A chess-playing machine. , 1950, Scientific American.
[17] D. Thistlethwaite. A critical review of latent learning and related experiments. , 1951, Psychological bulletin.
[18] W. Walter. A Machine that Learns , 1951 .
[19] J. Knott. The organization of behavior: A neuropsychological theory , 1951 .
[20] H. Robbins. Some aspects of the sequential design of experiments , 1952 .
[21] J. Deutsch. A new type of behaviour theory. , 1953, British journal of psychology.
[22] James L Olds,et al. Positive reinforcement produced by electrical stimulation of septal area and other regions of rat brain. , 1954, Journal of comparative and physiological psychology.
[23] W. A. Clark,et al. Simulation of self-organizing systems by digital computer , 1954, Trans. IRE Prof. Group Inf. Theory.
[24] R. Bellman. A Problem in the Sequential Design of Experiments , 1954 .
[25] J. Deutsch. A Machine with Insight , 1954 .
[26] D. Bernoulli. Exposition of a New Theory on the Measurement of Risk , 1954 .
[27] B. G. Farley,et al. Generalization of pattern recognition in a self-organizing system , 1955, AFIPS '55 (Western).
[28] Frederick Mosteller,et al. Stochastic Models for Learning , 1956 .
[29] E. Galanter,et al. On thought: the extrinsic theory. , 1956, Psychological review.
[30] R. Bellman,et al. Functional Approximations and Dynamic Programming , 1959 .
[31] R. Duncan Luce,et al. Individual Choice Behavior , 1959 .
[32] Jorge Nuno Silva,et al. Mathematical Games , 1959, Nature.
[33] Ronald A. Howard,et al. Dynamic Programming and Markov Processes , 1960 .
[34] K. Breland,et al. The misbehavior of organisms. , 1961 .
[35] G. Kimble,et al. Hilgard and Marquis' Conditioning and learning , 1961 .
[36] Marvin Minsky,et al. Steps toward Artificial Intelligence , 1995, Proceedings of the IRE.
[37] J. Gillis,et al. Matrix Iterative Analysis , 1961 .
[38] M. D. Egger,et al. Secondary reinforcement in rats as a function of information value and reliability of the stimulus. , 1962, Journal of experimental psychology.
[39] R. Bellman,et al. Polynomial approximation—a new computational technique in dynamic programming: Allocation processes , 1963 .
[40] Edward O. Thorp,et al. Beat the Dealer: A Winning Strategy for the Game of Twenty-One , 1965 .
[41] Frank Rosenblatt,et al. Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms , 1963 .
[42] John H. Andreae,et al. STELLA: A scheme for a learning machine , 1963 .
[43] Norbert Wiener,et al. God and Golem, inc. : a comment on certain points where cybernetics impinges on religion , 1964 .
[44] K. Fu,et al. A heuristic approach to reinforcement learning control systems , 1965 .
[45] A. G. Butkovskiy,et al. Optimal control of systems , 1966 .
[46] J. Adler. Chemotaxis in Bacteria , 1966, Science.
[47] R. Bellman. Dynamic programming. , 1957, Science.
[48] Lawrence J. Fogel,et al. Artificial Intelligence through Simulated Evolution , 1966 .
[49] Arnold Griffith. A New Machine-Learning Technique Applied to the Game of Checkers , 1966 .
[50] G. Kimble. Foundations of conditioning and learning , 1967 .
[51] E. Denardo. Contraction Mappings in the Theory Underlying Dynamic Programming , 1967 .
[52] L. Kamin. Predictability, surprise, attention, and conditioning , 1967 .
[53] E. Fischer. Conditioned Reflexes , 1942, American journal of physical medicine.
[54] James L. Melsa,et al. State Functions and Linear Control Systems , 1967 .
[55] J. M. Mendel,et al. Applications of artificial intelligence techniques to a spacecraft control problem , 1967 .
[56] L. Kamin. Attention-like processes in classical conditioning , 1967 .
[57] D. Shepard. A two-dimensional interpolation function for irregularly-spaced data , 1968, ACM National Conference.
[58] T. Crow. Cortical Synapses and Reinforcement: a Hypothesis , 1968, Nature.
[59] A. L. Samuel,et al. Some Studies in Machine Learning Using the Game of Checkers , 1967, IBM J. Res. Dev..
[60] John H. Andreae,et al. A learning machine with monologue , 1969 .
[61] F. Downton. Stochastic Approximation , 1969, Nature.
[62] Reuben Hersh,et al. Brownian Motion and Potential Theory , 1969 .
[63] R. Herrnstein. On the law of effect. , 1970, Journal of the experimental analysis of behavior.
[64] King-Sun Fu,et al. Learning control systems--Review and outlook , 1970 .
[65] John F. Young. Machine Intelligence , 1971, Nature.
[66] J. Albus. A Theory of Cerebellar Function , 1971 .
[67] A. H. Klopf,et al. Brain Function and Adaptive Systems: A Heterostatic Theory , 1972 .
[68] R. Rescorla,et al. A theory of Pavlovian conditioning : Variations in the effectiveness of reinforcement and nonreinforcement , 1972 .
[69] Bernard Widrow,et al. Punish/Reward: Learning with a Critic in Adaptive Threshold Systems , 1973, IEEE Trans. Syst. Man Cybern..
[70] J. Cross. A Stochastic Learning Model of Economic Behavior , 1973 .
[71] W. T. Powers. Behavior, the control of perception , 1973 .
[72] Ian H. Witten,et al. Human operators and automatic adaptive controllers: A comparative study on a particular control task , 1973 .
[73] Richard O. Duda,et al. Pattern classification and scene analysis , 1974, A Wiley-Interscience publication.
[74] M. L. Tsetlin,et al. Automaton theory and modeling of biological systems , 1973 .
[75] Arnold K. Griffith,et al. A Comparison and Evaluation of Three Machine Learning Procedures as Applied to the Game of Checkers , 1974, Artif. Intell..
[76] Kumpati S. Narendra,et al. Games of Stochastic Automata , 1974, IEEE Trans. Syst. Man Cybern..
[77] E Harth,et al. Alopex: a stochastic method for determining visual receptive fields. , 1974, Vision research.
[78] Kumpati S. Narendra,et al. Learning Automata - A Survey , 1974, IEEE Trans. Syst. Man Cybern..
[79] John H. Holland,et al. Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence , 1992 .
[80] D. Dennett. Why the Law of Effect will not Go Away , 1975 .
[81] N. Mackintosh. A Theory of Attention: Variations in the Associability of Stimuli with Reinforcement , 1975 .
[82] A. Harry Klopf,et al. A comparison of natural and artificial intelligence , 1975, SGAR.
[83] S. Grossberg. A neural model of attention, reinforcement and discrimination learning. , 1975, International review of neurobiology.
[84] Jon Louis Bentley,et al. Multidimensional binary search trees used for associative searching , 1975, CACM.
[85] James W. Daniel,et al. Splines and efficiency in dynamic programming , 1976 .
[86] I. Witten. The apparent conflict between estimation and control—a survey of the two-armed bandit problem , 1976 .
[87] Stephen A. Ritz,et al. Distinctive features, categorical perception, and probability learning: some applications of a neural model , 1977 .
[88] Ian H. Witten,et al. An Adaptive Optimal Controller for Discrete-Time Markov Environments , 1977, Inf. Control..
[89] Carl V. Page,et al. Heuristics for Signature Table Analysis as a Pattern Recognition Technique , 1977, IEEE Transactions on Systems, Man, and Cybernetics.
[90] Averill M. Law,et al. The art and theory of dynamic programming , 1977 .
[91] John H. Andreae,et al. Thinking with the teachable machine , 1977 .
[92] Jon Louis Bentley,et al. An Algorithm for Finding Best Matches in Logarithmic Expected Time , 1977, TOMS.
[93] Teuvo Kohonen,et al. Associative memory. A system-theoretical approach , 1977 .
[94] M. Puterman,et al. Modified Policy Iteration Algorithms for Discounted Markov Decision Problems , 1978 .
[95] Ward Whitt,et al. Approximations of Dynamic Programs, I , 1978, Math. Oper. Res..
[96] Tom M. Mitchell,et al. Models of Learning Systems. , 1979 .
[97] A. M. Turing,et al. Computing Machinery and Intelligence , 1950, The Philosophy of Artificial Intelligence.
[98] J. Pearce,et al. A model for Pavlovian learning: variations in the effectiveness of conditioned but not of unconditioned stimuli. , 1980, Psychological review.
[99] JOHN W. Moore,et al. Erratum to: Formation of attentional-associative networks in real time: Role of the hippocampus and implications for conditioning , 1980 .
[100] J. W. Humberston. Classical mechanics , 1980, Nature.
[101] J. D. E. Koshland. Bacterial chemotaxis as a model behavioral system , 1980 .
[102] Reuven Y. Rubinstein,et al. Simulation and the Monte Carlo method , 1981, Wiley series in probability and mathematical statistics.
[103] Christopher D. Adams,et al. Instrumental Responding following Reinforcer Devaluation , 1981 .
[104] A G Barto,et al. Toward a modern theory of adaptive networks: expectation and prediction. , 1981, Psychological review.
[105] Richard S. Sutton,et al. Goal Seeking Components for Adaptive Intelligence: An Initial Assessment. , 1981 .
[106] David Abrahamson,et al. Contemporary Animal Learning Theory , 1981 .
[107] A. Dickinson. Conditioning and associative learning. , 1981, British medical bulletin.
[108] Christopher D. Adams. Variations in the Sensitivity of Instrumental Responding to Reinforcer Devaluation , 1982 .
[109] K. Narendra,et al. Learning Algorithms for Two-Person Zero-Sum Stochastic Games with Incomplete Information: A Unified Approach , 1982 .
[110] Robert Miller. Meaning and Purpose in the Intact Brain , 1982 .
[111] Lashon B. Booker,et al. Intelligent Behavior as an Adaptation to the Task Environment , 1982 .
[112] G. Monahan. State of the Art—A Survey of Partially Observable Markov Decision Processes: Theory, Models, and Algorithms , 1982 .
[113] R. Sutton,et al. Simulation of anticipatory responses in classical conditioning by a neuron-like adaptive element , 1982, Behavioural Brain Research.
[114] Alan W. Biermann,et al. Signature Table Systems and Learning , 1982, IEEE Transactions on Systems, Man, and Cybernetics.
[115] Paul J. Werbos,et al. Applications of advances in nonlinear sensitivity analysis , 1982 .
[116] W. Levy,et al. Temporal contiguity requirements for long-term associative potentiation/depression in the hippocampus , 1983, Neuroscience.
[117] J. Staddon. Adaptive behavior and learning , 1983 .
[118] John S. Edwards,et al. The Hedonistic Neuron: A Theory of Memory, Learning and Intelligence , 1983 .
[119] Richard S. Sutton,et al. Neuronlike adaptive elements that can solve difficult learning control problems , 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[120] Steven Edward Hampson,et al. A neural model of adaptive behavior , 1983 .
[121] Lennart Ljung,et al. Theory and Practice of Recursive Identification , 1983 .
[122] R.M. Dunn,et al. Brains, behavior, and robotics , 1983, Proceedings of the IEEE.
[123] Dimitri P. Bertsekas,et al. Distributed asynchronous computation of fixed points , 1983, Math. Program..
[124] Kumpati S. Narendra,et al. An N-player sequential stochastic game with identical payoffs , 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[125] Thomas G. Dietterich,et al. The Role of the Critic in Learning Systems , 1984 .
[126] E. Kandel,et al. Is there a cell-biological alphabet for simple forms of learning? , 1984 .
[127] Richard S. Sutton,et al. Temporal credit assignment in reinforcement learning , 1984 .
[128] Judea Pearl,et al. Heuristics : intelligent search strategies for computer problem solving , 1984 .
[129] Peter G. Doyle,et al. Random Walks and Electric Networks , 1987 .
[130] Mark Derthick,et al. Variations on the Boltzmann Machine Learning Algorithm , 1984 .
[131] Oliver G. Selfridge,et al. Some Themes and Primitives in Ill-Defined Systems , 1984 .
[132] Peter C. Young,et al. Recursive Estimation and Time Series Analysis , 1984 .
[133] Graham C. Goodwin,et al. Adaptive filtering prediction and control , 1984 .
[134] P. Anandan,et al. Pattern-recognizing stochastic learning automata , 1985, IEEE Transactions on Systems, Man, and Cybernetics.
[135] A. Dickinson. Actions and habits: the development of behavioural autonomy , 1985 .
[136] D. J. White,et al. Real Applications of Markov Decision Processes , 1985 .
[137] Richard Wheeler,et al. Decentralized learning in finite Markov chains , 1985, 1985 24th IEEE Conference on Decision and Control.
[138] Richard S. Sutton,et al. Training and Tracking in Robotics , 1985, IJCAI.
[139] J. Hopfield,et al. The Logic of Limax Learning , 1985 .
[140] A G Barto,et al. Learning by statistical cooperation of self-interested neuron-like computing elements. , 1985, Human neurobiology.
[141] M. A. L. Thathachar,et al. A new approach to the design of reinforcement schemes for learning automata , 1985, IEEE Transactions on Systems, Man, and Cybernetics.
[142] Patchigolla Kiran Kumar,et al. A Survey of Some Results in Stochastic Adaptive Control , 1985 .
[143] Yann LeCun,et al. Une procedure d'apprentissage pour reseau a seuil asymmetrique (A learning scheme for asymmetric threshold networks) , 1985 .
[144] P. Schweitzer,et al. Generalized polynomial approximations in Markovian decision processes , 1985 .
[145] Geoffrey E. Hinton,et al. Learning internal representations by error propagation , 1986 .
[146] Charles W. Anderson,et al. Learning and problem-solving with multilayer connectionist systems (adaptive, strategy learning, neural networks, reinforcement learning) , 1986 .
[147] David L. Waltz,et al. Toward memory-based reasoning , 1986, CACM.
[148] Hong Wang,et al. Recursive estimation and time-series analysis , 1986, IEEE Trans. Acoust. Speech Signal Process..
[149] Pravin Varaiya,et al. Stochastic Systems: Estimation, Identification, and Adaptive Control , 1986 .
[150] P. S. Sastry,et al. Estimator Algorithms for Learning Automata , 1986 .
[151] Richard E. Korf,et al. A Unified Theory of Heuristic Evaluation Functions and its Application to Learning , 1986, AAAI.
[152] S. Thomas Alexander,et al. Adaptive Signal Processing , 1986, Texts and Monographs in Computer Science.
[153] R. Sutton,et al. Simulation of the classically conditioned nictitating membrane response by a neuron-like adaptive element: Response topography, neuronal firing, and interstimulus intervals , 1986, Behavioural Brain Research.
[154] Andrew G. Barto,et al. Game-theoretic cooperativity in networks of self-interested units , 1987 .
[155] Paul E. Utgoff,et al. Learning to control a dynamic physical system , 1987, Comput. Intell..
[156] Dimitri P. Bertsekas,et al. Dynamic Programming: Deterministic and Stochastic Models , 1987 .
[157] Paul J. Werbos,et al. Building and Understanding Adaptive Systems: A Statistical/Numerical Approach to Factory Automation and Brain Research , 1987, IEEE Transactions on Systems, Man, and Cybernetics.
[158] T. Lai. Adaptive treatment allocation and the multi-armed bandit problem , 1987 .
[159] Ronald L. Rivest,et al. Diversity-Based Inference of Finite Automata (Extended Abstract) , 1987, FOCS.
[160] Charles W. Anderson,et al. Strategy Learning with Multilayer Connectionist Representations , 1987 .
[161] M. J. D. Powell,et al. Radial basis functions for multivariable interpolation: a review , 1987 .
[162] E. Kehoe,et al. Temporal primacy overrides prior training in serial compound conditioning of the rabbit’s nictitating membrane response , 1987 .
[163] Stephen M. Omohundro,et al. Efficient Algorithms with Neural Network Behavior , 1987, Complex Syst..
[164] D. J. White,et al. Further Real Applications of Markov Decision Processes , 1988 .
[165] PAUL J. WERBOS,et al. Generalization of backpropagation with application to a recurrent gas market model , 1988, Neural Networks.
[166] D. Ruppert,et al. Efficient Estimations from a Slowly Convergent Robbins-Monro Process , 1988 .
[167] Bernard Widrow,et al. Adaptive switching circuits , 1988 .
[168] David E. Goldberg,et al. Genetic Algorithms in Search Optimization and Machine Learning , 1988 .
[169] Richard E. Korf,et al. Optimal path-finding algorithms* , 1988 .
[170] L. N. Kanal,et al. The CDP: A unifying formulation for heuristic search, dynamic programming, and branch-and-bound , 1988 .
[171] A. Klopf. A neuronal model of classical conditioning , 1988 .
[172] Philip E. Agre,et al. The dynamic structure of everyday life , 1988 .
[173] Robert A. Jacobs,et al. Increased rates of convergence through learning rate adaptation , 1987, Neural Networks.
[174] David S. Broomhead,et al. Multivariable Functional Interpolation and Adaptive Networks , 1988, Complex Syst..
[175] D. Broomhead,et al. Radial Basis Functions, Multi-Variable Functional Interpolation and Adaptive Networks , 1988 .
[176] Pentti Kanerva,et al. Sparse Distributed Memory , 1988 .
[177] R. J. Williams,et al. On the use of backpropagation in associative reinforcement learning , 1988, IEEE 1988 International Conference on Neural Networks.
[178] Jonathan H. Connell,et al. A colony architecture for an artificial creature , 1989 .
[179] Andrew G. Barto,et al. From Chemotaxis to cooperativity: abstract exercises in neuronal learning strategies , 1989 .
[180] G. Klir. Is There More to Uncertainty than Some Probability Theorists Might Have Us Believe , 1989 .
[181] Paul J. Werbos,et al. Neural networks for control and system identification , 1989, Proceedings of the 28th IEEE Conference on Decision and Control,.
[182] Stephen Grossberg,et al. Neural dynamics of adaptive timing and temporal discrimination during associative learning , 1989, Neural Networks.
[183] Douglas A. Baxter,et al. Computational Capabilities of Single Neurons: Relationship to Simple Forms of Associative and Nonassociative Learning in Aplysia , 1989 .
[184] Kumpati S. Narendra,et al. Learning automata - an introduction , 1989 .
[185] C. Watkins. Learning from delayed rewards , 1989 .
[186] John N. Tsitsiklis,et al. Parallel and distributed computation , 1989 .
[187] C.W. Anderson,et al. Learning to control an inverted pendulum using neural networks , 1989, IEEE Control Systems Magazine.
[188] John S. Bridle,et al. Training Stochastic Model Recognition Algorithms as Networks can Lead to Maximum Mutual Information Estimation of Parameters , 1989, NIPS.
[189] A. Barto,et al. Learning and Sequential Decision Making , 1989 .
[190] George Cybenko,et al. Approximation by superpositions of a sigmoidal function , 1989, Math. Control. Signals Syst..
[191] Ming Zhang,et al. Comparisons of channel assignment strategies in cellular mobile telephone systems , 1989, IEEE International Conference on Communications, World Prosperity Through Communications,.
[192] Michael McCloskey,et al. Catastrophic Interference in Connectionist Networks: The Sequential Learning Problem , 1989 .
[193] L. Baird. A Mathematical Analysis of Actor-Critic Architectures for Learning Optimal Controls through Incremental Dynamic Programming , 1990 .
[194] W S McCulloch,et al. A logical calculus of the ideas immanent in nervous activity , 1990, The Philosophy of Artificial Intelligence.
[195] Paul E. Utgoff,et al. Explaining Temporal Differences to Create Useful Concepts for Evaluating States , 1990, AAAI.
[196] Richard E. Korf,et al. Real-Time Heuristic Search , 1990, Artif. Intell..
[197] Geoffrey E. Hinton,et al. Distributed Representations , 1986, The Philosophy of Artificial Intelligence.
[198] Lyle H. Ungar,et al. A bioreactor benchmark for adaptive network-based process control , 1990 .
[199] T Poggio,et al. Regularization Algorithms for Learning That Are Equivalent to Multilayer Networks , 1990, Science.
[200] W. Schultz,et al. Dopamine neurons of the monkey midbrain: contingencies of responses to active touch during self-initiated arm movements. , 1990, Journal of neurophysiology.
[201] R Ratcliff,et al. Connectionist models of recognition memory: constraints imposed by learning and forgetting functions. , 1990, Psychological review.
[202] Andrew W. Moore,et al. Efficient memory-based learning for robot control , 1990 .
[203] Tomaso A. Poggio,et al. Extensions of a Theory of Networks for Approximation and Learning , 1990, NIPS.
[204] W. Schultz,et al. Dopamine neurons of the monkey midbrain: contingencies of responses to stimuli eliciting immediate behavioral reactions. , 1990, Journal of neurophysiology.
[205] Paul J. Werbos,et al. Consistency of HDP applied to a simple reinforcement learning problem , 1990, Neural Networks.
[206] Richard S. Sutton,et al. Time-Derivative Models of Pavlovian Reinforcement , 1990 .
[207] Thomas Dean,et al. Toward learning time-varying functions with high input dimensionality , 1990, Proceedings. 5th IEEE International Symposium on Intelligent Control 1990.
[208] Richard S. Sutton,et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming , 1990, ML.
[209] Andrew G. Barto,et al. Connectionist learning for control: an overview , 1990 .
[210] David Chapman,et al. What are plans for? , 1990, Robotics Auton. Syst..
[211] Kumar N. Sivarajan,et al. Dynamic channel assignment in cellular radio , 1990, 40th IEEE Conference on Vehicular Technology.
[212] Geoffrey E. Hinton,et al. A time-delay neural network architecture for isolated word recognition , 1990, Neural Networks.
[213] Steven Minton,et al. Quantitative Results Concerning the Utility of Explanation-based Learning , 1988, Artif. Intell..
[214] Bruce Abramson,et al. Expected-Outcome: A General Model of Static Evaluation , 1990, IEEE Trans. Pattern Anal. Mach. Intell..
[215] Ming Tan,et al. Learning a Cost-Sensitive Internal Representation for Reinforcement Learning , 1991, ML.
[216] Thomas Ross. Machines who think. , 1933, Science.
[217] Andrew G. Barto,et al. On the Computational Economics of Reinforcement Learning , 1991 .
[218] P. Parks,et al. Improved Allocation of Weights for Associative Memory Storage in Learning Control Systems , 1991 .
[219] Jürgen Schmidhuber. Adaptive confidence and adaptive curiosity , 1991, Forschungsberichte, TU Munich.
[220] Jürgen Schmidhuber,et al. A possibility for implementing curiosity and boredom in model-building neural controllers , 1991 .
[221] W. Arthur. Designing Economic Agents that Act Like Human Agents: A Behavioral Approach to Bounded Rationality , 1991 .
[222] Leslie Pack Kaelbling,et al. Input Generalization in Delayed Reinforcement Learning: An Algorithm and Performance Comparisons , 1991, IJCAI.
[223] Richard S. Sutton,et al. Dyna, an integrated architecture for learning, planning, and reacting , 1990, SGAR.
[224] Richard S. Sutton,et al. Planning by Incremental Dynamic Programming , 1991, ML.
[225] P. C. Parks,et al. Design Improvements in Associative Memories for Cerebellar Model Articulation Controllers (CMAC) , 1991 .
[226] Jürgen Schmidhuber,et al. Curious model-building control systems , 1991, [Proceedings] 1991 IEEE International Joint Conference on Neural Networks.
[227] J. Tsitsiklis,et al. An optimal one-way multigrid algorithm for discrete-time stochastic control , 1991 .
[228] W. Lovejoy. A survey of algorithmic methods for partially observed Markov decision processes , 1991 .
[229] Hyongsuk Kim,et al. CMAC-based adaptive critic self-learning control , 1991, IEEE Trans. Neural Networks.
[230] C. M. Gibbs,et al. Second-order conditioning of the rabbit’s nictitating membrane response , 1991, Integrative physiological and behavioral science : the official journal of the Pavlovian Society.
[231] P. E. An,et al. An improved multi-dimensional CMAC neural network: Receptive field function and placement , 1991 .
[232] D.A. Handelman,et al. Theory and development of higher-order CMAC neural networks , 1992, IEEE Control Systems.
[233] Geoffrey E. Hinton,et al. Feudal Reinforcement Learning , 1992, NIPS.
[234] Léon Bottou,et al. Local Learning Algorithms , 1992, Neural Computation.
[235] Terrence J. Sejnowski,et al. Using Aperiodic Reinforcement for Directed Self-Organization During Development , 1992, NIPS.
[236] W. Schultz,et al. Responses of monkey dopamine neurons during learning of behavioral reactions. , 1992, Journal of neurophysiology.
[237] Satinder P. Singh,et al. Scaling Reinforcement Learning Algorithms by Learning Variable Temporal Resolution Models , 1992, ML.
[238] Boris Polyak,et al. Acceleration of stochastic approximation by averaging , 1992 .
[239] P Dayan,et al. Expectation learning in the brain using diffuse ascending projections , 1992 .
[240] Sridhar Mahadevan,et al. Automatic Programming of Behavior-Based Robots Using Reinforcement Learning , 1991, Artif. Intell..
[241] Lonnie Chrisman,et al. Reinforcement Learning with Perceptual Aliasing: The Perceptual Distinctions Approach , 1992, AAAI.
[242] Steven J. Bradtke,et al. Reinforcement Learning Applied to Linear Quadratic Regulation , 1992, NIPS.
[243] Satinder P. Singh,et al. Reinforcement Learning with a Hierarchy of Abstract Models , 1992, AAAI.
[244] Richard S. Sutton,et al. Adapting Bias by Gradient Descent: An Incremental Version of Delta-Bar-Delta , 1992, AAAI.
[245] Paul E. Utgoff,et al. A Teaching Method for Reinforcement Learning , 1992, ML.
[246] Andrew G. Barto,et al. Shaping as a method for accelerating reinforcement learning , 1992, Proceedings of the 1992 IEEE International Symposium on Intelligent Control.
[247] A. Karlsen. [Selection by consequences]. , 1992, Tidsskrift for den Norske laegeforening : tidsskrift for praktisk medicin, ny raekke.
[248] C. Atkeson. Prioritized Sweeping: Reinforcement Learning with Less Data and Less Real Time , 1993 .
[249] Paul M. B. Vitányi,et al. Theories of learning , 2007 .
[250] Leslie Pack Kaelbling,et al. Learning in embedded systems , 1993 .
[251] Tom M. Mitchell,et al. Reinforcement learning with hidden states , 1993 .
[252] Etienne Barnard,et al. Temporal-difference methods and Markov models , 1993, IEEE Trans. Syst. Man Cybern..
[253] John R. Koza,et al. Genetic programming - on the programming of computers by means of natural selection , 1993, Complex adaptive systems.
[254] Andrew G. Barto,et al. Monte Carlo Matrix Inversion and Reinforcement Learning , 1993, NIPS.
[255] W. Schultz,et al. Responses of monkey dopamine neurons to reward and conditioned stimuli during successive steps of learning a delayed response task , 1993, The Journal of neuroscience : the official journal of the Society for Neuroscience.
[256] Richard S. Sutton,et al. Online Learning with Random Representations , 1993, ICML.
[257] Monte Zweben,et al. Scheduling and rescheduling with iterative repair , 1993, IEEE Trans. Syst. Man Cybern..
[258] Anton Schwartz,et al. A Reinforcement Learning Method for Maximizing Undiscounted Rewards , 1993, ICML.
[259] Andrew McCallum,et al. Overcoming Incomplete Perception with Utile Distinction Memory , 1993, ICML.
[260] Pentti Kanerva,et al. Sparse distributed memory and related models , 1993 .
[261] D. J. White,et al. A Survey of Applications of Markov Decision Processes , 1993 .
[262] Satinder Singh,et al. Learning to Solve Markovian Decision Processes , 1993 .
[263] Jing Peng,et al. Efficient Learning and Planning Within the Dyna Framework , 1993, Adapt. Behav..
[264] Stephen I. Gallant,et al. Neural network learning and expert systems , 1993 .
[265] Heekuck Oh,et al. Neural Networks for Pattern Recognition , 1993, Adv. Comput..
[266] Leslie Pack Kaelbling,et al. Hierarchical Learning in Stochastic Domains: Preliminary Results , 1993, ICML.
[267] Leemon C Baird,et al. Reinforcement Learning With High-Dimensional, Continuous Actions , 1993 .
[268] Mark W. Spong,et al. Swing up control of the Acrobot , 1994, Proceedings of the 1994 IEEE International Conference on Robotics and Automation.
[269] Andrew W. Moore,et al. Generalization in Reinforcement Learning: Safely Approximating the Value Function , 1994, NIPS.
[270] Joel L. Davis,et al. A Model of How the Basal Ganglia Generate and Use Neural Signals That Predict Reinforcement , 1994 .
[271] Gerald Tesauro,et al. TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play , 1994, Neural Computation.
[272] Michael I. Jordan,et al. MIT Artificial Intelligence Laboratory and Center for Biological and Computational Learning, Department of Brain and Cognitive Sciences (technical report) , 1996 .
[273] Mark W. Spong,et al. Swinging up the Acrobot: an example of intelligent control , 1994, Proceedings of 1994 American Control Conference - ACC '94.
[274] Jude W. Shavlik,et al. Incorporating Advice into Agents that Learn from Reinforcements , 1994, AAAI.
[275] Judea Pearl,et al. Counterfactual Probabilities: Computational Methods, Bounds and Applications , 1994, UAI.
[276] Maja J. Mataric,et al. Reward Functions for Accelerated Learning , 1994, ICML.
[277] K. P. Unnikrishnan,et al. Alopex: A Correlation-Based Learning Algorithm for Feedforward and Recurrent Neural Networks , 1994, Neural Computation.
[278] Terrence J. Sejnowski,et al. A Novel Reinforcement Model of Birdsong Vocalization Learning , 1994, NIPS.
[279] Chen-Khong Tham,et al. Modular on-line function approximation for scaling up reinforcement learning , 1994 .
[280] Marco Colombetti,et al. Robot Shaping: Developing Autonomous Agents Through Learning , 1994, Artif. Intell..
[281] Mahesan Niranjan,et al. On-line Q-learning using connectionist systems , 1994 .
[282] Michael L. Littman,et al. Markov Games as a Framework for Multi-Agent Reinforcement Learning , 1994, ICML.
[283] S. Schaal,et al. Robot juggling: implementation of memory-based learning , 1994, IEEE Control Systems.
[284] Karl J. Friston,et al. Value-dependent selection in the brain: Simulation in a synthetic neural model , 1994, Neuroscience.
[285] Michael O. Duff,et al. Reinforcement Learning Methods for Continuous-Time Markov Decision Problems , 1994, NIPS.
[286] Prasad Tadepalli,et al. H-Learning: A Reinforcement Learning Method for Optimizing Undiscounted Average Reward , 1994 .
[287] T. Sejnowski,et al. The predictive brain: temporal coincidence and temporal order in synaptic learning mechanisms. , 1994, Learning & memory.
[288] Michael I. Jordan,et al. Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems , 1994, NIPS.
[289] Marco Colombetti,et al. Training Agents to Perform Sequential Behavior , 1994, Adapt. Behav..
[290] Michael I. Jordan,et al. Learning Without State-Estimation in Partially Observable Markovian Decision Processes , 1994, ICML.
[291] Andrew G. Barto,et al. Adaptive linear quadratic control using policy iteration , 1994, Proceedings of 1994 American Control Conference - ACC '94.
[292] W. Estes. Toward a Statistical Theory of Learning. , 1994 .
[293] Paul J. Werbos,et al. The Roots of Backpropagation: From Ordered Derivatives to Neural Networks and Political Forecasting , 1994 .
[294] Jerry M. Mendel,et al. Reinforcement-learning control and pattern recognition systems , 1994 .
[295] Michael I. Jordan,et al. Reinforcement Learning with Soft State Aggregation , 1994, NIPS.
[296] Gerald Tesauro,et al. Temporal Difference Learning and TD-Gammon , 1995, J. Int. Comput. Games Assoc..
[297] Mark B. Ring. Continual learning in reinforcement environments , 1995, GMD-Bericht.
[298] Gary Cziko,et al. Without Miracles: Universal Selection Theory and the Second Darwinian Revolution , 1995 .
[299] Geoffrey J. Gordon. Stable Function Approximation in Dynamic Programming , 1995, ICML.
[300] S. Hochreiter. Reinforcement Driven Information Acquisition in Nondeterministic Environments , 1995 .
[301] Kenji Doya,et al. Temporal Difference Learning in Continuous Time and Space , 1995, NIPS.
[302] Richard S. Sutton,et al. A Summary Comparison of CMAC Neural Network and Traditional Adaptive Control Systems , 1995 .
[303] Peter Norvig,et al. Artificial Intelligence: A Modern Approach , 1995 .
[304] Richard S. Sutton,et al. TD Models: Modeling the World at a Mixture of Time Scales , 1995, ICML.
[305] Thomas Dean,et al. Decomposition Techniques for Planning in Stochastic Domains , 1995, IJCAI.
[306] Pawel Cichosz,et al. Truncating Temporal Differences: On the Efficient Implementation of TD(λ) for Reinforcement Learning , 1994, J. Artif. Intell. Res..
[307] Stuart J. Russell,et al. Approximating Optimal Policies for Partially Observable Stochastic Domains , 1995, IJCAI.
[308] Leslie Pack Kaelbling,et al. On the Complexity of Solving Markov Decision Problems , 1995, UAI.
[309] J. Pearl. Causal diagrams for empirical research , 1995 .
[310] Mandayam A. L. Thathachar,et al. Local and Global Optimization Algorithms for Generalized Learning Automata , 1995, Neural Computation.
[311] R. Agrawal. Sample mean based index policies by O(log n) regret for the multi-armed bandit problem , 1995, Advances in Applied Probability.
[312] Yoshua Bengio,et al. Pattern Recognition and Neural Networks , 1995 .
[313] A. Barto,et al. Adaptive Critics and the Basal Ganglia , 1994 .
[314] Michael O. Duff,et al. Q-Learning for Bandit Problems , 1995, ICML.
[315] Jonathan Baxter,et al. Learning internal representations , 1995, COLT '95.
[316] Thomas G. Dietterich,et al. High-Performance Job-Shop Scheduling With A Time-Delay TD(λ) Network , 1995, NIPS 1995.
[317] Leslie Pack Kaelbling,et al. Learning Policies for Partially Observable Environments: Scaling Up , 1997, ICML.
[318] Andrew G. Barto,et al. Improving Elevator Performance Using Reinforcement Learning , 1995, NIPS.
[319] Steven J. Bradtke,et al. Incremental dynamic programming for on-line adaptive optimal control , 1995 .
[320] Gavin Adrian Rummery. Problem solving with reinforcement learning , 1995 .
[321] Wei Zhang,et al. A Reinforcement Learning Approach to job-shop Scheduling , 1995, IJCAI.
[322] M. Hammer,et al. Learning and memory in the honeybee. , 1995, The Journal of neuroscience : the official journal of the Society for Neuroscience.
[323] Richard S. Sutton,et al. Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding , 1996 .
[324] Geoffrey J. Gordon. Stable Fitted Reinforcement Learning , 1995, NIPS.
[325] Jing Peng,et al. Efficient Memory-Based Dynamic Programming , 1995, ICML.
[326] Peter Dayan,et al. Bee foraging in uncertain environments using predictive hebbian learning , 1995, Nature.
[327] Craig Boutilier,et al. Exploiting Structure in Policy Construction , 1995, IJCAI.
[328] Pawel Cichosz. Truncating Temporal Differences: On the Efficient Implementation of TD(λ) for Reinforcement Learning , 1995 .
[329] Andrew G. Barto,et al. Learning to Act Using Real-Time Dynamic Programming , 1995, Artif. Intell..
[330] Leemon C. Baird,et al. Residual Algorithms: Reinforcement Learning with Function Approximation , 1995, ICML.
[331] A. Dickinson,et al. Reward-related signals carried by dopamine neurons. , 1995 .
[332] J. Wickens,et al. Cellular models of reinforcement. , 1995 .
[333] Andrew W. Moore,et al. Reinforcement Learning: A Survey , 1996, J. Artif. Intell. Res..
[334] P. Dayan,et al. A framework for mesencephalic dopamine systems based on predictive Hebbian learning , 1996, The Journal of neuroscience : the official journal of the Society for Neuroscience.
[335] Richard S. Sutton,et al. Model-Based Reinforcement Learning with an Approximate, Learned Model , 1996 .
[336] Gerald Tesauro,et al. On-line Policy Improvement using Monte-Carlo Search , 1996, NIPS.
[337] Richard S. Sutton,et al. Reinforcement Learning with Replacing Eligibility Traces , 2005, Machine Learning.
[338] Andrew McCallum,et al. Reinforcement learning with selective perception and hidden state , 1996 .
[339] A. Turing. Intelligent Machinery, A Heretical Theory , 1996 .
[340] John Rust. Numerical dynamic programming in economics , 1996 .
[341] J. A. Bryson. Optimal control-1950 to 1985 , 1996 .
[342] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[343] Wei Zhang,et al. Reinforcement learning for job shop scheduling , 1996 .
[344] Andrew G. Barto,et al. Large-scale dynamic optimization using teams of reinforcement learning agents , 1996 .
[345] W. Thomas Miller,et al. UNH_CMAC Version 2.1 The University of New Hampshire Implementation of the Cerebellar Model Arithmetic Computer - CMAC , 1996 .
[346] John N. Tsitsiklis,et al. Analysis of temporal-difference learning with function approximation , 1996, NIPS 1996.
[347] Dimitri P. Bertsekas,et al. Reinforcement Learning for Dynamic Channel Allocation in Cellular Telephone Systems , 1996, NIPS.
[348] Prasad Tadepalli,et al. Scaling Up Average Reward Reinforcement Learning by Approximating the Domain Models and the Value Function , 1996, ICML.
[349] Peter Dayan,et al. A Neural Substrate of Prediction and Reward , 1997, Science.
[350] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[351] John N. Tsitsiklis,et al. Rollout Algorithms for Combinatorial Optimization , 1997, J. Heuristics.
[352] A. Machado. Learning the temporal dynamics of behavior. , 1997, Psychological review.
[353] H. Markram,et al. Regulation of Synaptic Efficacy by Coincidence of Postsynaptic APs and EPSPs , 1997, Science.
[354] David S. Touretzky,et al. Shaping robot behavior using principles from instrumental conditioning , 1997, Robotics Auton. Syst..
[355] M. Hammer. The neural basis of associative reward learning in honeybees , 1997, Trends in Neurosciences.
[356] Gary Boone,et al. Minimum-time control of the Acrobot , 1997, Proceedings of International Conference on Robotics and Automation.
[357] U. Frey,et al. Synaptic tagging and long-term potentiation , 1997, Nature.
[358] Benjamin Van Roy,et al. A neuro-dynamic programming approach to retailer inventory management , 1997, Proceedings of the 36th IEEE Conference on Decision and Control.
[359] J. Clouse. On integrating apprentice learning and reinforcement learning TITLE2 , 1997 .
[360] Xi-Ren Cao,et al. Perturbation realization, potentials, and sensitivity analysis of Markov processes , 1997, IEEE Trans. Autom. Control..
[361] Andrew W. Moore,et al. Efficient Locally Weighted Polynomial Regression Predictions , 1997, ICML.
[362] Milos Hauskrecht,et al. Hierarchical Solution of Markov Decision Processes using Macro-actions , 1998, UAI.
[363] Yoshua Bengio,et al. Gradient-based learning applied to document recognition , 1998, Proc. IEEE.
[364] Ronald E. Parr,et al. Hierarchical control and learning for markov decision processes , 1998 .
[365] W. Schultz,et al. Learning of sequential movements by neural network model with dopamine-like reinforcement signal , 1998, Experimental Brain Research.
[366] J. Hollerman,et al. Dopamine neurons report an error in the temporal prediction of reward during learning , 1998, Nature Neuroscience.
[367] T. Sejnowski,et al. A Computational Model of Birdsong Learning by Auditory Experience and Auditory Feedback , 1998 .
[368] Shun-ichi Amari,et al. Natural Gradient Works Efficiently in Learning , 1998, Neural Computation.
[369] R. Clark,et al. Classical conditioning and brain systems: the role of awareness. , 1998, Science.
[370] Preben Alstrøm,et al. Learning to Drive a Bicycle Using Reinforcement Learning and Shaping , 1998, ICML.
[371] Andrew W. Moore,et al. Gradient Descent for General Reinforcement Learning , 1998, NIPS.
[372] K. Berridge,et al. What is the role of dopamine in reward: hedonic impact, reward learning, or incentive salience? , 1998, Brain Research Reviews.
[373] Doina Precup,et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning , 1999, Artif. Intell..
[374] L. Baird. Reinforcement Learning Through Gradient Descent , 1999 .
[375] W. Schultz,et al. A neural network model with dopamine-like reinforcement signal that learns a spatial delayed response task , 1999, Neuroscience.
[376] R. French. Catastrophic forgetting in connectionist networks , 1999, Trends in Cognitive Sciences.
[377] Joshua W. Brown,et al. How the Basal Ganglia Use Parallel Excitatory and Inhibitory Learning Pathways to Selectively Respond to Unexpected Rewarding Cues , 1999, The Journal of Neuroscience.
[378] John N. Tsitsiklis,et al. Average cost temporal-difference learning , 1997, Proceedings of the 36th IEEE Conference on Decision and Control.
[379] Justin A. Boyan,et al. Least-Squares Temporal Difference Learning , 1999, ICML.
[380] Geoffrey J. Gordon,et al. Approximate solutions to markov decision processes , 1999 .
[381] John N. Tsitsiklis,et al. Actor-Critic Algorithms , 1999, NIPS.
[382] Nicol N. Schraudolph,et al. Local Gain Adaptation in Stochastic Gradient Descent , 1999 .
[383] Andrew Y. Ng,et al. Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping , 1999, ICML.
[384] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[385] Simon Haykin,et al. A dynamic channel assignment policy through Q-learning , 1999, IEEE Trans. Neural Networks.
[386] C. Buhusi,et al. Timing in simple conditioning and occasion setting: a neural network approach , 1999, Behavioural Processes.
[387] Hector Magno,et al. Models of Learning , 1999 .
[388] Arthur L. Samuel,et al. Some studies in machine learning using the game of checkers , 2000, IBM J. Res. Dev..
[389] J. Donahoe,et al. Behavior analysis and revaluation. , 2000, Journal of the experimental analysis of behavior.
[390] Geoffrey J. Gordon. Reinforcement Learning with Function Approximation Converges to a Region , 2000, NIPS.
[391] Herbert Jaeger,et al. Observable Operator Models for Discrete Stochastic Time Series , 2000, Neural Computation.
[392] H. Kushner. Numerical Methods for Stochastic Control Problems in Continuous Time , 2000 .
[393] Doina Precup,et al. Eligibility Traces for Off-Policy Policy Evaluation , 2000, ICML.
[394] Doina Precup,et al. Temporal abstraction in reinforcement learning , 2000, ICML 2000.
[395] Andrew Y. Ng,et al. Algorithms for Inverse Reinforcement Learning , 2000, ICML.
[396] E. Deci,et al. Intrinsic and Extrinsic Motivations: Classic Definitions and New Directions. , 2000, Contemporary educational psychology.
[397] Peter L. Bartlett,et al. Infinite-Horizon Policy-Gradient Estimation , 2001, J. Artif. Intell. Res..
[398] Samuel M. McClure,et al. Predictability Modulates Human Brain Response to Reward , 2001, The Journal of Neuroscience.
[399] Peter Redgrave,et al. A computational model of action selection in the basal ganglia. II. Analysis and simulation of behaviour , 2001, Biological Cybernetics.
[400] D. Kahneman,et al. Functional Imaging of Neural Responses to Expectancy and Experience of Monetary Gains and Losses , 2001 .
[401] Richard S. Sutton,et al. Predictive Representations of State , 2001, NIPS.
[402] Peter Dayan,et al. Theoretical Neuroscience: Computational and Mathematical Modeling of Neural Systems , 2001 .
[403] John N. Tsitsiklis,et al. Simulation-based optimization of Markov reward processes , 2001, IEEE Trans. Autom. Control..
[404] Rajesh P. N. Rao,et al. Spike-Timing-Dependent Hebbian Plasticity as Temporal Difference Learning , 2001, Neural Computation.
[405] M. Arbib,et al. Modeling functions of striatal dopamine modulation in learning and planning , 2001, Neuroscience.
[406] Peter L. Bartlett,et al. Experiments with Infinite-Horizon, Policy-Gradient Estimation , 2001, J. Artif. Intell. Res..
[407] Sham M. Kakade,et al. A Natural Policy Gradient , 2001, NIPS.
[408] Richard S. Sutton,et al. Comparing Policy-Gradient Algorithms , 2001 .
[409] Xin Wang,et al. Batch Value Function Approximation via Support Vectors , 2001, NIPS.
[410] Sanjoy Dasgupta,et al. Off-Policy Temporal Difference Learning with Function Approximation , 2001, ICML.
[411] Christian R. Shelton,et al. Importance sampling for reinforcement learning with multiple objectives , 2001 .
[412] Tim Hesterberg,et al. Monte Carlo Strategies in Scientific Computing , 2002, Technometrics.
[413] Martin Müller,et al. Computer Go , 2002, Artif. Intell..
[414] John N. Tsitsiklis,et al. On the Convergence of Optimistic Policy Iteration , 2002, J. Mach. Learn. Res..
[415] Gerald Tesauro,et al. Programming backgammon using self-teaching neural nets , 2002, Artif. Intell..
[416] Eytan Ruppin,et al. Actor-critic models of the basal ganglia: new anatomical and computational perspectives , 2002, Neural Networks.
[417] David S. Touretzky,et al. Timing and Partial Observability in the Dopamine System , 2002, NIPS.
[418] P. S. Sastry,et al. Varieties of learning automata: an overview , 2002, IEEE Trans. Syst. Man Cybern. Part B.
[419] P. Montague,et al. Activity in human ventral striatum locked to errors of reward prediction , 2002, Nature Neuroscience.
[420] John N. J. Reynolds,et al. Dopamine-dependent plasticity of corticostriatal synapses , 2002, Neural Networks.
[421] Ronen I. Brafman,et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning , 2001, J. Mach. Learn. Res..
[422] P. Dayan. Matters temporal , 2002, Trends in Cognitive Sciences.
[423] Doina Precup,et al. A Convergent Form of Approximate Policy Iteration , 2002, NIPS.
[424] Theodore J. Perkins,et al. On the Existence of Fixed Points for Q-Learning and Sarsa in Partially Observable Domains , 2002, ICML.
[425] Nicol N. Schraudolph,et al. Fast Curvature Matrix-Vector Products for Second-Order Gradient Descent , 2002, Neural Computation.
[426] Colin Camerer. Behavioral Game Theory: Experiments in Strategic Interaction , 2003 .
[427] Andrew Y. Ng,et al. Shaping and policy search in reinforcement learning , 2003 .
[428] Sridhar Mahadevan,et al. Recent Advances in Hierarchical Reinforcement Learning , 2003, Discret. Event Dyn. Syst..
[429] P. Glimcher. Decisions, Uncertainty, and the Brain: The Science of Neuroeconomics , 2003 .
[430] H. Jaeger. Discrete-time, discrete-valued observable operator models: a tutorial , 2003 .
[431] Samuel M. McClure,et al. A computational substrate for incentive salience , 2003, Trends in Neurosciences.
[432] Sham M. Kakade,et al. On the sample complexity of reinforcement learning. , 2003 .
[433] Dimitri P. Bertsekas,et al. Least Squares Policy Evaluation Algorithms with Linear Function Approximation , 2003, Discret. Event Dyn. Syst..
[434] Karl J. Friston,et al. Temporal Difference Models and Reward-Related Learning in the Human Brain , 2003, Neuron.
[435] W. Schultz,et al. Discrete Coding of Reward Probability and Uncertainty by Dopamine Neurons , 2003, Science.
[436] M. Thathachar,et al. Networks of Learning Automata: Techniques for Online Stochastic Optimization , 2003 .
[437] H. Seung,et al. Learning in Spiking Neural Networks by Reinforcement of Stochastic Synaptic Transmission , 2003, Neuron.
[438] Michail G. Lagoudakis,et al. Least-Squares Policy Iteration , 2003, J. Mach. Learn. Res..
[439] Peter Norvig,et al. Artificial intelligence - a modern approach, 2nd Edition , 2003, Prentice Hall series in artificial intelligence.
[440] Eric Wiewiora,et al. Potential-Based Shaping and Q-Value Initialization are Equivalent , 2003, J. Artif. Intell. Res..
[441] R. Wise. Dopamine, learning and motivation , 2004, Nature Reviews Neuroscience.
[442] Peter Dayan,et al. Q-learning , 1992, Machine Learning.
[443] Xiaohui Xie,et al. Learning in neural networks by reinforcement of irregular spiking. , 2004, Physical review. E, Statistical, nonlinear, and soft matter physics.
[444] Thomas G. Dietterich,et al. Explanation-Based Learning and Reinforcement Learning: A Unified View , 1995, Machine Learning.
[445] Peter L. Bartlett,et al. Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning , 2001, J. Mach. Learn. Res..
[446] Nuttapong Chentanez,et al. Intrinsically Motivated Reinforcement Learning , 2004, NIPS.
[447] Leo Breiman,et al. Random Forests , 2001, Machine Learning.
[448] Peter Auer,et al. Finite-time Analysis of the Multiarmed Bandit Problem , 2002, Machine Learning.
[449] G. Peterson. A day of great illumination: B. F. Skinner's discovery of shaping. , 2004, Journal of the experimental analysis of behavior.
[450] Peter Dayan,et al. The convergence of TD(λ) for general λ , 1992, Machine Learning.
[451] Richard S. Sutton,et al. Associative search network: A reinforcement learning associative memory , 1981, Biological Cybernetics.
[452] Steven J. Bradtke,et al. Linear Least-Squares algorithms for temporal difference learning , 2004, Machine Learning.
[453] Karl J. Friston,et al. Dissociable Roles of Ventral and Dorsal Striatum in Instrumental Conditioning , 2004, Science.
[454] Ronald J. Williams,et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.
[455] Scott Rixner,et al. Memory Controller Optimizations for Web Servers , 2004, 37th International Symposium on Microarchitecture (MICRO-37'04).
[456] José Luis Contreras-Vidal,et al. A Predictive Reinforcement Model of Dopamine Neurons for Learning Approach Behavior , 1999, Journal of Computational Neuroscience.
[457] John N. Tsitsiklis,et al. Asynchronous Stochastic Approximation and Q-Learning , 1994, Machine Learning.
[458] Gerald Tesauro,et al. Practical issues in temporal difference learning , 1992, Machine Learning.
[459] Terrence J. Sejnowski,et al. TD(λ) Converges with Probability 1 , 1994, Machine Learning.
[460] Nancy Forbes. Imitation of Life , 2004 .
[461] Andrew W. Moore,et al. Locally Weighted Learning , 1997, Artificial Intelligence Review.
[462] Tommi S. Jaakkola,et al. Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms , 2000, Machine Learning.
[463] David M. Sobel,et al. A theory of causal learning in children: causal maps and Bayes nets. , 2004, Psychological review.
[464] Nuttapong Chentanez,et al. Intrinsically Motivated Learning of Hierarchical Collections of Skills , 2004 .
[465] G. Tesauro,et al. Simple neural models of classical conditioning , 1986, Biological Cybernetics.
[466] C. Breazeal. The Behavior System , 2004 .
[467] Long Ji Lin,et al. Self-improving reactive agents based on reinforcement learning, planning and teaching , 1992, Machine Learning.
[468] Sridhar Mahadevan,et al. Average reward reinforcement learning: Foundations, algorithms, and empirical results , 2004, Machine Learning.
[469] Andrew W. Moore,et al. The parti-game algorithm for variable resolution reinforcement learning in multidimensional state-spaces , 2004, Machine Learning.
[470] R. Sutton,et al. Synthesis of nonlinear control surfaces by a layered associative search network , 2004, Biological Cybernetics.
[471] John N. Tsitsiklis,et al. Feature-based methods for large scale dynamic programming , 2004, Machine Learning.
[472] Justin A. Boyan,et al. Technical Update: Least-Squares Temporal Difference Learning , 2002, Machine Learning.
[473] Michael Kearns,et al. Near-Optimal Reinforcement Learning in Polynomial Time , 2002, Machine Learning.
[474] Richard S. Sutton,et al. Landmark learning: An illustration of associative search , 1981, Biological Cybernetics.
[475] A. Redish,et al. Addiction as a Computational Process Gone Awry , 2004, Science.
[476] Jing Peng,et al. Incremental multi-step Q-learning , 1994, Machine Learning.
[477] Pieter Abbeel,et al. Apprenticeship learning via inverse reinforcement learning , 2004, ICML.
[478] Allen Newell,et al. The problem of expensive chunks and its solution by restricting expressiveness , 1993, Machine Learning.
[479] W. Pan,et al. Dopamine Cells Respond to Predicted Events during Classical Conditioning: Evidence for Eligibility Traces in the Reward-Learning Network , 2005, The Journal of Neuroscience.
[480] Nicol N. Schraudolph,et al. Fast Online Policy Gradient Learning with SMD Gain Vector Adaptation , 2005, NIPS.
[481] B. Skinner. Operant Behavior , 2021, Encyclopedia of Evolutionary Psychological Science.
[482] Doina Precup,et al. Off-policy Learning with Options and Recognizers , 2005, NIPS.
[483] W. Schultz,et al. Adaptive Coding of Reward Value by Dopamine Neurons , 2005, Science.
[484] Dana H. Ballard,et al. Learning to perceive and act by trial and error , 1991, Machine Learning.
[485] Chrystopher L. Nehaniv,et al. Empowerment: a universal agent-centric measure of control , 2005, 2005 IEEE Congress on Evolutionary Computation.
[486] P. Dayan,et al. Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control , 2005, Nature Neuroscience.
[487] Richard S. Sutton,et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.
[488] Peter Dayan,et al. How fast to work: Response vigor, motivation and tonic dopamine , 2005, NIPS.
[489] Jongho Kim,et al. An RLS-Based Natural Actor-Critic Algorithm for Locomotion of a Two-Linked Robot Arm , 2005, CIS.
[490] Charles R. Gallistel,et al. Deconstructing the law of effect , 2005, Games Econ. Behav..
[491] Geoffrey J. Gordon,et al. Fast Exact Planning in Markov Decision Processes , 2005, ICAPS.
[492] Stefan Schaal,et al. Natural Actor-Critic , 2003, Neurocomputing.
[493] C. Padoa-Schioppa,et al. Neurons in the orbitofrontal cortex encode economic value , 2006, Nature.
[494] Warren B. Powell,et al. Handbook of Learning and Approximate Dynamic Programming , 2006, IEEE Transactions on Automatic Control.
[495] P. Redgrave,et al. The short-latency dopamine signal: a role in discovering novel actions? , 2006, Nature Reviews Neuroscience.
[496] Liming Xiang,et al. Kernel-Based Reinforcement Learning , 2006, ICIC.
[497] H. Yin,et al. The role of the basal ganglia in habit formation , 2006, Nature Reviews Neuroscience.
[498] Michael Thielscher,et al. General Game Playing , 2015 .
[499] Michael J. Frank,et al. Making Working Memory Work: A Computational Model of Learning in the Prefrontal Cortex and Basal Ganglia , 2006, Neural Computation.
[500] Yee Whye Teh,et al. A Fast Learning Algorithm for Deep Belief Nets , 2006, Neural Computation.
[501] P. Dayan,et al. A normative perspective on motivation , 2006, Trends in Cognitive Sciences.
[502] Peter Dayan,et al. (title lost to publisher boilerplate) , 2022 .
[503] Rémi Coulom,et al. Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search , 2006, Computers and Games.
[504] David S. Touretzky,et al. Representation and Timing in Theories of the Dopamine System , 2006, Neural Computation.
[505] P. Dayan,et al. Tonic dopamine: opportunity costs and the control of response vigor , 2007, Psychopharmacology.
[506] Csaba Szepesvári,et al. Bandit Based Monte-Carlo Planning , 2006, ECML.
[507] Xin Xu,et al. Kernel Least-Squares Temporal Difference Learning , 2006 .
[508] Aaron C. Courville,et al. Bayesian theories of conditioning in a changing world , 2006, Trends in Cognitive Sciences.
[509] A. Tversky,et al. Prospect Theory: An Analysis of Decision under Risk , 1979, Econometrica.
[510] David Silver,et al. Combining online and offline knowledge in UCT , 2007, ICML '07.
[511] Razvan V. Florian,et al. Reinforcement Learning Through Modulation of Spike-Timing-Dependent Synaptic Plasticity , 2007, Neural Computation.
[512] Peter Redgrave,et al. Basal Ganglia , 2020, Encyclopedia of Autism Spectrum Disorders.
[513] Xi-Ren Cao,et al. Stochastic learning and optimization - A sensitivity-based approach , 2007, Annu. Rev. Control..
[514] Thomas E. Hazy,et al. PVLV: the primary value and learned value Pavlovian learning algorithm. , 2007, Behavioral neuroscience.
[515] Paolo Calabresi,et al. Dopamine-mediated regulation of corticostriatal synaptic plasticity , 2007, Trends in Neurosciences.
[516] Radford M. Neal. Pattern Recognition and Machine Learning , 2007, Technometrics.
[517] Robert A. Legenstein,et al. Theoretical Analysis of Learning with Reward-Modulated Spike-Timing-Dependent Plasticity , 2007, NIPS.
[518] R. Sutton. On The Virtues of Linear Learning and Trajectory Distributions , 2007 .
[519] M. Roesch,et al. Dopamine neurons encode the better option in rats deciding between differently delayed or sized rewards , 2007, Nature Neuroscience.
[520] Vivian V. Valentin,et al. Determining the Neural Substrates of Goal-Directed Learning in the Human Brain , 2007, The Journal of Neuroscience.
[521] E. Izhikevich. Solving the distal reward problem through linkage of STDP and dopamine signaling , 2007, BMC Neuroscience.
[522] Olle Gällmo,et al. Reinforcement Learning by Construction of Hypothetical Targets , 2007 .
[523] Adam Johnson,et al. Neural Ensembles in CA3 Transiently Encode Paths Forward of the Animal at a Decision Point , 2007, The Journal of Neuroscience.
[524] Pierre-Yves Oudeyer,et al. Intrinsic Motivation Systems for Autonomous Mental Development , 2007, IEEE Transactions on Evolutionary Computation.
[525] D. Hassabis,et al. Deconstructing episodic memory with construction , 2007, Trends in Cognitive Sciences.
[526] Yoshua. Bengio,et al. Learning Deep Architectures for AI , 2007, Found. Trends Mach. Learn..
[527] Pierre-Yves Oudeyer,et al. What is Intrinsic Motivation? A Typology of Computational Approaches , 2007, Frontiers Neurorobotics.
[528] H. Seo,et al. Dynamic signals related to choices and outcomes in the dorsolateral prefrontal cortex. , 2007, Cerebral cortex.
[529] Ron Meir,et al. Reinforcement Learning, Spike-Time-Dependent Plasticity, and the BCM Rule , 2007, Neural Computation.
[530] M. Farries,et al. Reinforcement learning with modulated spike timing dependent synaptic plasticity. , 2007, Journal of neurophysiology.
[531] R. Malott,et al. Principles of Behavior , 2007 .
[532] T. L. Lai,et al. Asymptotically Efficient Adaptive Allocation Rules , 1985, Advances in Applied Mathematics.
[533] D. Bertsekas. Distributed Dynamic Programming , 1982, IEEE Transactions on Automatic Control.