Algorithms for Sequential Decision Making
[1] E. Rowland. Theory of Games and Economic Behavior, 1946, Nature.
[2] L. Shapley, et al. Stochastic Games, 1953, Proceedings of the National Academy of Sciences.
[3] Arthur L. Samuel, et al. Some Studies in Machine Learning Using the Game of Checkers, 1967, IBM J. Res. Dev.
[4] Ronald A. Howard, et al. Dynamic Programming and Markov Processes, 1960.
[5] Alvin W. Drake, et al. Observation of a Markov process through a noisy channel, 1962.
[6] F. d'Epenoux, et al. A Probabilistic Production and Inventory Problem, 1963.
[7] V. Klee. On the Number of Vertices of a Convex Polytope, 1964, Canadian Journal of Mathematics.
[8] Rutherford Aris, et al. Discrete Dynamic Programming, 1965, The Mathematical Gazette.
[9] R. Karp, et al. On Nonterminating Stochastic Games, 1966.
[10] Cyrus Derman, et al. Finite State Markovian Decision Processes, 1970.
[11] L. Baum, et al. A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains, 1970.
[12] V. Klee, et al. How Good Is the Simplex Algorithm?, 1970.
[13] Edward J. Sondik, et al. The optimal control of partially observable Markov processes, 1971.
[14] H. Kushner, et al. Mathematical programming and the control of Markov chains, 1971.
[15] Richard Fikes, et al. STRIPS: A New Approach to the Application of Theorem Proving to Problem Solving, 1971, IJCAI.
[16] Edward J. Sondik, et al. The Optimal Control of Partially Observable Markov Processes over a Finite Horizon, 1973, Oper. Res.
[17] L. Goldschlager. The monotone and planar circuit value problems are log space complete for P, 1977, SIGACT News.
[18] Loren K. Platzman, et al. Finite memory estimation and control of finite probabilistic systems, 1977.
[19] Robert G. Bland, et al. New Finite Pivoting Rules for the Simplex Method, 1977, Math. Oper. Res.
[20] Martin L. Puterman, et al. The Analytic Theory of Policy Iteration, 1978.
[21] David S. Johnson, et al. Computers and Intractability: A Guide to the Theory of NP-Completeness, 1978.
[22] Edward J. Sondik, et al. The Optimal Control of Partially Observable Markov Processes over the Infinite Horizon: Discounted Costs, 1978, Oper. Res.
[23] K. Sawaki, et al. Optimal Control for Partially Observable Markov Decision Processes over an Infinite Horizon, 1978.
[24] Martin L. Puterman, et al. On the Convergence of Policy Iteration in Stationary Dynamic Programming, 1979, Math. Oper. Res.
[25] C. White, et al. Application of Jensen's inequality to adaptive suboptimal design, 1980.
[26] Nesa L'abbe Wu, et al. Linear programming and extensions, 1981.
[27] J. Filar. Ordered field property for stochastic games when the player who controls transitions changes from state to state, 1981.
[28] Jan Telgen, et al. Stochastic Dynamic Programming, 2016.
[29] Henryk Wozniakowski, et al. Complexity of linear programming, 1982, Oper. Res. Lett.
[30] Stef Tijs, et al. Fictitious play applied to sequences of games and discounted stochastic games, 1982.
[31] Richard S. Sutton, et al. Temporal credit assignment in reinforcement learning, 1984.
[32] Narendra Karmarkar, et al. A new polynomial-time algorithm for linear programming, 1984, Comb.
[33] James N. Eagle. The Optimal Search for a Moving Target When the Search Path Is Constrained, 1984, Oper. Res.
[34] S. Marcus, et al. Adaptive control of discounted Markov decision chains, 1985.
[35] Leslie G. Valiant, et al. NP is as easy as detecting unique solutions, 1985, STOC '85.
[36] Karl-Heinz Waldmann, et al. On Bounds for Dynamic Programs, 1985, Math. Oper. Res.
[37] Geoffrey E. Hinton, et al. Learning internal representations by error propagation, 1986.
[38] Pravin Varaiya, et al. Stochastic Systems: Estimation, Identification, and Adaptive Control, 1986.
[39] O. J. Vrieze, et al. Surveys in game theory and related topics, 1987.
[40] S. Verdú, et al. Abstract dynamic programming models under commutativity conditions, 1987.
[41] Don Coppersmith, et al. Matrix multiplication via arithmetic progressions, 1987, STOC.
[42] Dimitri P. Bertsekas, et al. Dynamic Programming: Deterministic and Stochastic Models, 1987.
[43] Marcel Schoppers, et al. Universal Plans for Reactive Robots in Unpredictable Environments, 1987, IJCAI.
[44] John N. Tsitsiklis, et al. The Complexity of Markov Decision Processes, 1987, Math. Oper. Res.
[45] S. Marcus, et al. Adaptive control of Markov processes with incomplete state information and unknown parameters, 1987.
[46] Chelsea C. White, et al. Solution Procedures for Partially Observed Markov Decision Processes, 1989, Oper. Res.
[47] John N. Tsitsiklis, et al. Parallel and distributed computation, 1989.
[48] K. Vrieze. Zero-sum stochastic games, 1989.
[49] D. Bertsekas, et al. Adaptive aggregation methods for infinite horizon dynamic programming, 1989.
[50] A. Barto, et al. Learning and Sequential Decision Making, 1989.
[51] David H. Ackley, et al. Generalization and Scaling in Reinforcement Learning, 1989, NIPS.
[52] Jürgen Schmidhuber, et al. Reinforcement Learning in Markovian and Non-Markovian Environments, 1990, NIPS.
[53] Anne Condon, et al. On Algorithms for Simple Stochastic Games, 1990, Advances in Computational Complexity Theory.
[54] P. Tseng. Solving H-horizon, stationary Markov decision problems in time proportional to log(H), 1990.
[55] Richard S. Sutton, et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming, 1990, ML.
[56] David H. Ackley, et al. Interactions between learning and evolution, 1991.
[57] Anders Krogh, et al. Introduction to the theory of neural computation, 1994, The Advanced Book Program.
[58] Richard S. Sutton, et al. Planning by Incremental Dynamic Programming, 1991, ML.
[59] William S. Lovejoy, et al. Computationally Feasible Bounds for Partially Observed Markov Decision Processes, 1991, Oper. Res.
[60] W. Lovejoy. A survey of algorithmic methods for partially observed Markov decision processes, 1991.
[61] David H. Ackley, et al. Adaptation in Constant Utility Non-Stationary Environments, 1991, ICGA.
[62] Anne Condon, et al. The Complexity of Stochastic Games, 1992, Inf. Comput.
[63] Lonnie Chrisman, et al. Reinforcement Learning with Perceptual Aliasing: The Perceptual Distinctions Approach, 1992, AAAI.
[64] D. Koller, et al. The complexity of two-person zero-sum games in extensive form, 1992.
[65] Long Lin, et al. Memory Approaches to Reinforcement Learning in Non-Markovian Domains, 1992.
[66] C. Atkeson, et al. Prioritized Sweeping: Reinforcement Learning with Less Data and Less Real Time, 1993.
[67] Leslie Pack Kaelbling, et al. Learning in embedded systems, 1993.
[68] Terrence J. Sejnowski, et al. Temporal Difference Learning of Position Evaluation in the Game of Go, 1993, NIPS.
[69] Ronald J. Williams, et al. Tight Performance Bounds on Greedy Policies Based on Imperfect Value Functions, 1993.
[70] Anton Schwartz, et al. A Reinforcement Learning Method for Maximizing Undiscounted Rewards, 1993, ICML.
[71] Andrew McCallum, et al. Overcoming Incomplete Perception with Utile Distinction Memory, 1993, ICML.
[72] Andrew G. Barto, et al. Convergence of Indirect Adaptive Asynchronous Value Iteration Algorithms, 1993, NIPS.
[73] Sridhar Mahadevan, et al. Rapid Task Learning for Real Robots, 1993.
[74] Andrew W. Moore, et al. The parti-game algorithm for variable resolution reinforcement learning in multidimensional state-spaces, 2004, Machine Learning.
[75] Satinder Singh, et al. Learning to Solve Markovian Decision Processes, 1993.
[76] J. Peng, et al. Efficient Learning and Planning Within the Dyna Framework, 1993, IEEE International Conference on Neural Networks.
[77] M. K. Ghosh, et al. Discrete-time controlled Markov processes with average cost criterion: a survey, 1993.
[78] Michael L. Littman, et al. Packet Routing in Dynamically Changing Networks: A Reinforcement Learning Approach, 1993, NIPS.
[79] Leemon C. Baird, et al. Reinforcement Learning With High-Dimensional, Continuous Actions, 1993.
[80] Gary McGraw, et al. Emergent Control and Planning in an Autonomous Vehicle, 1993.
[81] Andrew W. Moore, et al. Generalization in Reinforcement Learning: Safely Approximating the Value Function, 1994, NIPS.
[82] Yoshua Bengio, et al. An Input Output HMM Architecture, 1994, NIPS.
[83] Michael I. Jordan, et al. Massachusetts Institute of Technology Artificial Intelligence Laboratory and Center for Biological and Computational Learning, Department of Brain and Cognitive Sciences, 1996.
[84] Leslie Pack Kaelbling, et al. Acting Optimally in Partially Observable Stochastic Domains, 1994, AAAI.
[85] Steven I. Marcus, et al. Controlled Markov processes on the infinite planning horizon: Weighted and overtaking cost criteria, 1994, Math. Methods Oper. Res.
[86] George H. John. When the Best Move Isn't Optimal: Q-learning with Exploration, 1994, AAAI.
[87] Sebastian Thrun, et al. Learning to Play the Game of Chess, 1994, NIPS.
[88] Michael L. Littman, et al. Memoryless policies: theoretical limitations and practical results, 1994.
[89] Daniel S. Weld, et al. Probabilistic Planning with Information Gathering and Contingent Execution, 1994, AIPS.
[90] Leslie Pack Kaelbling, et al. Toward Approximate Planning in Very Large Stochastic Domains, 1994, AAAI.
[91] Michael L. Littman, et al. Markov Games as a Framework for Multi-Agent Reinforcement Learning, 1994, ICML.
[92] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[93] Chelsea C. White, et al. Finite-Memory Suboptimal Design for Partially Observed Markov Decision Processes, 1994, Oper. Res.
[94] T. Sejnowski, et al. The predictive brain: temporal coincidence and temporal order in synaptic learning mechanisms, 1994, Learning & Memory.
[95] Dave Cliff, et al. Adding Temporary Memory to ZCS, 1994, Adapt. Behav.
[96] M. Littman. The Witness Algorithm: Solving Partially Observable Markov Decision Processes, 1994.
[97] Jim Blythe, et al. Planning with External Events, 1994, UAI.
[98] John N. Tsitsiklis, et al. Asynchronous stochastic approximation and Q-learning, 1994, Mach. Learn.
[99] Bernhard von Stengel, et al. Fast algorithms for finding randomized strategies in game trees, 1994, STOC '94.
[100] Matthias Heger, et al. Consideration of Risk in Reinforcement Learning, 1994, ICML.
[101] Gerald Tesauro, et al. Temporal Difference Learning and TD-Gammon, 1995, J. Int. Comput. Games Assoc.
[102] Mark B. Ring. Continual learning in reinforcement environments, 1995, GMD-Bericht.
[103] Leslie Pack Kaelbling, et al. Planning under Time Constraints in Stochastic Domains, 1993, Artif. Intell.
[104] Geoffrey J. Gordon. Stable Function Approximation in Dynamic Programming, 1995, ICML.
[105] Nicholas Kushmerick, et al. An Algorithm for Probabilistic Planning, 1995, Artif. Intell.
[106] Ben J. A. Kröse, et al. Learning from delayed rewards, 1995, Robotics Auton. Syst.
[107] Reid G. Simmons, et al. Probabilistic Robot Navigation in Partially Observable Environments, 1995, IJCAI.
[108] Stuart J. Russell, et al. Approximating Optimal Policies for Partially Observable Stochastic Domains, 1995, IJCAI.
[109] Leslie Pack Kaelbling, et al. On the Complexity of Solving Markov Decision Problems, 1995, UAI.
[110] Leslie Pack Kaelbling, et al. Learning Policies for Partially Observable Environments: Scaling Up, 1997, ICML.
[111] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.
[112] A. Harry Klopf, et al. Reinforcement Learning Applied to a Differential Game, 1995, Adapt. Behav.
[113] Csaba Szepesvári, et al. General Framework for Reinforcement Learning, 1995.
[114] Long Ji Lin, et al. Reinforcement Learning of Non-Markov Decision Processes, 1995, Artif. Intell.
[115] Craig Boutilier, et al. Exploiting Structure in Policy Construction, 1995, IJCAI.
[116] Andrew G. Barto, et al. Learning to Act Using Real-Time Dynamic Programming, 1995, Artif. Intell.
[117] Walter Ludwig, et al. A Subexponential Randomized Algorithm for the Simple Stochastic Game Problem, 1995, Inf. Comput.
[118] Leemon C. Baird, et al. Residual Algorithms: Reinforcement Learning with Function Approximation, 1995, ICML.
[119] Andrew McCallum, et al. Instance-Based Utile Distinctions for Reinforcement Learning with Hidden State, 1995, ICML.
[120] Csaba Szepesvári, et al. A Generalized Reinforcement-Learning Model: Convergence and Applications, 1996, ICML.
[121] Andrew McCallum, et al. Reinforcement learning with selective perception and hidden state, 1996.
[122] John Rust. Numerical dynamic programming in economics, 1996.
[123] Craig Boutilier, et al. Computing Optimal Policies for Partially Observable Decision Processes Using Compact Representations, 1996, AAAI/IAAI, Vol. 2.
[124] Leon A. Petrosyan, et al. Game Theory (Second Edition), 1996.
[125] T. Dean, et al. Planning under uncertainty: structural assumptions and computational leverage, 1996.
[126] Csaba Szepesvári, et al. Generalized Markov Decision Processes: Dynamic-programming and Reinforcement-learning Algorithms, 1996.
[127] Leslie Pack Kaelbling, et al. Planning and Acting in Partially Observable Stochastic Domains, 1998, Artif. Intell.