[1] Michael A. Trick, et al. A Linear Programming Approach to Solving Stochastic Dynamic Programs, 1993.
[2] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.
[3] Gerald Tesauro, et al. TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play, 1994, Neural Computation.
[4] C. Guestrin, et al. Solving Factored MDPs with Hybrid State and Action Variables, 2006, J. Artif. Intell. Res.
[5] R. Bellman, et al. Polynomial approximation—a new computational technique in dynamic programming: Allocation processes, 1963.
[6] L. Khachiyan. Polynomial algorithms in linear programming, 1980.
[7] Nando de Freitas, et al. An Introduction to MCMC for Machine Learning, 2004, Machine Learning.
[8] Sean R. Eddy, et al. What is dynamic programming?, 2004, Nature Biotechnology.
[9] Milos Hauskrecht, et al. Learning Basis Functions in Hybrid Domains, 2006, AAAI.
[10] Shobha Venkataraman, et al. Context-specific multiagent coordination and planning with factored MDPs, 2002, AAAI/IAAI.
[11] Andrew G. Barto, et al. Improving Elevator Performance Using Reinforcement Learning, 1995, NIPS.
[12] Sridhar Mahadevan, et al. Samuel Meets Amarel: Automating Value Function Approximation Using Global State Space Analysis, 2005, AAAI.
[13] A. S. Manne. Linear Programming and Sequential Decisions, 1960.
[14] John Rust. Using Randomization to Break the Curse of Dimensionality, 1997.
[15] J. Tsitsiklis, et al. An optimal one-way multigrid algorithm for discrete-time stochastic control, 1991.
[16] Scott Sanner, et al. Approximate Linear Programming for First-order MDPs, 2005, UAI.
[17] Geoffrey J. Gordon, et al. Approximate solutions to Markov decision processes, 1999.
[18] Ronald A. Howard, et al. Influence Diagrams, 2005, Decis. Anal.
[19] Karl Johan Åström, et al. Optimal control of Markov processes with incomplete state information, 1965.
[20] N. Metropolis, et al. Equation of State Calculations by Fast Computing Machines, 1953, Resonance.
[21] Sridhar Mahadevan, et al. Learning Representation and Control in Continuous Markov Decision Processes, 2006, AAAI.
[22] Doina Precup, et al. Metrics for Markov Decision Processes with Infinite State Spaces, 2005, UAI.
[23] Benjamin Van Roy, et al. The Linear Programming Approach to Approximate Dynamic Programming, 2003, Oper. Res.
[24] Luis E. Ortiz, et al. Selecting approximately-optimal actions in complex structured domains, 2002.
[25] Donald Geman, et al. Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images, 1984, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[26] Frank Jensen, et al. From Influence Diagrams to Junction Trees, 1994, UAI.
[27] Gerald Tesauro, et al. Practical issues in temporal difference learning, 1992, Machine Learning.
[28] Craig Boutilier, et al. Greedy linear value-approximation for factored Markov decision processes, 2002, AAAI/IAAI.
[29] D. Higdon. Auxiliary Variable Methods for Markov Chain Monte Carlo with Applications, 1998.
[30] Adnan Darwiche, et al. Approximating MAP using Local Search, 2001, UAI.
[31] Doina Precup, et al. Metrics for Finite Markov Decision Processes, 2004, AAAI.
[32] Michael A. Trick, et al. A Linear Programming Approach to Solving Stochastic Dynamic Programs, 1993.
[33] D. Koller, et al. Planning under uncertainty in complex structured environments, 2003.
[34] E. J. Sondik, et al. The Optimal Control of Partially Observable Markov Decision Processes, 1971.
[35] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[36] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[37] M. G. Worster. Methods of Mathematical Physics, 1947, Nature.
[38] Benjamin Van Roy, et al. On Constraint Sampling in the Linear Programming Approach to Approximate Dynamic Programming, 2004, Math. Oper. Res.
[39] Sridhar Mahadevan, et al. Value Function Approximation with Diffusion Wavelets and Laplacian Eigenfunctions, 2005, NIPS.
[40] Milos Hauskrecht, et al. Value-Function Approximations for Partially Observable Markov Decision Processes, 2000, J. Artif. Intell. Res.
[41] W. K. Hastings, et al. Monte Carlo Sampling Methods Using Markov Chains and Their Applications, 1970.
[42] Wei Zhang, et al. A Reinforcement Learning Approach to Job-Shop Scheduling, 1995, IJCAI.
[43] Carlos Guestrin, et al. Multiagent Planning with Factored MDPs, 2001, NIPS.
[44] Rina Dechter, et al. Bucket elimination: A unifying framework for probabilistic inference, 1996, UAI.
[45] G. Casella, et al. Rao-Blackwellisation of sampling schemes, 1996.
[46] Carlos Guestrin, et al. Generalizing plans to new environments in relational MDPs, 2003, IJCAI.
[47] P. Schweitzer, et al. Generalized polynomial approximations in Markovian decision processes, 1985.
[48] Andrew W. Moore, et al. Variable Resolution Discretization in Optimal Control, 2002, Machine Learning.
[49] Craig Boutilier, et al. Exploiting Structure in Policy Construction, 1995, IJCAI.
[50] Milos Hauskrecht, et al. An MCMC Approach to Solving Hybrid Factored MDPs, 2005, IJCAI.
[51] Gregory F. Cooper, et al. A Method for Using Belief Networks as Influence Diagrams, 1988, UAI.
[52] Daphne Koller, et al. Computing Factored Value Functions for Policies in Structured MDPs, 1999, IJCAI.
[53] C. D. Gelatt, et al. Optimization by Simulated Annealing, 1983, Science.
[54] Dale Schuurmans, et al. Direct value-approximation for factored MDPs, 2001, NIPS.
[55] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[56] Dimitri P. Bertsekas, et al. A Counterexample to Temporal Differences Learning, 1995, Neural Computation.
[57] L. G. Khachiyan. A polynomial algorithm in linear programming, 1979.
[58] Carlos Guestrin, et al. Max-norm Projections for Factored MDPs, 2001, IJCAI.
[59] Milos Hauskrecht, et al. Solving Factored MDPs with Exponential-Family Transition Models, 2006, ICAPS.
[60] Milos Hauskrecht, et al. Solving Factored MDPs with Continuous and Discrete Variables, 2004, UAI.
[61] Adnan Darwiche, et al. Solving MAP Exactly using Systematic Search, 2002, UAI.
[62] John N. Tsitsiklis, et al. Introduction to Linear Optimization, 1997, Athena Scientific Optimization and Computation Series.
[63] Zhengzhu Feng, et al. Dynamic Programming for Structured Continuous Markov Decision Problems, 2004, UAI.
[64] David E. Smith, et al. Planning Under Continuous Time and Resource Uncertainty: A Challenge for AI, 2002, AIPS Workshop on Planning for Temporal Domains.
[65] S. Duane, et al. Hybrid Monte Carlo, 1987.
[66] Gerald Tesauro, et al. Temporal Difference Learning and TD-Gammon, 1995, J. Int. Comput. Games Assoc.
[67] Thomas G. Dietterich, et al. High-Performance Job-Shop Scheduling with a Time-Delay TD(λ) Network, 1995, NIPS.
[68] Changhe Yuan, et al. Annealed MAP, 2004, UAI.
[69] Keiji Kanazawa, et al. A model for reasoning about persistence and causation, 1989.
[70] Shobha Venkataraman, et al. Efficient Solution Algorithms for Factored MDPs, 2003, J. Artif. Intell. Res.
[71] Milos Hauskrecht, et al. Linear Program Approximations for Factored Continuous-State Markov Decision Processes, 2003, NIPS.
[72] Milos Hauskrecht, et al. Heuristic Refinements of Approximate Linear Programming for Factored Continuous-State Markov Decision Processes, 2004, ICAPS.