Compact parametric models for efficient sequential decision making in high-dimensional, uncertain domains
[1] Charles R. Johnson, et al. Matrix Analysis, 1985, Statistical Inference for Engineers and Data Scientists.
[2] Panos E. Trahanias, et al. Real-time hierarchical POMDPs for autonomous robot navigation, 2007, Robotics Auton. Syst.
[3] Michael Kearns, et al. Near-Optimal Reinforcement Learning in Polynomial Time, 2002, Machine Learning.
[4] Zhengzhu Feng, et al. Dynamic Programming for Structured Continuous Markov Decision Problems, 2004, UAI.
[5] Pascal Poupart, et al. Point-Based Value Iteration for Continuous POMDPs, 2006, J. Mach. Learn. Res.
[6] Sebastian Thrun, et al. Coastal Navigation with Mobile Robots, 1999, NIPS.
[7] Michael I. Jordan, et al. Nonparametric Bayesian Learning of Switching Linear Dynamical Systems, 2008, NIPS.
[8] David E. Smith, et al. Planning Under Continuous Time and Resource Uncertainty: A Challenge for AI, 2002, AIPS Workshop on Planning for Temporal Domains.
[9] Brian C. Williams, et al. Model learning for switching linear systems with autonomous mode transitions, 2007, 46th IEEE Conference on Decision and Control.
[10] Guy Shani, et al. Efficient ADD Operations for Point-Based Algorithms, 2008, ICAPS.
[11] Jürgen Schmidhuber, et al. A reinforcement learning approach for individualizing erythropoietin dosages in hemodialysis patients, 2009, Expert Syst. Appl.
[12] Milos Hauskrecht, et al. Solving Factored MDPs with Exponential-Family Transition Models, 2006, ICAPS.
[13] Joelle Pineau, et al. Policy-contingent abstraction for robust robot control, 2002, UAI.
[14] Milos Hauskrecht, et al. Value-Function Approximations for Partially Observable Markov Decision Processes, 2000, J. Artif. Intell. Res.
[15] N. Zhang, et al. Algorithms for Partially Observable Markov Decision Processes, 2001.
[16] Reid G. Simmons, et al. Heuristic Search Value Iteration for POMDPs, 2004, UAI.
[17] A. Shwartz, et al. Handbook of Markov Decision Processes: Methods and Applications, 2002.
[18] Zhengzhu Feng, et al. Dynamic Programming for POMDPs Using a Factored State Representation, 2000, AIPS.
[19] Pascal Poupart, et al. Automated Hierarchy Discovery for Planning in Partially Observable Environments, 2006, NIPS.
[20] Andrew W. Moore, et al. Generalization in Reinforcement Learning: Safely Approximating the Value Function, 1994, NIPS.
[21] Pieter Abbeel, et al. Apprenticeship learning via inverse reinforcement learning, 2004, ICML.
[22] Tao Wang, et al. Bayesian sparse sampling for on-line reward optimization, 2005, ICML.
[23] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[24] Lihong Li, et al. Lazy Approximation for Solving Continuous Finite-Horizon MDPs, 2005, AAAI.
[25] Michael C. Fu, et al. Solving Continuous-State POMDPs via Density Projection, 2010, IEEE Transactions on Automatic Control.
[26] Michael L. Littman, et al. Perception-based generalization in model-based reinforcement learning, 2009.
[27] Kee-Eung Kim, et al. Symbolic Heuristic Search Value Iteration for Factored POMDPs, 2008, AAAI.
[28] John N. Tsitsiklis, et al. Feature-based methods for large scale dynamic programming, 2004, Machine Learning.
[29] Jesse Hoey, et al. An analytic solution to discrete Bayesian reinforcement learning, 2006, ICML.
[30] Peter Stone, et al. Model-based function approximation in reinforcement learning, 2007, AAMAS '07.
[31] Stuart J. Russell, et al. The BATmobile: Towards a Bayesian Automated Taxi, 1995, IJCAI.
[32] Leslie Pack Kaelbling, et al. Representing hierarchical POMDPs as DBNs for multi-scale robot localization, 2004, IEEE International Conference on Robotics and Automation (ICRA '04).
[33] Thomas J. Walsh, et al. Knows what it knows: a framework for self-aware learning, 2008, ICML '08.
[34] Joelle Pineau, et al. Reinforcement learning with limited reinforcement: using Bayes risk for active learning in POMDPs, 2008, ICML '08.
[35] Sanjoy Dasgupta, et al. A Generalization of Principal Components Analysis to the Exponential Family, 2001, NIPS.
[36] John N. Tsitsiklis, et al. The complexity of dynamic programming, 1989, J. Complex.
[37] Milos Hauskrecht, et al. Planning treatment of ischemic heart disease with partially observable Markov decision processes, 2000, Artif. Intell. Medicine.
[38] Shie Mannor, et al. Reinforcement learning with Gaussian processes, 2005, ICML.
[39] Jacob Goldberger, et al. Hierarchical Clustering of a Mixture Model, 2004, NIPS.
[40] Alexei Makarenko, et al. Parametric POMDPs for planning in continuous state spaces, 2006, Robotics Auton. Syst.
[41] Ronald A. Howard, et al. Dynamic Programming and Markov Processes, 1960.
[42] Joelle Pineau, et al. Online Planning Algorithms for POMDPs, 2008, J. Artif. Intell. Res.
[43] Joelle Pineau, et al. Anytime Point-Based Approximations for Large POMDPs, 2006, J. Artif. Intell. Res.
[44] Edward J. Sondik, et al. The optimal control of partially observable Markov processes, 1971.
[45] Jesse Hoey, et al. SPUDD: Stochastic Planning using Decision Diagrams, 1999, UAI.
[46] Jesse Hoey, et al. Solving POMDPs with Continuous or Large Discrete Observation Spaces, 2005, IJCAI.
[47] Leslie Pack Kaelbling, et al. Robust Belief-Based Execution of Manipulation Programs, 2008.
[48] David Hsu, et al. SARSOP: Efficient Point-Based POMDP Planning by Approximating Optimally Reachable Belief Spaces, 2008, Robotics: Science and Systems.
[49] Jeffrey K. Uhlmann, et al. New extension of the Kalman filter to nonlinear systems, 1997, Defense, Security, and Sensing.
[50] P. Poupart. Exploiting structure to efficiently solve large scale partially observable Markov decision processes, 2005.
[51] T. Başar, et al. A New Approach to Linear Filtering and Prediction Problems, 2001.
[52] Geoffrey E. Hinton, et al. Variational Learning for Switching State-Space Models, 2000, Neural Computation.
[53] Trey Smith, et al. Probabilistic planning for robotic exploration, 2007.
[54] Eric A. Hansen, et al. Solving POMDPs by Searching in Policy Space, 1998, UAI.
[55] Joelle Pineau, et al. Bayesian reinforcement learning in continuous POMDPs with application to robot navigation, 2008, IEEE International Conference on Robotics and Automation.
[56] Pieter Abbeel, et al. Exploration and apprenticeship learning in reinforcement learning, 2005, ICML.
[57] Eric Horvitz, et al. Prediction, Expectation, and Surprise: Methods, Designs, and Study of a Deployed Traffic Forecasting Service, 2005, UAI.
[58] Joelle Pineau, et al. Adaptive Treatment of Epilepsy via Batch-mode Reinforcement Learning, 2008, AAAI.
[59] Robert H. Halstead, et al. Matrix Computations, 2011, Encyclopedia of Parallel Computing.
[60] Doina Precup, et al. Using Linear Programming for Bayesian Exploration in Markov Decision Processes, 2007, IJCAI.
[61] Michail G. Lagoudakis, et al. Least-Squares Policy Iteration, 2003, J. Mach. Learn. Res.
[62] Michael L. Littman, et al. Efficient Reinforcement Learning with Relocatable Action Models, 2007, AAAI.
[63] Jesse Hoey, et al. A Decision-Theoretic Approach to Task Assistance for Persons with Dementia, 2005, IJCAI.
[64] Thomas J. Walsh, et al. Efficient Exploration With Latent Structure, 2005, Robotics: Science and Systems.
[65] David A. McAllester, et al. Approximate Planning for Factored POMDPs using Belief State Simplification, 1999, UAI.
[66] Brahim Chaib-draa, et al. An online POMDP algorithm for complex multiagent environments, 2005, AAMAS '05.
[67] Michael Kearns, et al. Efficient Reinforcement Learning in Factored MDPs, 1999, IJCAI.
[68] Craig Boutilier, et al. Value-Directed Compression of POMDPs, 2002, NIPS.
[69] Andrew McCallum, et al. Reinforcement learning with selective perception and hidden state, 1996.
[70] James T. Kwok, et al. Simplifying Mixture Models Through Function Approximation, 2006, IEEE Transactions on Neural Networks.
[71] Sebastian Thrun, et al. Monte Carlo POMDPs, 1999, NIPS.
[72] Andrew G. Barto, et al. Optimal learning: computational procedures for Bayes-adaptive Markov decision processes, 2002.
[73] Geoffrey J. Gordon, et al. Finding Approximate POMDP Solutions Through Belief Compression, 2011, J. Artif. Intell. Res.
[74] Michael I. Jordan, et al. PEGASUS: A policy search method for large MDPs and POMDPs, 2000, UAI.
[75] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[76] Sham M. Kakade, et al. On the sample complexity of reinforcement learning, 2003.
[77] Lihong Li, et al. Incremental Model-based Learners With Formal Learning-Time Guarantees, 2006, UAI.
[78] Marc Toussaint, et al. Hierarchical POMDP Controller Optimization by Likelihood Maximization, 2008, UAI.
[79] Csaba Szepesvári, et al. Bandit Based Monte-Carlo Planning, 2006, ECML.
[80] Ben J. A. Kröse, et al. Learning from delayed rewards, 1995, Robotics Auton. Syst.
[81] Milind Tambe, et al. Towards Faster Planning with Continuous Resources in Stochastic Domains, 2008, AAAI.
[82] James M. Rehg, et al. Data-Driven MCMC for Learning and Inference in Switching Linear Dynamic Systems, 2005, AAAI.
[83] Eric A. Hansen, et al. Synthesis of Hierarchical Finite-State Controllers for POMDPs, 2003, ICAPS.
[84] Leslie Pack Kaelbling, et al. Planning and Acting in Partially Observable Stochastic Domains, 1998, Artif. Intell.
[85] Sean R. Eddy, et al. What is dynamic programming?, 2004, Nature Biotechnology.
[86] Stuart J. Russell, et al. Angelic Hierarchical Planning: Optimal and Online Algorithms, 2008, ICAPS.
[87] Nate Kohl, et al. Reinforcement Learning Benchmarks and Bake-offs II: A workshop at the 2005 NIPS conference, 2005.
[88] Leemon C. Baird, et al. Residual Algorithms: Reinforcement Learning with Function Approximation, 1995, ICML.
[89] Joelle Pineau, et al. Point-based value iteration: An anytime algorithm for POMDPs, 2003, IJCAI.
[90] Yishay Mansour, et al. Approximate Planning in Large POMDPs via Reusable Trajectories, 1999, NIPS.
[91] Stefan Schaal, et al. Reinforcement learning of motor skills with policy gradients, 2008.
[92] John R. Hershey, et al. Approximating the Kullback Leibler Divergence Between Gaussian Mixture Models, 2007, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '07).
[93] Guy Shani, et al. Forward Search Value Iteration for POMDPs, 2007, IJCAI.
[94] Andrew W. Moore, et al. Variable Resolution Discretization for High-Accuracy Solutions of Optimal Control Problems, 1999, IJCAI.
[95] Shie Mannor, et al. Bayes Meets Bellman: The Gaussian Process Approach to Temporal Difference Learning, 2003, ICML.
[96] Geoffrey J. Gordon. Stable Function Approximation in Dynamic Programming, 1995, ICML.
[97] Carl E. Rasmussen, et al. Gaussian Processes in Reinforcement Learning, 2003, NIPS.
[98] Wolfram Burgard, et al. Probabilistic Algorithms and the Interactive Museum Tour-Guide Robot Minerva, 2000, Int. J. Robotics Res.
[99] J. Tsitsiklis, et al. An optimal one-way multigrid algorithm for discrete-time stochastic control, 1991.
[100] Reid G. Simmons, et al. Point-Based POMDP Algorithms: Improved Analysis and Implementation, 2005, UAI.
[101] B. Anderson, et al. Linear Optimal Control, 1971.
[102] Yishay Mansour, et al. A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes, 1999, Machine Learning.
[103] Michael L. Littman, et al. Online Linear Regression and Its Application to Model-Based Reinforcement Learning, 2007, NIPS.
[104] Carl E. Rasmussen, et al. Model-Based Reinforcement Learning with Continuous States and Actions, 2008, ESANN.
[105] Weihong Zhang, et al. Speeding Up the Convergence of Value Iteration in Partially Observable Markov Decision Processes, 2011, J. Artif. Intell. Res.
[106] Nikos A. Vlassis, et al. Perseus: Randomized Point-based Value Iteration for POMDPs, 2005, J. Artif. Intell. Res.
[107] Ronen I. Brafman, et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning, 2001, J. Mach. Learn. Res.