Concurrent decision making in Markov decision processes
[1] Earl D. Sacerdoti, et al. The Nonlinear Nature of Plans, 1975, IJCAI.
[2] Sean R. Eddy, et al. What is dynamic programming?, 2004, Nature Biotechnology.
[3] Paweł Cichosz. Learning Multidimensional Control Actions From Delayed Reinforcements, 1995.
[4] Bhaskara Marthi, et al. Concurrent Hierarchical Reinforcement Learning, 2005, IJCAI.
[5] Glenn A. Iba, et al. A heuristic approach to the discovery of macro-operators, 1989, Machine Learning.
[6] Mary D. Klein Breteler, et al. Drawing sequences of segments in 3D: kinetic influences on arm configuration, 2003, Journal of Neurophysiology.
[7] Roderic A. Grupen, et al. Robust finger gaits from closed-loop controllers, 2002, IEEE/RSJ International Conference on Intelligent Robots and Systems.
[8] Stuart J. Russell, et al. Q-Decomposition for Reinforcement Learning Agents, 2003, ICML.
[9] M. A. Arbib, et al. Models of Trajectory Formation and Temporal Interaction of Reach and Grasp, 1993, Journal of Motor Behavior.
[10] Ronald A. Howard, et al. Dynamic Probabilistic Systems, 1971.
[11] Thomas G. Dietterich. Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition, 1999, J. Artif. Intell. Res.
[12] Peter Norvig, et al. Artificial Intelligence: A Modern Approach, 2nd Edition, 2003, Prentice Hall Series in Artificial Intelligence.
[13] R. L. Keeney, et al. Decisions with Multiple Objectives: Preferences and Value Trade-Offs, 1977, IEEE Transactions on Systems, Man, and Cybernetics.
[14] Rodney A. Brooks, et al. A Robust Layered Control System for a Mobile Robot, 1986, IEEE Journal on Robotics and Automation.
[15] Roderic A. Grupen, et al. A control basis for multilegged walking, 1996, Proceedings of the IEEE International Conference on Robotics and Automation.
[16] John J. Craig. Introduction to Robotics: Mechanics and Control, 1991.
[17] Doina Precup, et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, 1999, Artif. Intell.
[18] Chris Watkins. Learning from delayed rewards, 1989.
[19] T. Vincent, et al. Nonlinear and Optimal Control Systems, 1997.
[20] Michael O. Duff, et al. Reinforcement Learning Methods for Continuous-Time Markov Decision Problems, 1994, NIPS.
[21] A. Pellionisz, et al. Tensor network theory of the metaorganization of functional geometries in the central nervous system, 1985, Neuroscience.
[22] Ronen I. Brafman, et al. Planning with Concurrent Interacting Actions, 1997, AAAI/IAAI.
[23] Michael Kearns, et al. Near-Optimal Reinforcement Learning in Polynomial Time, 2002, Machine Learning.
[24] Geoffrey J. Gordon, et al. Distributed Planning in Hierarchical Factored MDPs, 2002, UAI.
[25] Carlos Guestrin, et al. Multiagent Planning with Factored MDPs, 2001, NIPS.
[26] R. Cohen, et al. Where grasps are made reveals how grasps are planned: generation and recall of motor plans, 2004, Experimental Brain Research.
[27] Csaba Szepesvári, et al. A Generalized Reinforcement-Learning Model: Convergence and Applications, 1996, ICML.
[28] Andrew G. Barto, et al. Robot Weightlifting by Direct Policy Search, 2001, IJCAI.
[29] Robert Platt, et al. Coarticulation in Markov Decision Processes, 2004, NIPS.
[30] Doina Precup. Temporal abstraction in reinforcement learning, 2000, PhD thesis, University of Massachusetts Amherst.
[31] Ronen I. Brafman, et al. Partial-Order Planning with Concurrent Interacting Actions, 2001, J. Artif. Intell. Res.
[32] Rina Dechter, et al. Bucket elimination: A unifying framework for probabilistic inference, 1996, UAI.
[33] Richard W. Prager, et al. A Modular Q-Learning Architecture for Manipulator Task Decomposition, 1994, ICML.
[34] M. Jeannerod. Intersegmental coordination during reaching at natural visual objects, 1981.
[35] Craig Boutilier, et al. Stochastic dynamic programming with factored representations, 2000, Artif. Intell.
[36] J. F. Soechting, et al. Organization of sequential typing movements, 1992, Journal of Neurophysiology.
[37] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Athena Scientific.
[38] Csaba Szepesvári, et al. Multi-criteria Reinforcement Learning, 1998, ICML.
[39] Mahesan Niranjan, et al. On-line Q-learning using connectionist systems, 1994.
[40] Justin A. Boyan, et al. Least-Squares Temporal Difference Learning, 1999, ICML.
[41] Eithan Ephrati, et al. Divide and Conquer in Multi-Agent Planning, 1994, AAAI.
[42] M. Veloso, et al. Nonlinear Planning with Parallel Resource Allocation, 1990.
[43] Geoffrey E. Hinton, et al. Reinforcement learning for factored Markov decision processes, 2002.
[44] Andrew G. Barto, et al. Using relative novelty to identify useful temporal abstractions in reinforcement learning, 2004, ICML.
[45] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[46] Ralph L. Keeney, et al. Decisions with Multiple Objectives: Preferences and Value Tradeoffs, 1976.
[47] Chitta Baral, et al. Reasoning About Effects of Concurrent Actions, 1997, J. Log. Program.
[48] A. Liegeois, et al. Automatic supervisory control of the configuration and behavior of multi-body mechanisms, 1977.
[49] Gerhard Weiss, et al. Multiagent Systems: A Modern Approach to Distributed Artificial Intelligence, 1999.
[50] D. Koller, et al. Planning under uncertainty in complex structured environments, 2003.
[51] Christer Bäckström. Finding Least Constrained Plans and Optimal Parallel Executions is Harder than We Thought, 1994.
[52] Michael P. Wellman, et al. Multiagent Reinforcement Learning in Stochastic Games, 1999, ICML.
[53] James S. Albus, et al. Brains, Behavior, and Robotics, 1981.
[54] J. Kelso, et al. Skilled actions: a task-dynamic approach, 1987, Psychological Review.
[55] Sridhar Mahadevan, et al. Decision-Theoretic Planning with Concurrent Temporally Extended Actions, 2001, UAI.
[56] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.
[57] M. Rosenstein, et al. Supervised Learning Combined with an Actor-Critic Architecture, 2002.
[58] Alicia P. Wolfe, et al. Identifying useful subgoals in reinforcement learning by local graph partitioning, 2005, ICML.
[59] Sridhar Mahadevan, et al. Hierarchical multi-agent reinforcement learning, 2001, AGENTS '01.
[60] S. Chipman. The Remembered Present: A Biological Theory of Consciousness, 1990, Journal of Cognitive Neuroscience.
[61] Roderic A. Grupen, et al. A hybrid architecture for adaptive robot control, 2000.
[62] Richard S. Sutton. Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding, 1996, NIPS.
[63] Nuttapong Chentanez, et al. Intrinsically Motivated Reinforcement Learning, 2004, NIPS.
[64] Andrew G. Barto, et al. Automatic Discovery of Subgoals in Reinforcement Learning using Diverse Density, 2001, ICML.
[65] Robert Platt, et al. Nullspace composition of control laws for grasping, 2002, IEEE/RSJ International Conference on Intelligent Robots and Systems.
[66] K. J. Cole, et al. Control of multimovement coordination: sensorimotor mechanisms in speech motor programming, 1984, Journal of Motor Behavior.
[67] Gregory R. Andrews, et al. Concurrent Programming: Principles and Practice, 1991.
[68] John N. Tsitsiklis, et al. Analysis of temporal-difference learning with function approximation, 1996, NIPS.
[69] Michael Gelfond, et al. Representing Actions in Extended Logic Programming, 1992, JICSLP.
[70] Michael I. Jordan, et al. Optimal feedback control as a theory of motor coordination, 2002, Nature Neuroscience.
[71] J. Foley. The co-ordination and regulation of movements, 1968.
[72] Craig A. Knoblock. Generating Parallel Execution Plans with a Partial-order Planner, 1994, AIPS.
[73] Craig Boutilier, et al. Decision-Theoretic Planning: Structural Assumptions and Computational Leverage, 1999, J. Artif. Intell. Res.
[74] Andrew G. Barto, et al. Lyapunov-Constrained Action Sets for Reinforcement Learning, 2001, ICML.
[75] Michail G. Lagoudakis, et al. Coordinated Reinforcement Learning, 2002, ICML.
[76] Ming Tan, et al. Multi-Agent Reinforcement Learning: Independent versus Cooperative Agents, 1997, ICML.
[77] Scott T. Grafton, et al. From 'acting on' to 'acting with': the functional anatomy of object-oriented action schemata, 2003, Progress in Brain Research.
[78] Raymond Reiter, et al. Natural Actions, Concurrency and Continuous Time in the Situation Calculus, 1996, KR.
[79] Andrew S. Tanenbaum, et al. Operating Systems: Design and Implementation, 3rd Edition, 2005.
[80] R. J. Williams. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 1992, Machine Learning.
[81] Richard S. Sutton. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[82] Andrew G. Barto, et al. Heuristic Search in Infinite State Spaces Guided by Lyapunov Analysis, 2001, IJCAI.
[83] Craig Boutilier, et al. Approximate Value Trees in Structured Dynamic Programming, 1996, ICML.
[84] Andrew W. Moore, et al. An Introduction to Reinforcement Learning, 1995.
[85] Craig Boutilier, et al. Exploiting Structure in Policy Construction, 1995, IJCAI.
[86] Michael T. Rosenstein, et al. Learning to exploit dynamics for robot motor coordination, 2003.
[87] David A. Patterson, et al. Computer Architecture: A Quantitative Approach, 1990.
[88] J. F. Soechting, et al. Anticipatory and sequential motor control in piano playing, 1997, Experimental Brain Research.
[89] Peter L. Bartlett, et al. Reinforcement Learning in POMDP's via Direct Gradient Ascent, 2000, ICML.
[90] Kee-Eung Kim, et al. Solving Very Large Weakly Coupled Markov Decision Processes, 1998, AAAI/IAAI.
[91] Henry A. Kautz, et al. Constraint propagation algorithms for temporal reasoning: a revised report, 1989.
[92] M. Wiesendanger, et al. Coordination of bowing and fingering in violin playing, 2005, Cognitive Brain Research.
[93] Michael I. Jordan, et al. An Introduction to Graphical Models, 2001.
[94] Theodore J. Perkins, et al. Lyapunov methods for safe intelligent agent design, 2002.
[95] M. Wiesendanger, et al. Toward a physiological understanding of human dexterity, 2001, News in Physiological Sciences.
[96] Tsuneo Yoshikawa. Analysis and Control of Robot Manipulators with Redundancy, 1983.
[97] Sridhar Mahadevan, et al. Proto-value functions: developmental reinforcement learning, 2005, ICML.
[98] Sridhar Mahadevan, et al. Coarticulation: an approach for generating concurrent plans in Markov decision processes, 2005, ICML.
[99] Raymond D. Kent, et al. Coarticulation in recent speech production models, 1977.
[100] Yoshihiko Nakamura. Advanced Robotics: Redundancy and Optimization, 1990.
[101] Pierre Régnier, et al. Complete Determination of Parallel Actions and Temporal Optimization in Linear Plans of Action, 1991, EWSP.
[103] Michael Thielscher, et al. Representing Concurrent Actions and Solving Conflicts, 1994, Log. J. IGPL.
[104] Roderic A. Grupen, et al. Coordinated teams of reactive mobile platforms, 2002, IEEE International Conference on Robotics and Automation.
[105] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[106] Peter Norvig, et al. Artificial Intelligence: A Modern Approach, 1995.
[107] Mausam, et al. Solving Concurrent Markov Decision Processes, 2004, AAAI.
[108] Håkan L. S. Younes, et al. A Formalism for Stochastic Decision Processes with Asynchronous Events, 2004.
[109] Mausam, et al. Concurrent Probabilistic Temporal Planning, 2005, ICAPS.
[110] Leslie Pack Kaelbling, et al. Learning Policies with External Memory, 1999, ICML.
[111] Michael L. Littman. Markov Games as a Framework for Multi-Agent Reinforcement Learning, 1994, ICML.
[112] Andrew G. Barto, et al. PolicyBlocks: An Algorithm for Creating Useful Macro-Actions in Reinforcement Learning, 2002, ICML.
[113] E. Bizzi, et al. Theoretical and Experimental Perspectives on Arm Trajectory Formation: A Distributed Model of Motor Redundancy, 1988.
[114] Sridhar Mahadevan, et al. Learning to Take Concurrent Actions, 2002, NIPS.
[115] Geoffrey E. Hinton, et al. Using Free Energies to Represent Q-values in a Multiagent Reinforcement Learning Task, 2000, NIPS.
[116] Henry A. Kautz, et al. Reasoning About Plans, 1991, Morgan Kaufmann Series in Representation and Reasoning.
[117] Satinder P. Singh, et al. How to Dynamically Merge Markov Decision Processes, 1997, NIPS.
[118] Roderic A. Grupen, et al. A Developmental Organization for Robot Behavior, 2005.
[119] Andrew S. Tanenbaum, et al. Operating Systems: Design and Implementation, 1987, Prentice-Hall Software Series.
[120] Geoffrey E. Hinton. Training Products of Experts by Minimizing Contrastive Divergence, 2002, Neural Computation.
[121] Earl David Sacerdoti. A Structure for Plans and Behavior, 1977.
[122] Sridhar Mahadevan, et al. Hierarchical Multiagent Reinforcement Learning, 2004.
[123] Daphne Koller, et al. Computing Factored Value Functions for Policies in Structured MDPs, 1999, IJCAI.