Methods for deciding what to do next and learning

Recent years have seen intense analysis and questioning of the role of classical AI planning in deciding moment-to-moment actions. This has led to the development of several new AI-planning paradigms, such as reactive planning [Fir87, GL87]. Informally, a planning paradigm refers to principles for representing an agent's acts, its beliefs and expectations about how acts relate to conditions in the world, and methods for prescribing acts in response to conditions. Each paradigm, independent of any particular domain, develops a uniform way of interacting with the world and, as such, attempts to account for the complexities of the agent's environment.

The oldest AI-planning paradigm, classical planning, is closely tied to reasoning: it is a highly cognitive behavior involving explicit goals. Reactive planning paradigms, on the other hand, use little or no reasoning, and their goals are implicit; reactive planning therefore has a less cognitive character than classical planning. We are interested in agents that both act on reactions, achieving whatever goals are implicit in those reactions, and generate plans to achieve explicit goals. We refer to these ways of behaving as the agent's modes of behavior.

By what to do next we mean the very next physical action an agent situated in the world performs. The actions we consider are provoked either by direct sensing together with a specific purpose (i.e., an explicit goal) or by direct sensing together with a general condition (i.e., an implicit goal). "Methods for deciding what to do next" in the title of our paper is meant to cover all planning paradigms that advance a physical action for execution, subject to a few simplifying assumptions. This includes actions prescribed by a plan produced by a classical planner, actions suggested by a reactive planner, and actions suggested by experimentation undertaken either to gain information about the agent's own capabilities or to achieve a goal heuristically. Since planning methods differ in when and in what ways the agent may use their output, we suggest the following assumptions to limit their scope for deciding what to do next. These assumptions do not affect how the methods work; they only affect the inputs given to a method and the application of its output.
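
To make the two modes of behavior concrete, the sketch below is a minimal, hypothetical illustration (not taken from any system discussed here) of an agent deciding its very next physical action: reactive rules map sensed conditions directly to actions, so the goals they serve remain implicit, while an explicit goal is pursued by executing, one step at a time, a plan assumed to have been produced by a classical planner. All names, data structures, and the rule-before-plan ordering are illustrative assumptions.

```python
# Hypothetical sketch: two "modes of behavior" for choosing the next action.
# Reactive rules encode implicit goals; the plan/goal pair encodes an explicit goal.

from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ReactiveRule:
    """A condition-action pair; the goal it serves is implicit in the rule."""
    condition: Callable[[dict], bool]   # predicate over the sensed world state
    action: str                         # physical action to execute if the rule fires


@dataclass
class Agent:
    rules: List[ReactiveRule] = field(default_factory=list)
    plan: List[str] = field(default_factory=list)   # steps assumed to come from a classical planner
    goal: Optional[dict] = None                     # explicit goal, if any

    def next_action(self, percept: dict) -> Optional[str]:
        """Return the very next physical action, given only direct sensing.

        Reactive rules are consulted first; if none fires and an explicit goal
        is pending, the next step of the current plan is advanced. The ordering
        is an illustrative assumption, not a claim about any particular system.
        """
        for rule in self.rules:
            if rule.condition(percept):
                return rule.action        # reactive mode: implicit goal
        if self.goal is not None and self.plan:
            return self.plan.pop(0)       # deliberative mode: one step of the plan
        return None                        # nothing prescribed for this percept


# Minimal usage example with hypothetical conditions and actions.
agent = Agent(
    rules=[ReactiveRule(lambda s: s.get("obstacle_ahead", False), "turn-left")],
    plan=["move-forward", "pick-up-block"],
    goal={"holding": "block"},
)
print(agent.next_action({"obstacle_ahead": True}))    # reactive rule fires: turn-left
print(agent.next_action({"obstacle_ahead": False}))   # plan step advances: move-forward
```

In this reading, both modes satisfy the assumptions above: each call commits the agent to only the next action, and the deciding method sees only the current percept plus whatever goal or plan it was given.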

[1] Richard S. Sutton et al. Temporal credit assignment in reinforcement learning, 1984.

[2] Alan D. Christiansen et al. Learning reliable manipulation strategies without initial physical models, 1990, Proceedings of the IEEE International Conference on Robotics and Automation.

[3] Earl David Sacerdoti et al. A Structure for Plans and Behavior, 1977.

[4] Rodney A. Brooks et al. Learning to Coordinate Behaviors, 1990, AAAI.

[5] Rodney A. Brooks et al. A Robust Layered Control System for a Mobile Robot, 1986.

[6] Austin Tate et al. Generating Project Networks, 1977, IJCAI.

[7] Chris Watkins et al. Learning from delayed rewards, 1989.

[8] Alan D. Christiansen et al. Learning reliable manipulation strategies without initial physical models, 1991, Robotics Auton. Syst.

[9] Long Ji Lin et al. Programming Robots Using Reinforcement Learning and Teaching, 1991, AAAI.

[10] Gerald DeJong et al. Explanation-based manipulator learning: Acquisition of planning ability through observation, 1985, Proceedings of the 1985 IEEE International Conference on Robotics and Automation.

[11] N. Cocchiarella et al. Situations and Attitudes, 1986.

[12] Sridhar Mahadevan et al. Scaling Reinforcement Learning to Robotics by Exploiting the Subsumption Architecture, 1991, ML.

[13] Mark Drummond et al. Situated Control Rules, 1989, KR.

[14] Gerald DeJong et al. Explanation-Based Learning of Reactive Operations, 1989, ML.

[15] Richard Fikes et al. STRIPS: A New Approach to the Application of Theorem Proving to Problem Solving, 1971, IJCAI.

[16] Tom M. Mitchell et al. Becoming Increasingly Reactive, 1990, AAAI.

[17] David Chapman et al. Pengi: An Implementation of a Theory of Activity, 1987, AAAI.

[18] Gerald J. Sussman et al. A Computational Model of Skill Acquisition, 1973.

[19] Richard Fikes et al. Learning and Executing Generalized Robot Plans, 1993, Artif. Intell.

[20] E. Reed. The Ecological Approach to Visual Perception, 1989.

[21] Amy L. Lansky et al. Reactive Reasoning and Planning, 1987, AAAI.

[22] Tom M. Mitchell et al. On Becoming Reactive, 1989, ML.

[23] Melinda T. Gervasio et al. Learning General Completable Reactive Plans, 1990, AAAI.

[24] Tom Michael Mitchell et al. Explanation-based generalization: A unifying view, 1986.

[25] Alberto M. Segre. Explanation-based manipulator learning, 1986.

[26] Pattie Maes et al. Situated agents can have goals, 1990, Robotics Auton. Syst.

[27] R. James Firby et al. An Investigation into Reactive Planning in Complex Domains, 1987, AAAI.

[28] Leslie Pack Kaelbling et al. Action and planning in embedded agents, 1990, Robotics Auton. Syst.

[29] David Chapman et al. What are plans for?, 1990, Robotics Auton. Syst.

[30] David E. Wilkins et al. Domain-Independent Planning: Representation and Plan Generation, 1984, Artif. Intell.

[31] Leslie Pack Kaelbling et al. Learning in embedded systems, 1993.

[32] Leslie Pack Kaelbling et al. An Architecture for Intelligent Reactive Systems, 1987.

[33] Rodney A. Brooks. Planning is Just a Way of Avoiding Figuring Out What to Do Next, 1987.

[34] Steven Douglas Whitehead et al. Reinforcement learning for the adaptive control of perception and action, 1992.

[35] Steve Chien et al. Learning to integrate reactivity and deliberation in uncertain planning and scheduling problems, 1992.

[36] Drew McDermott et al. Robot Planning, 1991, AI Mag.

[37] Marcel Schoppers et al. Universal Plans for Reactive Robots in Unpredictable Environments, 1987, IJCAI.

[38] Drew McDermott et al. Planning and Acting, 1978, Cogn. Sci.

[39] Leslie Pack Kaelbling et al. Goals as Parallel Program Specifications, 1988, AAAI.