An Ensemble Architecture for Learning Complex Problem-Solving Techniques from Demonstration

We present a novel ensemble architecture for learning problem-solving techniques from a very small number of expert solutions and demonstrate its effectiveness in a complex real-world domain. The key feature of our “Generalized Integrated Learning Architecture” (GILA) is a set of heterogeneous independent learning and reasoning (ILR) components, coordinated by a central meta-reasoning executive (MRE). The ILRs are weakly coupled in the sense that all coordination during learning and performance happens through the MRE. Each ILR learns independently from a small number of expert demonstrations of a complex task. During performance, each ILR proposes partial solutions to subproblems posed by the MRE; the MRE then selects among these proposals and pieces them together into a complete solution. The heterogeneity of the learner-reasoners makes both learning and problem solving more effective because their abilities and biases are complementary and synergistic. We describe the application of this architecture to the domain of airspace management, where multiple requests for the use of airspaces must be deconflicted, reconciled, and managed automatically. Formal evaluations show that our system performs as well as or better than humans after learning from the same training data. Furthermore, GILA outperforms any individual ILR run in isolation, demonstrating the power of the ensemble architecture for learning and problem solving.
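The coordination pattern described above (an MRE posing subproblems, heterogeneous ILRs proposing partial solutions, and the MRE selecting and assembling them) can be sketched as follows. This is a minimal, hypothetical illustration: the `ILR` and `MetaReasoningExecutive` classes, the confidence-based selection rule, and the toy airspace-request subproblems are assumptions for exposition, not the paper's actual implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

@dataclass
class PartialSolution:
    subproblem: str
    assignment: int   # e.g., a proposed altitude block for an airspace request
    quality: float    # the proposing ILR's own confidence in its proposal

class ILR:
    """An independent learning-and-reasoning component with its own bias."""
    def __init__(self, name: str, propose: Callable[[str], Optional[PartialSolution]]):
        self.name = name
        self.propose = propose

class MetaReasoningExecutive:
    """Poses subproblems to each ILR and assembles the best proposals."""
    def __init__(self, ilrs: List[ILR]):
        self.ilrs = ilrs

    def solve(self, subproblems: List[str]) -> List[Tuple[str, str, int]]:
        solution = []
        for sp in subproblems:
            # Collect proposals from every ILR that can handle this subproblem.
            proposals = [(ilr.name, p) for ilr in self.ilrs
                         if (p := ilr.propose(sp)) is not None]
            if not proposals:
                continue  # no ILR could solve this subproblem
            # Select the highest-confidence proposal and add it to the plan.
            name, best = max(proposals, key=lambda np: np[1].quality)
            solution.append((sp, name, best.assignment))
        return solution

# Two toy ILRs with complementary coverage: the "symbolic" one is confident
# on even-numbered requests, the "case-based" one only handles odd-numbered ones.
even_ilr = ILR("symbolic",
               lambda sp: PartialSolution(sp, 100, 0.9 if int(sp[-1]) % 2 == 0 else 0.2))
odd_ilr = ILR("case-based",
              lambda sp: PartialSolution(sp, 200, 0.8) if int(sp[-1]) % 2 == 1 else None)

mre = MetaReasoningExecutive([even_ilr, odd_ilr])
plan = mre.solve(["req0", "req1", "req2"])
```

Here the ensemble effect is visible in miniature: neither toy ILR covers all subproblems well on its own, but the MRE's per-subproblem selection assembles a complete plan from their complementary strengths.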
