Apprenticeship Scheduling: Learning to Schedule from Human Experts
Matthew C. Gombolay | Reed Jensen | Jessica Stigile | Sung-Hyun Son | Julie A. Shah
[1] Maksims Volkovs, et al. BoltzRank: learning to maximize expected ranking gain, 2009, ICML '09.
[2] René David, et al. Discrete event dynamic systems, 1989.
[3] Ronald G. Askin, et al. Project selection, scheduling and resource allocation with time dependent returns, 2009, Eur. J. Oper. Res.
[4] Yi-Chi Wang, et al. Application of reinforcement learning for agent-based production scheduling, 2005, Eng. Appl. Artif. Intell.
[5] Eibe Frank, et al. Logistic Model Trees, 2003, ECML.
[6] Thomas G. Dietterich. What is machine learning?, 2020, Archives of Disease in Childhood.
[7] Hamsa Balakrishnan, et al. Estimation of maximum-likelihood discrete-choice models of the runway configuration selection process, 2011, Proceedings of the 2011 American Control Conference.
[8] Wei Zhang, et al. A Reinforcement Learning Approach to Job-Shop Scheduling, 1995, IJCAI.
[9] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[10] Hang Li, et al. Ranking refinement and its application to information retrieval, 2008, WWW.
[11] Manuela M. Veloso, et al. Confidence-based policy learning from demonstration using Gaussian mixture models, 2007, AAMAS '07.
[12] Siyuan Liu, et al. Robust Bayesian Inverse Reinforcement Learning with Sparse Behavior Noise, 2014, AAAI.
[13] Andrew G. Barto, et al. Building Portable Options: Skill Transfer in Reinforcement Learning, 2007, IJCAI.
[14] Steven D. Pizer, et al. What Are the Consequences of Waiting for Health Care in the Veteran Population?, 2011, Journal of General Internal Medicine.
[15] Nicola Muscettola, et al. Reformulating Temporal Plans for Efficient Execution, 1998, KR.
[16] Neil Yorke-Smith, et al. PTIME: Personalized assistance for calendaring, 2011, TIST.
[17] Tie-Yan Liu, et al. Learning to rank: from pairwise approach to listwise approach, 2007, ICML '07.
[18] George Konidaris, et al. Value Function Approximation in Reinforcement Learning Using the Fourier Basis, 2011, AAAI.
[19] Bart De Schutter, et al. A Comprehensive Survey of Multiagent Reinforcement Learning, 2008, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).
[20] Yi-Chi Wang, et al. Application of reinforcement learning for agent-based production scheduling, 2005.
[21] J. Ginzburg, et al. Proceedings of the 13th Annual Meeting of the Special Interest Group on Discourse and Dialogue, 2012.
[22] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[23] Han-Lim Choi, et al. Consensus-Based Auction Approaches for Decentralized Task Assignment, 2008.
[24] Tie-Yan Liu. Learning to Rank for Information Retrieval, 2011.
[25] Julie A. Shah, et al. Fast Scheduling of Multi-Robot Teams with Temporospatial Constraints, 2013, Robotics: Science and Systems.
[26] Anind K. Dey, et al. Maximum Entropy Inverse Reinforcement Learning, 2008, AAAI.
[27] Chou-Yuan Lee, et al. Efficiently solving general weapon-target assignment problem by genetic algorithms with greedy eugenics, 2003, IEEE Trans. Syst. Man Cybern. Part B.
[28] L. Goddard, et al. Operations Research (OR), 2007.
[29] Anthony Stentz, et al. A comprehensive taxonomy for multi-robot task allocation, 2013, Int. J. Robotics Res.
[30] Bilge Mutlu, et al. Learning-Based Modeling of Multimodal Behaviors for Humanlike Robots, 2014, 9th ACM/IEEE International Conference on Human-Robot Interaction (HRI).
[31] Dylan A. Shell, et al. Optimal Market-based Multi-Robot Task Allocation via Strategic Pricing, 2013, Robotics: Science and Systems.
[32] T. Salakoski, et al. Learning to Rank with Pairwise Regularized Least-Squares, 2007.
[33] Marius M. Solomon. Algorithms for the Vehicle Routing and Scheduling Problems with Time Window Constraints, 1987, Oper. Res.
[34] Loo Hay Lee, et al. Heuristic methods for vehicle routing problem with time windows, 2001, Artif. Intell. Eng.
[35] Allison Sauppé, et al. A Regression-based Approach to Modeling Addressee Backchannels, 2012, SIGDIAL Conference.
[36] Hema Raghavan, et al. Active Learning with Feedback on Features and Instances, 2006, J. Mach. Learn. Res.
[37] Ernesto Nunes, et al. Multi-Robot Auctions for Allocation of Tasks with Temporal Constraints, 2015, AAAI.
[38] Pengcheng Zhang, et al. A novel multi-agent reinforcement learning approach for job scheduling in Grid computing, 2011, Future Gener. Comput. Syst.
[39] Julie A. Shah, et al. Schedulability Analysis of Task Sets with Upper- and Lower-Bound Temporal Constraints, 2014, J. Aerosp. Inf. Syst.
[40] Sriraam Natarajan, et al. Active Advice Seeking for Inverse Reinforcement Learning, 2015, AAAI.
[41] Rong Jin, et al. Learning to Rank by Optimizing NDCG Measure, 2009, NIPS.
[42] Anthony Stentz, et al. Time-extended multi-robot coordination for domains with intra-path constraints, 2009, Robotics: Science and Systems.
[43] Sridhar Mahadevan, et al. Recent Advances in Hierarchical Reinforcement Learning, 2003, Discret. Event Dyn. Syst.
[44] Luca Maria Gambardella, et al. A Multiple Ant Colony System for Vehicle Routing Problems with Time Windows, 1999.
[45] Chih-Ping Wei, et al. Feature Selection for Medical Data Mining: Comparisons of Expert Judgment and Automatic Approaches, 2006, 19th IEEE Symposium on Computer-Based Medical Systems (CBMS'06).
[46] S. King. Learning to fly, 1998, Nursing Times.
[47] Nancy Greer, et al. Interventions to Improve Veterans' Access to Care: A Systematic Review of the Literature, 2011, Journal of General Internal Medicine.
[48] Rakesh Gupta, et al. Improving Hybrid Vehicle Fuel Efficiency Using Inverse Reinforcement Learning, 2012, AAAI.
[49] Rajeev Motwani, et al. The PageRank Citation Ranking: Bringing Order to the Web, 1999, WWW 1999.
[50] Pieter Abbeel, et al. Apprenticeship learning via inverse reinforcement learning, 2004, ICML.