Apprenticeship Scheduling: Learning to Schedule from Human Experts

Coordinating agents to complete a set of tasks with intercoupled temporal and resource constraints is computationally challenging, yet human domain experts can solve these difficult scheduling problems using paradigms learned through years of apprenticeship. A process for manually codifying this domain knowledge within a computational framework is necessary to scale beyond the "single-expert, single-trainee" apprenticeship model. However, human domain experts often have difficulty describing their decision-making processes, which makes codifying this knowledge laborious. We propose a new approach for capturing domain-expert heuristics through a pairwise ranking formulation. Our approach is model-free and does not require enumerating or iterating through a large state space. We empirically demonstrate that this approach accurately learns multifaceted heuristics on both a synthetic data set incorporating job-shop scheduling and vehicle routing problems, and a real-world data set consisting of demonstrations of experts solving a weapon-to-target assignment problem.
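
To make the pairwise ranking formulation concrete: at each decision point in a demonstration, the task the expert actually scheduled can be compared against each task left unscheduled, the differences between their feature vectors becoming positive and negative examples for an ordinary binary classifier; at execution time, each candidate task is then scored by how strongly it "wins" its pairwise comparisons. The sketch below is a minimal illustration under these assumptions, written in Python with scikit-learn's LogisticRegression standing in for the ranking model; the function names (`make_pairwise_examples`, `fit_policy`, `select_task`) and the feature encoding are hypothetical, not the authors' implementation.

```python
# Minimal sketch of learning a scheduling heuristic by pairwise ranking.
# Assumes each expert demonstration records, for one decision point, the
# feature vectors of all schedulable tasks plus the index the expert chose.
import numpy as np
from sklearn.linear_model import LogisticRegression


def make_pairwise_examples(demonstrations):
    """Convert demonstrations into pairwise training data.

    demonstrations: iterable of (features, chosen_idx), where `features`
    is an (n_candidates, n_features) array for one decision point and
    `chosen_idx` is the task the expert scheduled.
    """
    X, y = [], []
    for features, chosen_idx in demonstrations:
        chosen = features[chosen_idx]
        for j, other in enumerate(features):
            if j == chosen_idx:
                continue
            # The expert preferred `chosen` over `other`; emit both
            # orderings so the training set is balanced and antisymmetric.
            X.append(chosen - other)
            y.append(1)
            X.append(other - chosen)
            y.append(0)
    return np.asarray(X), np.asarray(y)


def fit_policy(demonstrations):
    """Fit a pairwise preference model; no state space is enumerated."""
    X, y = make_pairwise_examples(demonstrations)
    return LogisticRegression(max_iter=1000).fit(X, y)


def select_task(model, features):
    """Pick the candidate that wins the most pairwise comparisons."""
    n = len(features)
    wins = np.zeros(n)
    for i in range(n):
        for j in range(n):
            if i != j:
                diff = (features[i] - features[j]).reshape(1, -1)
                wins[i] += model.predict_proba(diff)[0, 1]
    return int(np.argmax(wins))


if __name__ == "__main__":
    # Toy expert whose hidden rule is "schedule the earliest deadline"
    # (feature 0); the learner should recover this preference.
    rng = np.random.default_rng(0)
    demos = [(f, int(np.argmin(f[:, 0])))
             for f in (rng.random((5, 3)) for _ in range(200))]
    model = fit_policy(demos)
    test = rng.random((5, 3))
    print("model picks:", select_task(model, test),
          "expert rule picks:", int(np.argmin(test[:, 0])))
```

Because the classifier only ever sees differences between candidate feature vectors, nothing in this formulation enumerates or iterates over the scheduling state space, which is what the abstract's "model-free" claim amounts to in practice.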
