Implementation details

Bootstrapping: For the bootstrapping experiment, we use an LSTM [16] with 256 hidden units as the meta network. The inputs to the meta network are the reward R_{t+1}, the discount γ_{t+1}, and the value of the next state v(S_{t+1}), which are fed into the LSTM in reverse-time order. The agent network is a two-layer MLP with 256 hidden units. Both the agent optimiser and the meta optimiser are RMSProp [35], with a learning rate of 1e−3 for the agent and a meta learning rate of 1e−4. The agent performs M = 5 inner updates before the metagradient is computed. We perform 10 independent runs, each with a different random seed, and report the mean performance with standard deviation.
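As a rough illustration of the setup described above, the following Python sketch collects the stated hyperparameters in one place and shows one way to arrange the meta-network inputs in reverse-time order. The class name `BootstrappingConfig`, its field names, and the helper `meta_inputs_reverse_time` are hypothetical and introduced only for this example; the sketch does not implement the LSTM, the agent, or the metagradient itself.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class BootstrappingConfig:
    """Hyperparameters quoted in the text; all field names are hypothetical."""
    meta_lstm_hidden_units: int = 256
    agent_mlp_layers: int = 2
    agent_mlp_hidden_units: int = 256
    optimiser: str = "rmsprop"        # used for both agent and meta updates
    learning_rate: float = 1e-3       # agent learning rate
    meta_learning_rate: float = 1e-4  # meta learning rate
    inner_updates: int = 5            # M agent updates per metagradient
    num_seeds: int = 10               # independent runs


def meta_inputs_reverse_time(
    rewards: Sequence[float],
    discounts: Sequence[float],
    next_values: Sequence[float],
) -> List[Tuple[float, float, float]]:
    """Pair up (R_{t+1}, gamma_{t+1}, v(S_{t+1})) per step, last step first."""
    if not (len(rewards) == len(discounts) == len(next_values)):
        raise ValueError("all input sequences must have the same length")
    steps = list(zip(rewards, discounts, next_values))
    # Reverse so that the sequence model would see the final time step first.
    return steps[::-1]


config = BootstrappingConfig()
inputs = meta_inputs_reverse_time(
    rewards=[1.0, 0.0, 2.0],
    discounts=[0.9, 0.9, 0.0],
    next_values=[0.5, 0.7, 0.0],
)
```

Here `inputs[0]` holds the features of the last time step, so a sequence model consuming `inputs` in order would process the trajectory backwards in time.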
