Towards Variance Reduction for Reinforcement Learning of Industrial Decision-making Tasks: A Bi-Critic based Demand-Constraint Decoupling Approach
[1] Biwei Huang,et al. Factored Adaptation for Non-Stationary Reinforcement Learning , 2022, NeurIPS.
[2] Lei Chen,et al. A Data-Driven Column Generation Algorithm For Bin Packing Problem in Manufacturing Industry , 2022, ArXiv.
[3] Kai Xu,et al. Learning practically feasible policies for online 3D bin packing , 2021, Science China Information Sciences.
[4] Wenhao Ding,et al. Context-Aware Safe Reinforcement Learning for Non-Stationary Environments , 2021, 2021 IEEE International Conference on Robotics and Automation (ICRA).
[5] Scott M. Jordan,et al. Towards Safe Policy Improvement for Non-Stationary MDPs , 2020, NeurIPS.
[6] Reazul Hasan Russel,et al. Robust Constrained-MDPs: Soft-Constrained Robust Policy Optimization under Model Uncertainty , 2020, ArXiv.
[7] Christian D. Hubbs,et al. OR-Gym: A Reinforcement Learning Library for Operations Research Problems , 2020, ArXiv.
[8] Yin Yang,et al. Online 3D Bin Packing with Constrained Deep Reinforcement Learning , 2020, AAAI.
[9] Wenhao Ding,et al. Task-Agnostic Online Reinforcement Learning with an Infinite Mixture of Gaussian Processes , 2020, NeurIPS.
[10] Lingxiao Wang,et al. Optimal Elevator Group Control via Deep Asynchronous Actor–Critic Learning , 2020, IEEE Transactions on Neural Networks and Learning Systems.
[11] Sang Wan Lee,et al. Task complexity interacts with state-space uncertainty in the arbitration between model-based and model-free learning , 2019, Nature Communications.
[12] Samir Elhedhli,et al. Three-Dimensional Bin Packing and Mixed-Case Palletization , 2019, INFORMS Journal on Optimization.
[13] Marek Petrik,et al. Beyond Confidence Regions: Tight Bayesian Ambiguity Sets for Robust MDPs , 2019, NeurIPS.
[14] In-So Kweon,et al. CBAM: Convolutional Block Attention Module , 2018, ECCV.
[15] Hongzi Mao,et al. Variance Reduction for Reinforcement Learning in Input-Driven Environments , 2018, ICLR.
[16] Lawrence V. Snyder,et al. Reinforcement Learning for Solving the Vehicle Routing Problem , 2018, NeurIPS.
[17] Alec Radford,et al. Proximal Policy Optimization Algorithms , 2017, ArXiv.
[18] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[19] Daniel Nikovski,et al. Submodular Function Maximization for Group Elevator Scheduling , 2017, ICAPS.
[20] Trung Thanh Nguyen,et al. An Online Packing Heuristic for the Three-Dimensional Container Loading Problem in Dynamic Environments and the Physical Internet , 2017, EvoApplications.
[21] Alex Graves,et al. Asynchronous Methods for Deep Reinforcement Learning , 2016, ICML.
[22] Xueping Li,et al. A hybrid differential evolution algorithm for multiple container loading problem with heterogeneous containers , 2015, Comput. Ind. Eng.
[23] Emmanuel Hadoux,et al. Sequential Decision-Making under Non-stationary Environments via Sequential Change-point Detection , 2014.
[24] Dong Hui,et al. Research of elevator group scheduling system based on reinforcement learning algorithm , 2013, Proceedings of 2013 2nd International Conference on Measurement, Information and Control.
[25] Edmund K. Burke,et al. An effective heuristic for the two-dimensional irregular bin packing problem , 2013, Annals of Operations Research.
[26] Hongfeng Wang,et al. A hybrid genetic algorithm with a new packing strategy for the three-dimensional bin packing problem , 2012, Appl. Math. Comput.
[27] Teodor Gabriel Crainic,et al. TS2PACK: A two-level tabu search for the three-dimensional bin packing problem , 2009, Eur. J. Oper. Res.
[28] Teodor Gabriel Crainic,et al. Extreme Point-Based Heuristics for Three-Dimensional Bin Packing , 2008, INFORMS J. Comput.
[29] Daniele Vigo,et al. The Three-Dimensional Bin Packing Problem , 2000, Oper. Res.
[30] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[31] Andrew G. Barto,et al. Elevator Group Control Using Multiple Reinforcement Learning Agents , 1998, Machine Learning.
[32] S. Hochreiter,et al. Long Short-Term Memory , 1997, Neural Computation.
[33] J. O. Berkey,et al. Two-Dimensional Finite Bin-Packing Algorithms , 1987.
[34] Bernard Chazelle,et al. The Bottom-Left Bin-Packing Heuristic: An Efficient Implementation , 1983, IEEE Transactions on Computers.
[35] George R. Strakosch,et al. Vertical Transportation: Elevators and Escalators , 1983.
[36] Kai Xu,et al. Learning Efficient Online 3D Bin Packing on Packing Configuration Trees , 2022, ICLR.
[37] Anton Jansson,et al. Elevator Control Using Reinforcement Learning to Select Strategy , 2015.
[38] Risto Lahdelma,et al. Estimated Time of Arrival (ETA) Based Elevator Group Control Algorithm with More Accurate Estimation , 2004.
[39] Michael L. Littman,et al. Exact Solutions to Time-Dependent MDPs , 2000, NIPS.