Large-Scale Online Multitask Learning and Decision Making for Flexible Manufacturing

Large-scale machine coordination is a primary approach to flexible manufacturing, enabling large numbers of autonomous machines to dynamically coordinate their actions in pursuit of custom tasks. A key challenge for such large-scale systems is finding high-dimensional coordination decision-making policies. Multitask policy gradient algorithms can be used to search for high-dimensional policies, particularly in collaborative decision-support systems and distributed control systems. However, these algorithms struggle to learn high-dimensional coordination control policies (CCPs) online from large-scale custom manufacturing tasks. This paper proposes a large-scale online multitask learning and decision-making approach that consecutively learns high-dimensional CCPs in order to quickly coordinate machine actions online for large-scale custom manufacturing tasks. A large-scale online multitask learning algorithm is developed that learns large-scale high-dimensional CCPs in a flexible manufacturing scenario, and an online stochastic planning algorithm is proposed that optimizes the Markov network structure online, avoiding an expensive global search for the optimal policy. Experiments were conducted on a professional flexible manufacturing testbed deployed within a smart factory of Weichai Power in China. Results show the proposed approach to be more efficient than previous works.
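The online multitask learning of policy parameters described in the abstract is in the spirit of shared-latent-basis policy gradient methods such as PG-ELLA, where each arriving task's policy parameters are factored through a shared basis that is refined online. The following is a minimal sketch of that idea, not the paper's actual algorithm: it assumes a ridge regression stand-in for the sparse coding step, placeholder Hessian estimates, and random vectors in place of real single-task policy-gradient solutions.

```python
import numpy as np

rng = np.random.default_rng(0)

d, k = 8, 3          # policy-parameter dimension, latent basis size (assumed)
L = rng.normal(scale=0.1, size=(d, k))   # shared latent basis over tasks
mu, lam = 1e-3, 1e-3                     # regularization weights (assumed)

def fit_task(theta, hessian, basis):
    """Fit task-specific coefficients s so that basis @ s approximates
    theta under the task's curvature (ridge stand-in for sparse coding)."""
    A = basis.T @ hessian @ basis + mu * np.eye(k)
    b = basis.T @ hessian @ theta
    return np.linalg.solve(A, b)

# Running sufficient statistics for the closed-form basis update
A_sum = lam * np.eye(d * k)
b_sum = np.zeros(d * k)

for task in range(5):
    # Stand-ins for a single-task policy-gradient solution and its Hessian
    theta = rng.normal(size=d)
    H = np.eye(d)                        # placeholder curvature estimate
    s = fit_task(theta, H, L)
    # Accumulate vec-form normal equations: minimize ||H^(1/2)(L s - theta)||^2
    A_sum += np.kron(np.outer(s, s), H)
    b_sum += np.kron(s, H @ theta)
    # Refit the shared basis from all tasks seen so far (column-major vec)
    L = np.linalg.solve(A_sum, b_sum).reshape(d, k, order="F")
```

Each new task thus updates the shared structure in constant time with respect to the number of previously seen tasks, which is what makes this family of methods attractive for the online, large-scale setting the paper targets.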