Cyber-Human Approach For Learning Human Intention And Shape Robotic Behavior Based On Task Demonstration

Recent developments in artificial intelligence have enabled the training of autonomous robots without human supervision. Even without human supervision during training, current models still have to be human-engineered, and they offer no guarantee of matching human expectations or of performing within safety bounds. This paper proposes CyberSteer, which leverages human-robot interaction to align goals between humans and robotic intelligent agents. Based on a human demonstration of the task, CyberSteer learns the intrinsic reward function used by the human demonstrator to pursue the goal of the task. The learned intrinsic human reward function shapes the robotic behavior during training through deep reinforcement learning algorithms, removing the need for an environment-dependent or hand-engineered reward signal. Two different hypotheses were tested, both using non-expert human operators for the initial demonstration of a given task or desired behavior: one trains a deep neural network to classify human-like behavior, and the other trains a behavior cloning deep neural network to suggest actions. CyberSteer was tested in a high-fidelity unmanned air system simulation environment, Microsoft AirSim. The simulated aerial robot performed collision avoidance through a clustered forest environment using forward-looking depth sensing. The performance of CyberSteer is compared to behavior cloning algorithms and to reinforcement learning algorithms guided by handcrafted reward functions. Results show that the human-learned intrinsic reward function can shape the behavior of robotic systems and, when guiding reinforcement learning algorithms, yields better task performance than standard human-handcrafted reward functions.
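The abstract does not state the exact form of the learned reward, so the following is only a minimal sketch of how the two hypotheses could each be turned into a scalar reward signal for a reinforcement learning agent. The function names `classifier_reward` and `cloning_reward`, the use of a sigmoid over a classifier logit, and the negative Euclidean distance to a suggested action are all illustrative assumptions, not the authors' implementation.

```python
import math


def classifier_reward(logit):
    """Hypothetical reward for the first hypothesis.

    Assumes a trained classifier outputs a logit for "this state-action
    pair looks human-like"; the reward is the corresponding probability.
    """
    return 1.0 / (1.0 + math.exp(-logit))


def cloning_reward(agent_action, suggested_action):
    """Hypothetical reward for the second hypothesis.

    Assumes a behavior cloning network suggests an action for the current
    state; the reward is the negative Euclidean distance between the
    agent's action and that suggestion (0.0 when they coincide).
    """
    squared = sum((a - s) ** 2 for a, s in zip(agent_action, suggested_action))
    return -math.sqrt(squared)
```

Under these assumptions, either function could replace a handcrafted reward inside a standard reinforcement learning loop: the agent is rewarded for acting in ways the demonstration-trained model considers human-like, without any environment-specific reward terms.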

Vinicius G. Goecks | William D. Nothwang | Gregory M. Gremillion | Hannah C. Lehman
