Unsupervised Imitation Learning

We introduce a novel method to learn a policy from unsupervised demonstrations of a process. Given a model of the system and a set of output sequences, we find a policy whose performance is comparable to that of the original policy, without requiring access to the inputs of these demonstrations. We do so by first estimating the inputs of the system from the observed unsupervised demonstrations. Then, we learn a policy by applying vanilla supervised learning algorithms to the (estimated) input-output pairs. For the input estimation, we present a new adaptive linear estimator (AdaL-IE) that explicitly trades off variance and bias in the estimation. As we show empirically, AdaL-IE produces estimates with lower error than the state-of-the-art input estimation method (UMV-IE) [Gillijns and De Moor, 2007]. Using AdaL-IE in conjunction with imitation learning enables us to learn control policies that consistently outperform those learned using UMV-IE.
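The two-stage pipeline described above (estimate the inputs from the outputs, then fit a policy by supervised learning) can be illustrated with a minimal sketch. This is not AdaL-IE or UMV-IE: it substitutes a plain pseudoinverse input estimate, and it assumes a noise-free linear system x_{t+1} = A x_t + B u_t whose state is observed directly (y_t = x_t), with a linear expert policy u_t = K x_t. The matrices `A`, `B`, `K_true` and all function names are hypothetical choices made for illustration only.

```python
import numpy as np


def rollout(A, B, K, x0, steps):
    """Simulate the closed loop x_{t+1} = A x_t + B (K x_t); return the states only."""
    xs = [x0]
    for _ in range(steps):
        u = K @ xs[-1]
        xs.append(A @ xs[-1] + B @ u)
    return np.array(xs)


def estimate_inputs(A, B, ys):
    """Pseudoinverse input estimate u_t = B^+ (y_{t+1} - A y_t).

    Assumption: outputs equal states (y_t = x_t) and there is no noise.
    This is a simple stand-in, not the estimator proposed in the paper.
    """
    residual = ys[1:] - ys[:-1] @ A.T          # rows are y_{t+1} - A y_t
    return residual @ np.linalg.pinv(B).T      # rows are estimated u_t


def fit_linear_policy(ys, us):
    """Supervised step: least-squares fit of us ≈ ys @ K^T."""
    K_T, *_ = np.linalg.lstsq(ys, us, rcond=None)
    return K_T.T


# Hypothetical double-integrator-like system and expert gain.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
K_true = np.array([[-1.0, -1.5]])

# Demonstrations contain outputs only; the expert's inputs are discarded.
rng = np.random.default_rng(0)
demos = [rollout(A, B, K_true, rng.standard_normal(2), steps=30) for _ in range(5)]

# Stage 1: estimate the unseen inputs of each demonstration.
obs = np.vstack([ys[:-1] for ys in demos])
u_hat = np.vstack([estimate_inputs(A, B, ys) for ys in demos])

# Stage 2: fit a policy to the (estimated) input-output pairs.
K_hat = fit_linear_policy(obs, u_hat)
print(K_hat)
```

In this idealized setting the fitted gain matches the expert's up to numerical precision. With measurement noise or partial observations the naive estimate degrades, which is the regime where a dedicated input estimator becomes relevant.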

[1] Jan Peters, et al. Reinforcement learning in robotics: A survey, 2013, Int. J. Robotics Res.

[2] Georgios Anagnostou, et al. Derivative-Free Kalman Filtering Based Approaches to Dynamic State Estimation for Power Systems With Unknown Inputs, 2018, IEEE Transactions on Power Systems.

[3] Brett Browning, et al. A survey of robot learning from demonstration, 2009, Robotics Auton. Syst.

[4] Hans-Andrea Loeliger, et al. LMMSE Estimation and Interpolation of Continuous-Time Signals from Discrete-Time Samples Using Factor Graphs, 2013, ArXiv.

[5] Stefan Schaal, et al. Robot Programming by Demonstration, 2009, Springer Handbook of Robotics.

[6] Jean-François Rigal, et al. High Frequency Correction of Dynamometer for Cutting Force Observation in Milling, 2010.

[7] Stefan Schaal, et al. A Generalized Path Integral Control Approach to Reinforcement Learning, 2010, J. Mach. Learn. Res.

[8] Gene F. Franklin, et al. Digital control of dynamic systems, 1980.

[9] Mohamed Medhat Gaber, et al. Imitation Learning, 2017, ACM Comput. Surv.

[10] Pieter Abbeel, et al. Apprenticeship learning via inverse reinforcement learning, 2004, ICML.

[11] Ron J. Patton, et al. Input Observability and Input Reconstruction, 1998, Autom.

[12] Yusuf Altintas, et al. Dynamic Compensation of Spindle Integrated Force Sensors With Kalman Filter, 2004.

[13] Peter K. Kitanidis, et al. Unbiased minimum-variance linear state estimation, 1987, Autom.

[14] A. Juditsky, et al. Large Deviations of Vector-valued Martingales in 2-Smooth Normed Spaces, 2008, 0809.0813.

[15] G. Lombaert, et al. A smoothing algorithm for joint input-state estimation in structural dynamics, 2018.

[16] Kfir Y. Levy, et al. k*-Nearest Neighbors: From Global to Local, 2017, NIPS.

[17] Peter Stone, et al. Behavioral Cloning from Observation, 2018, IJCAI.

[18] Martin J. Corless, et al. State and Input Estimation for a Class of Uncertain Systems, 1998, Autom.

[19] Chien-Shu Hsieh, et al. Implementation issues of unbiased minimum-variance state estimation for systems with unknown inputs, 2014, 2014 CACS International Automatic Control Conference (CACS 2014).

[20] Hieu Minh Trinh, et al. State and input simultaneous estimation for a class of nonlinear systems, 2004, Autom.

[21] Bart De Moor, et al. Unbiased minimum-variance input and state estimation for linear discrete-time systems, 2007, Autom.

[22] Emilio Frazzoli, et al. A unified filter for simultaneous input and state estimation of linear discrete-time stochastic systems, 2013, Autom.

[23] Sergey Levine, et al. Imitation from Observation: Learning to Imitate Behaviors from Raw Video via Context Translation, 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[24] Geoffrey J. Gordon, et al. A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning, 2010, AISTATS.

[25] Stefan Schaal, et al. Is imitation learning the route to humanoid robots?, 1999, Trends in Cognitive Sciences.

[26] Yixin Diao, et al. Feedback Control of Computing Systems, 2004.

[27] Xin Zhang, et al. End to End Learning for Self-Driving Cars, 2016, ArXiv.

[28] Peter E. Hart, et al. Nearest neighbor pattern classification, 1967, IEEE Trans. Inf. Theory.

[29] Yoram Singer, et al. Efficient projections onto the l1-ball for learning in high dimensions, 2008, ICML '08.

[30] Yannick Schroecker, et al. Imitating Latent Policies from Observation, 2018, ICML.