Adversarial Soft Advantage Fitting: Imitation Learning without Policy Optimization