Learning with Delayed Rewards: A Case Study on Inverse Defect Design in 2D Materials.