Reinforcement learning in discrete action space applied to inverse defect design