Molecule generation using transformers and policy gradient reinforcement learning