A Reinforcement Learning Approach for Energy Efficient Beamforming in NOMA Systems