An adaptive safety layer with hard constraints for safe reinforcement learning in multi-energy management systems