A self-balancing exoskeleton is a robot that paraplegic patients can wear to walk without external support. The exoskeleton resembles a humanoid robot with two feet of normal human size, which makes it difficult to control with existing methods. These model-based methods require mathematical modeling and usually do not account for the constraints of the controller, such as position and torque limits. This paper proposes a deep reinforcement learning (DRL) based training framework for the self-balancing exoskeleton, with the goal of learning a feasible walking policy that generates a walking pattern similar to a given reference motion for a paraplegic patient wearing the exoskeleton. The framework is based on a policy gradient algorithm, and a model simplification method is proposed to speed up training. The experimental results indicate that this training framework, which fully considers the controller, can acquire a feasible control policy for the self-balancing exoskeleton.
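As an illustration of what "fully considering the controller" can mean inside a DRL training loop, the sketch below clips the policy's joint position targets to the joint range, converts them to torques with a PD law, saturates those torques at the actuator limit, and scores how closely the resulting joint angles track a reference motion. This is a minimal sketch under stated assumptions, not the paper's implementation: the PD controller, the gains `kp` and `kd`, the exponential tracking reward, and the names `pd_torques` and `imitation_reward` are all hypothetical choices made here for illustration.

```python
import math
from typing import List, Sequence, Tuple


def clip(x: float, lo: float, hi: float) -> float:
    """Clamp x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def pd_torques(
    q_target: Sequence[float],
    q: Sequence[float],
    qd: Sequence[float],
    kp: float,
    kd: float,
    q_lim: Sequence[Tuple[float, float]],
    tau_lim: Sequence[float],
) -> List[float]:
    """Hypothetical joint-level PD controller that respects controller limits.

    The policy's target angles are first clipped to each joint's position
    range, then the PD torque is saturated at the actuator's torque limit.
    """
    taus = []
    for qt, qi, qdi, (qlo, qhi), tmax in zip(q_target, q, qd, q_lim, tau_lim):
        qt = clip(qt, qlo, qhi)              # assumed joint position limit
        tau = kp * (qt - qi) - kd * qdi      # PD law on the clipped target
        taus.append(clip(tau, -tmax, tmax))  # assumed actuator torque limit
    return taus


def imitation_reward(
    q: Sequence[float], q_ref: Sequence[float], scale: float = 2.0
) -> float:
    """Hypothetical tracking reward: 1.0 at the reference pose, decaying
    exponentially with the squared joint-angle error."""
    err = sum((a - b) ** 2 for a, b in zip(q, q_ref))
    return math.exp(-scale * err)


if __name__ == "__main__":
    # A target of 2.0 rad is clipped to the 1.0 rad joint limit; the PD
    # torque of 100 N*m is then saturated at the 50 N*m actuator limit.
    print(pd_torques([2.0], [0.0], [0.0], kp=100.0, kd=1.0,
                     q_lim=[(-1.0, 1.0)], tau_lim=[50.0]))
    print(imitation_reward([0.1, 0.2], [0.1, 0.2]))
```

Applying the limits inside the simulated controller, rather than only penalizing violations in the reward, means a policy gradient method only ever experiences commands the real hardware could execute.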