To help cultivate professional sports referees, we develop a sports referee training system that recognizes whether a trainee wearing the Myo armband makes correct judging signals while watching a prerecorded professional game. The system must correctly recognize two gesture sets: gestures corresponding to official referee signals (ORSs), and gestures used to interact intuitively with the system. These two sets include both large motion and subtle motion gestures, and existing sensor-based methods that rely on handcrafted features do not recognize all of these gestures well. In this work, deep belief networks (DBNs) are used to learn more representative features for hand gesture recognition, and selected handcrafted features are combined with the DBN features to achieve more robust recognition. In addition, a hierarchical recognition scheme first classifies the input gesture as a large or subtle motion gesture, and then applies the corresponding classifier for large motion or subtle motion gestures to obtain the final recognition result. Furthermore, the Myo armband contains eight-channel surface electromyography (sEMG) sensors and an inertial measurement unit (IMU), and these heterogeneous signals can be fused to improve recognition accuracy. We take basketball as an example to validate the proposed training system, and the experimental results show that the proposed hierarchical scheme using DBN features of multimodal data outperforms other methods.
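The two-stage idea (first decide large vs. subtle motion, then pick the gesture within that group) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the DBN features and handcrafted sEMG/IMU features have already been extracted as numeric vectors, fuses them by simple concatenation, and uses a nearest-centroid classifier as a stand-in for whatever classifiers the system actually uses. The helper names (`fuse`, `NearestCentroid`, `HierarchicalRecognizer`) and the gesture labels are hypothetical.

```python
import math
from statistics import mean


def fuse(dbn_feats, handcrafted_feats):
    # Hypothetical early fusion: concatenate (assumed precomputed) DBN features
    # with selected handcrafted features into one vector.
    return list(dbn_feats) + list(handcrafted_feats)


class NearestCentroid:
    """Stand-in classifier: assigns a sample to the class with the closest mean vector."""

    def fit(self, X, y):
        groups = {}
        for x, label in zip(X, y):
            groups.setdefault(label, []).append(x)
        # Per-class centroid = column-wise mean of that class's samples.
        self.centroids = {
            label: [mean(col) for col in zip(*xs)] for label, xs in groups.items()
        }
        return self

    def predict(self, x):
        return min(self.centroids, key=lambda c: math.dist(x, self.centroids[c]))


class HierarchicalRecognizer:
    def __init__(self, motion_type):
        # motion_type maps each gesture label to "large" or "subtle".
        self.motion_type = motion_type

    def fit(self, X, y):
        coarse = [self.motion_type[g] for g in y]
        # Stage 1: large-motion vs. subtle-motion classifier.
        self.stage1 = NearestCentroid().fit(X, coarse)
        # Stage 2: one gesture classifier per motion group, trained on that group only.
        self.stage2 = {}
        for group in set(coarse):
            idx = [i for i, c in enumerate(coarse) if c == group]
            self.stage2[group] = NearestCentroid().fit(
                [X[i] for i in idx], [y[i] for i in idx]
            )
        return self

    def predict(self, x):
        group = self.stage1.predict(x)
        return self.stage2[group].predict(x)


# Toy usage with made-up 2-D feature vectors (1 "DBN" value + 1 handcrafted value).
motion_type = {"travel": "large", "block": "large", "fist": "subtle", "flick": "subtle"}
X = [fuse([10.0], [10.0]), fuse([10.0], [9.0]),   # travel
     fuse([10.0], [2.0]), fuse([10.0], [1.0]),    # block
     fuse([1.0], [10.0]), fuse([1.0], [9.0]),     # fist
     fuse([1.0], [2.0]), fuse([1.0], [1.0])]      # flick
y = ["travel", "travel", "block", "block", "fist", "fist", "flick", "flick"]
recognizer = HierarchicalRecognizer(motion_type).fit(X, y)
print(recognizer.predict(fuse([9.0], [9.5])))  # travel
print(recognizer.predict(fuse([2.0], [1.5])))  # flick
```

The design choice the hierarchy captures is that each second-stage classifier only has to separate gestures of similar motion scale, so features that discriminate subtle gestures are not swamped by the much larger signal variation of large-motion gestures.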