An Efficient Feature Fusion of Graph Convolutional Networks and Its Application for Real-Time Traffic Control Gestures Recognition

Recently, skeleton-based gesture and action recognition has emerged thanks to progress in human pose estimation. Gesture representation using skeletal data is robust because skeletal data are invariant to an individual’s appearance. Among the approaches proposed for skeleton-based action and gesture recognition, the Graph Convolutional Network (GCN) and its variants have attracted considerable attention thanks to their ability to capture the inherent graph structure of skeletal data. In this paper, we design an efficient scheme that uses relative joints of skeleton sequences within a GCN framework. Spatial features (i.e., joint positions) and temporal features (i.e., joint velocities) are combined to form the input of the Attention-enhanced Adaptive GCN (AAGCN). The proposed framework addresses the limitations of the original AAGCN on challenging datasets with incomplete and noisy skeletal data. Extensive experiments are carried out on three datasets: CMDFALL, MICA-Action3D, and NTU-RGBD. The experimental results show that the proposed method outperforms existing methods. Moreover, to illustrate its application to real-time traffic control gesture recognition for autonomous vehicles, we evaluate the proposed method on the TCG dataset. The results show that the method runs in real time and achieves good recognition accuracy. These results suggest a promising path toward deploying real-time, robust recognition of gesture-based traffic control in autonomous vehicles.
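The combination of relative joint positions and joint velocities described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `fuse_joint_features`, the `(T, V, C)` array layout (frames, joints, coordinates), the choice of reference joint, the zero velocity assigned to the first frame, and channel-wise concatenation as the fusion step are all assumptions made for illustration.

```python
import numpy as np


def fuse_joint_features(seq, ref_joint=0):
    """Build a fused per-joint feature from a skeleton sequence.

    seq: array of shape (T, V, C) = (frames, joints, coordinates).
    ref_joint: index of the joint used as the origin (hypothetical choice).
    Returns an array of shape (T, V, 2 * C).
    """
    seq = np.asarray(seq, dtype=np.float32)

    # Spatial feature: each joint expressed relative to the reference joint
    # of the same frame.
    relative = seq - seq[:, ref_joint:ref_joint + 1, :]

    # Temporal feature: frame-to-frame displacement of each joint.
    # The first frame has no predecessor, so its velocity is set to zero
    # (an assumption of this sketch).
    velocity = np.zeros_like(seq)
    velocity[1:] = seq[1:] - seq[:-1]

    # Fusion by concatenating along the coordinate (channel) axis.
    return np.concatenate([relative, velocity], axis=-1)
```

For a sequence of T frames with V joints in 3D, the output has six channels per joint: three for the relative position and three for the velocity. The result would still need to be rearranged into whatever tensor layout the chosen GCN implementation expects.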