Skeletal Keypoint-Based Transformer Model for Human Action Recognition in Aerial Videos