CNN-Transformer with Self-Attention Network for Sound Event Detection