A multi-head attention-based transformer model for traffic flow forecasting with a comparative analysis to recurrent neural networks