Discovering the Optimal Setup for Speech Emotion Recognition Model Incorporating Different CNN Architectures