Smoothing CNN for end-to-end training in visual question answering