Revisiting End-to-End Speech-to-Text Translation From Scratch