Hierarchical Spectro-Temporal Models for Speech Recognition