Emergent spatio-temporal multimodal learning using a developmental network