Exploring Train and Test-Time Augmentations for Audio-Language Learning