Visually-Aware Audio Captioning With Adaptive Audio-Visual Attention