Automated Audio Captioning via Fusion of Low- and High-Dimensional Features