Audio-Visual Fusion Layers for Event Type Aware Video Recognition