AUDIO-VISUAL SCENE CLASSIFICATION USING TRANSFER LEARNING AND HYBRID FUSION STRATEGY
Technical Report