Joint framework with deep feature distillation and adaptive focal loss for weakly supervised audio tagging and acoustic event detection