ARF @ MediaEval 2012: An Uninformed Approach to Violence Detection in Hollywood Movies

The MediaEval 2012 Affect Task challenged participants to automatically find violent scenes in a set of Hollywood movies. We propose to first predict a set of mid-level concept annotations from low-level visual and auditory features, then fuse the concept predictions and features to detect violent content. Instead of engineering features suitable for the task, we deliberately restrict ourselves to simple general-purpose features with limited temporal context and a generic neural network classifier,