Improving faster-than-real-time human acoustic event detection by saliency-maximized audio visualization

We propose a saliency-maximized audio spectrogram as a representation that lets human analysts quickly search for and detect events in audio recordings. By rendering target events as visually salient patterns, this representation minimizes the time and effort needed to examine a recording. In particular, we propose a transformation of a conventional spectrogram that maximizes the mutual information between the spectrograms of isolated target events and the estimated saliency of the overall visual representation. When shown saliency-maximized spectrograms, subjects perform significantly better in a 1/10-real-time acoustic event detection task.
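
One way to formalize the stated objective is sketched below; the notation is ours and not taken from the abstract. Let $X$ denote the conventional spectrogram of the recording, $X_e$ the spectrogram of an isolated target event, $T_{\theta}$ a parametric transformation of the spectrogram, and $\mathcal{S}(\cdot)$ an estimated visual-saliency map of the rendered image. The transformation parameters would then be chosen to maximize the mutual information between the event spectrogram and the saliency of the transformed representation:
\[
  \theta^{\ast} \;=\; \arg\max_{\theta}\; I\!\left(X_e;\; \mathcal{S}\!\left(T_{\theta}(X)\right)\right)
\]
Under this reading, increasing $I(\cdot;\cdot)$ means that regions of the display flagged as visually salient carry more information about where target events occur, which is what makes rapid visual search by a human analyst possible.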