Reducing Audio Noise Using Spectrogram Random Textures

This paper discusses audio enhancement when a strong additive noise is present only during a known or easily detected period of moderate length (around one second). The signals may contain intelligible components such as speech or music, and may also contain desired but unintelligible background components such as the sound of rivers or waterfalls. A first estimate synthesizes the unintelligible components from the neighboring noise-free spectrogram. A second estimate recovers the intelligible components using spectral attenuation. The two estimates are combined using ideas from statistical process control. Tests on audio containing digital camera zoom motor noise, and on simulations, validate the approach.
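
As an illustration of the spectral attenuation step only, the following minimal sketch applies a classical power spectral subtraction gain to a magnitude spectrogram. The abstract does not state which attenuation rule is used, so the gain formula, the function name `spectral_attenuation`, the `floor` parameter, and the assumption that an average noise magnitude per frequency bin is available are all hypothetical choices made for this example.

```python
import numpy as np

def spectral_attenuation(noisy_mag, noise_mag, floor=0.1):
    """Attenuate each time-frequency bin of a magnitude spectrogram.

    noisy_mag: array of shape (freq, time), magnitudes of the noisy segment.
    noise_mag: array of shape (freq,), assumed average noise magnitude per bin.
    floor:     hypothetical lower bound on the amplitude gain (assumption).
    """
    noisy_pow = noisy_mag ** 2
    noise_pow = (noise_mag ** 2)[:, None]  # broadcast over time frames
    # Power spectral subtraction: fraction of bin power attributed to signal.
    ratio = 1.0 - noise_pow / np.maximum(noisy_pow, 1e-12)
    # Clip so the amplitude gain stays within [floor, 1].
    gain = np.sqrt(np.clip(ratio, floor ** 2, 1.0))
    return gain * noisy_mag

# Toy example: 2 frequency bins, 2 time frames.
noisy = np.array([[2.0, 1.0],
                  [5.0, 0.5]])
noise = np.array([1.0, 3.0])
cleaned = spectral_attenuation(noisy, noise)
```

Bins where the noisy power is well above the noise estimate pass nearly unchanged, while bins dominated by noise are reduced to the gain floor rather than zeroed, which is a common way to limit audible artifacts.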