The shifted inverse-gamma model for noise-floor estimation in archived audio recordings

This paper presents a new time-frequency model for audio signals in additive noise. Signal and noise coefficients are both assumed to be complex Gaussian random variables, with the (unknown) variance of the signal component expressed as a scaling of the (also unknown) noise variance. Under this assumption, an appropriate prior for the unknown scalings of the signal coefficient variances relative to the noise variance can be specified as a shifted inverse-gamma distribution. When this prior is incorporated into a Bayesian model, the marginal conditional distribution of the noise variance can be computed in closed form using only values of the incomplete gamma function, which is readily available in mathematical programming languages. We test the method on both simulated and real noise environments and obtain promising results under challenging conditions.
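The generative assumptions above can be illustrated with a small simulation. This is only a sketch under assumptions that the abstract does not state: the scaling s is drawn from an inverse-gamma distribution with hypothetical shape `alpha` and rate `beta`, and the observed coefficient is taken to have variance noise_var * (1 + s), so that the total scale 1 + s follows an inverse-gamma distribution shifted by one. The function names and parameter values are illustrative, and the paper's actual parametrization may differ.

```python
import math
import random


def sample_scaling(rng, alpha, beta):
    """Draw s ~ inverse-gamma(alpha, beta) as the reciprocal of a gamma variate.

    Assumption: `beta` is the rate of the underlying gamma distribution, so
    E[s] = beta / (alpha - 1) for alpha > 1.
    """
    return 1.0 / rng.gammavariate(alpha, 1.0 / beta)


def sample_coefficient(rng, noise_var, alpha, beta):
    """Draw one noisy time-frequency coefficient (signal plus noise).

    Assumption: signal variance = noise_var * s, so the observed coefficient
    is complex Gaussian with total variance noise_var * (1 + s).
    """
    s = sample_scaling(rng, alpha, beta)
    total_var = noise_var * (1.0 + s)
    # A complex Gaussian with variance V has real and imaginary parts of variance V / 2.
    sd = math.sqrt(total_var / 2.0)
    return complex(rng.gauss(0.0, sd), rng.gauss(0.0, sd))


def mean_power(n, noise_var, alpha, beta, seed=0):
    """Average |x|^2 over n simulated coefficients."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        total += abs(sample_coefficient(rng, noise_var, alpha, beta)) ** 2
    return total / n


if __name__ == "__main__":
    noise_var, alpha, beta = 0.5, 5.0, 4.0
    expected = noise_var * (1.0 + beta / (alpha - 1.0))
    print("empirical mean power:", mean_power(100_000, noise_var, alpha, beta))
    print("expected mean power: ", expected)
```

Under these assumptions the average power of the simulated coefficients should approach noise_var * (1 + beta / (alpha - 1)). This shows why the observed power alone does not identify the noise variance without a prior on the scalings.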
