Sound Source Separation by Instantaneous Estimation-Based Spectral Subtraction

This project aims to achieve sound source separation based on the two-dimensional fast Fourier transform (2D FFT) of a spatio-temporal sound pressure distribution image formed from the outputs of a microphone array. The target sound, which arrives from the front of the array, reaches all microphones simultaneously and therefore forms vertical stripes in the image. Consequently, its spectral components are perfectly localized as direct current (DC) components along the spatial frequency axis of the 2D-FFT spectrum. In this study, noise suppression was performed by spectral subtraction after the DC components of the noise were instantaneously estimated from the spectrum using artificial neural networks. With this approach, the proposed method using a 14-cm-long array performed comparably to the conventional delay-and-sum beamformer using an array approximately 5 m long.
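The sketch below illustrates the two ideas in this summary: a frontal (broadside) source puts all of its 2D-DFT energy in the zero spatial-frequency row, and noise in that row can then be reduced by spectral subtraction. It is a minimal illustration under stated assumptions, not the project's implementation. It assumes a uniformly spaced linear array with the image stored as `image[m][n]` (microphone `m`, time sample `n`), and it uses a naive pure-Python DFT so it stays self-contained (a real implementation would call an FFT routine such as `numpy.fft.fft2`). The neural-network noise estimator is not reproduced. Instead, the hypothetical argument `noise_mag_estimate` stands in for its output, and the subtraction rule shown is the basic magnitude-domain form with a zero floor that reuses the observed phase. The exact rule used in the project may differ. All function names are illustrative.

```python
import cmath
import math


def dft2(image):
    """Naive 2D DFT of image[m][n] (m = microphone, n = time sample).

    Returns X[k][l], where k is the spatial-frequency bin and l is the
    temporal-frequency bin. O(M^2 N^2): for illustration only.
    """
    M, N = len(image), len(image[0])
    X = [[0j] * N for _ in range(M)]
    for k in range(M):
        for l in range(N):
            acc = 0j
            for m in range(M):
                for n in range(N):
                    acc += image[m][n] * cmath.exp(
                        -2j * math.pi * (k * m / M + l * n / N))
            X[k][l] = acc
    return X


def energy_by_spatial_bin(X):
    """Total spectral energy in each spatial-frequency row k."""
    return [sum(abs(v) ** 2 for v in row) for row in X]


def spectral_subtract(observed_dc, noise_mag_estimate, floor=0.0):
    """Magnitude spectral subtraction on the DC (k = 0) row.

    noise_mag_estimate is a hypothetical stand-in for the neural-network
    estimate of the noise magnitude in each temporal-frequency bin.
    The observed phase is kept, and magnitudes are clamped at `floor`.
    """
    out = []
    for x, n_hat in zip(observed_dc, noise_mag_estimate):
        mag = max(abs(x) - n_hat, floor)
        out.append(cmath.rect(mag, cmath.phase(x)))
    return out


if __name__ == "__main__":
    M, N = 8, 32
    # Frontal source: the same waveform at every microphone (vertical stripes).
    s = [math.sin(2 * math.pi * 3 * n / N) for n in range(N)]
    broadside = [s[:] for _ in range(M)]
    print("frontal source, energy per k:",
          [round(e, 6) for e in energy_by_spatial_bin(dft2(broadside))])
    # Oblique plane wave whose spatial frequency falls exactly on bin 2.
    oblique = [[math.cos(2 * math.pi * (4 * n / N - 2 * m / M)) for n in range(N)]
               for m in range(M)]
    print("oblique wave, energy per k:",
          [round(e, 6) for e in energy_by_spatial_bin(dft2(oblique))])
```

In this idealized example, the oblique wave lands exactly on a spatial-frequency bin and leaves the DC row empty. With a short physical array, noise spatial frequencies generally fall between bins and leak into the DC row. That leakage is the component the project estimates and subtracts.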