A Modified Algorithm for Multiple Input Spectrogram Inversion

We propose a new algorithm to estimate the phase of each speech signal in a mixture of audio sources, under the assumption that the magnitude spectrogram of each source is given. The previous method, the multiple input spectrogram inversion (MISI) algorithm, often performs poorly when the estimated magnitude spectrograms are inaccurate. This may be because it imposes a strict constraint that the sum of the source waveforms must exactly equal the mixture waveform. Our proposed algorithm employs a new objective function in which this constraint is relaxed: the difference between the sum of the source waveforms and the mixture waveform becomes a term to be minimized rather than a hard constraint. The performance of our method, modified MISI, is evaluated in two different experimental settings. In both settings it improves audio source separation performance compared to MISI.
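To make the iterative scheme concrete, the following is a minimal sketch of a MISI-style phase reconstruction loop. Each iteration resynthesizes the sources from the target magnitudes and current phases, measures the mismatch between the mixture and the sum of the reconstructions, and feeds a portion of that mismatch back to each source. The `alpha` parameter is a hypothetical relaxation knob added for illustration: `alpha = 1` redistributes the full error (standard MISI behavior), while `alpha < 1` only partially enforces the mixture constraint. This is not the paper's exact objective or update rule, merely a plausible sketch using SciPy's STFT.

```python
import numpy as np
from scipy.signal import stft, istft

def misi(mix, mags, n_iter=50, nperseg=256, alpha=1.0):
    """MISI-style phase reconstruction (illustrative sketch).

    mix   : 1-D mixture waveform.
    mags  : list of target magnitude spectrograms, shaped like stft(mix).
    alpha : error-redistribution weight; alpha=1 mimics standard MISI,
            alpha<1 is a hypothetical relaxation of the mixture constraint
            (the paper's actual objective is not reproduced here).
    """
    T = len(mix)

    def fix_len(w):
        # istft may return a slightly different length; pad/trim to T.
        return np.pad(w, (0, max(0, T - len(w))))[:T]

    # Initialize every source's phase with the mixture phase.
    _, _, X = stft(mix, nperseg=nperseg)
    phases = [np.angle(X)] * len(mags)

    waves = None
    for _ in range(n_iter):
        # Resynthesize each source from its target magnitude + current phase.
        waves = [fix_len(istft(m * np.exp(1j * p), nperseg=nperseg)[1])
                 for m, p in zip(mags, phases)]
        # Mixture mismatch, shared equally among the sources.
        err = mix - sum(waves)
        new_phases = []
        for w in waves:
            y = w + alpha * err / len(mags)
            _, _, Y = stft(y, nperseg=nperseg)
            new_phases.append(np.angle(Y))  # keep phase, discard magnitude
        phases = new_phases
    return waves
```

With oracle magnitudes (computed from the true sources) the loop typically drives the mixture mismatch close to zero; the interest of the modified objective arises precisely when the magnitudes are estimated and therefore inconsistent with any exact decomposition of the mixture.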
