Blind Source Separation and Identification for Speech Signals

Background noise reduction has been studied for many years. However, suppressing unwanted human speech is not well studied, owing to the sparsity of the speech signal. Traditional blind source separation (BSS) methods such as independent component analysis (ICA) assume that the number of sources is known in advance and require it to equal the number of sensors. These limitations prevent the practical use of traditional BSS for speech enhancement in mobile phone communication. In this paper, a method combining BSS with a speaker recognition system (SRS) is developed for target speech extraction in underdetermined cases. Each independent speech signal is estimated from the speech mixture by applying a binary mask in the time-frequency (T-F) domain, which yields the separated clean speech signals. The Mel-frequency cepstral coefficients (MFCCs) of each separated signal are then compared with the trained MFCCs to compute a distortion. The separated signal with the smallest distortion is regarded as the target speech. Through a series of validations, optimum parameters for BSS and SRS are obtained. Additionally, the proposed method is shown to be robust in suppressing human-generated background noise.
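
The two steps described above, binary masking in the T-F domain and selection of the target by smallest distortion, can be illustrated with a minimal Python sketch. It is not the paper's implementation. It assumes that a per-bin source label and the MFCC frames are already available, and it uses the mean squared Euclidean distance to the nearest trained vector as the distortion measure, which the abstract does not specify. The function names `apply_binary_mask`, `average_distortion`, and `pick_target` are hypothetical.

```python
def apply_binary_mask(mixture_tf, labels, source):
    """Keep the T-F bins assigned to `source` and zero out all others.

    mixture_tf and labels are nested lists of the same shape
    (frequency bins by time frames). How the labels are estimated
    is outside this sketch.
    """
    return [
        [x if lab == source else 0.0 for x, lab in zip(row, lab_row)]
        for row, lab_row in zip(mixture_tf, labels)
    ]


def average_distortion(frames, trained):
    """Mean squared Euclidean distance from each frame to its nearest trained vector.

    Assumed distortion measure; frames and trained are lists of
    equal-length feature vectors (for example MFCCs).
    """
    total = 0.0
    for frame in frames:
        total += min(
            sum((a - b) ** 2 for a, b in zip(frame, ref)) for ref in trained
        )
    return total / len(frames)


def pick_target(candidates, trained):
    """Return the index of the candidate with the smallest distortion, plus all distortions."""
    distortions = [average_distortion(c, trained) for c in candidates]
    best = min(range(len(distortions)), key=distortions.__getitem__)
    return best, distortions


# Toy data: a 2x2 mixture, and two separated candidates with 2-dimensional features.
masked = apply_binary_mask([[1.0, 2.0], [3.0, 4.0]], [[0, 1], [1, 0]], source=0)
trained = [[0.0, 0.0], [1.0, 1.0]]
candidates = [
    [[3.0, 3.0], [4.0, 4.0]],  # far from the trained vectors
    [[0.1, 0.0], [1.0, 0.9]],  # close to the trained vectors
]
best, distortions = pick_target(candidates, trained)
```

In this toy case the second candidate lies much closer to the trained vectors, so it would be selected as the target speech.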
