Non-negative matrix based optimization scheme for blind source separation in automatic speech recognition system

Automatic speech recognition (ASR) systems are increasingly in demand for applications such as security and speech-to-text conversion. During speech signal acquisition, unwanted signals from other sources are added to the original signal and degrade ASR performance. These unwanted signals, commonly called noise or source mixing, arise from multi-user recording, echo effects, and similar conditions. This issue motivates the development of an efficient algorithm for audio demixing, i.e., source separation. To address it, this work proposes a new source separation approach based on nonnegative matrix factorization. The proposed method comprises mixture signal modelling, filter bank design, and implementation of the source separation algorithm. The mixture signal is modelled by combining two channels acquired from different sources. A filter bank is then designed using a wavelet-transform-based scattering algorithm, and an optimization problem is formulated for audio demixing. An experimental study demonstrates the robustness of the proposed model across various implementation scenarios.
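The abstract does not give the authors' exact factorization or update rules, but the general nonnegative-matrix-factorization approach to source separation it describes can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes Euclidean-cost multiplicative updates, a magnitude-spectrogram-like nonnegative input, and a hypothetical assignment of NMF components to sources via soft (Wiener-style) masks.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Factorize a nonnegative matrix V ~= W @ H using multiplicative
    updates that reduce the Euclidean (Frobenius) reconstruction error."""
    rng = np.random.default_rng(seed)
    n_freq, n_time = V.shape
    W = rng.random((n_freq, rank)) + eps
    H = rng.random((rank, n_time)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update activations
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update spectral bases
    return W, H

def separate(V, W, H, groups, eps=1e-9):
    """Reconstruct each source by masking the mixture with the
    contribution of its assigned subset of NMF components."""
    total = W @ H + eps
    return [V * ((W[:, g] @ H[g, :]) / total) for g in groups]

# Toy mixture of two synthetic rank-1 'sources' standing in for
# magnitude spectrograms (purely illustrative data).
rng = np.random.default_rng(1)
S1 = np.outer(rng.random(64), rng.random(100))
S2 = np.outer(rng.random(64), rng.random(100))
V = S1 + S2                      # observed two-channel mixture, summed

W, H = nmf(V, rank=2)
est1, est2 = separate(V, W, H, groups=[[0], [1]])
err = np.linalg.norm(V - (est1 + est2)) / np.linalg.norm(V)
print(f"relative reconstruction error: {err:.4f}")
```

Because the soft masks for the two component groups sum to (approximately) one, the separated estimates add back up to the mixture, so the relative reconstruction error printed at the end is near zero regardless of how well the components align with the true sources.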
