Variable sparsity regularization factor based SNMF for monaural speech separation

The sparsity of a speech signal plays an important role in speech processing. This paper proposes a method in which a variable sparsity regularization factor is applied to the mixed signal and used to separate monaural speech signals. The sparsity regularization factor for each individual training and testing signal is found using particle swarm optimization. The algorithm has been tested on speech-speech separation using the TIMIT database and on music-speech separation using the MIR-1K database. Results are evaluated for fixed and variable sparsity regularization factors in both the speech-speech and music-speech mixed-signal separation cases. The proposed model has been compared with an existing model based on a fixed sparsity factor and shows superior performance in terms of signal-to-distortion ratio (SDR), signal-to-interference ratio (SIR), and other objective measures.
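
To make the role of the sparsity regularization factor concrete, the following is a minimal sketch of sparse non-negative matrix factorization (SNMF) on a magnitude spectrogram. It is an illustration under stated assumptions, not the paper's implementation: the function name `sparse_nmf`, the parameter `lam` (the sparsity factor), the Euclidean cost with an L1 penalty on the activations, and the multiplicative update rules are all choices made here for brevity. The particle swarm search that the paper uses to pick the factor per signal is not shown; here `lam` is simply passed in by hand.

```python
import numpy as np

def sparse_nmf(V, rank, lam, n_iter=200, eps=1e-9, seed=0):
    """Approximate V (freq x frames, non-negative) as W @ H.

    Hypothetical helper: heuristic multiplicative updates for
    ||V - W H||_F^2 + lam * sum(H), with unit-norm columns in W.
    `lam` is the sparsity regularization factor.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, rank)) + eps   # basis vectors (dictionary)
    H = rng.random((rank, n_frames)) + eps # activations

    for _ in range(n_iter):
        # Activation update: lam in the denominator shrinks H toward zero.
        H *= (W.T @ V) / (W.T @ W @ H + lam + eps)
        # Basis update (plain Euclidean NMF rule).
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        # Rescale so each basis column has unit L2 norm; W @ H is unchanged.
        norms = np.linalg.norm(W, axis=0) + eps
        W /= norms
        H *= norms[:, None]
    return W, H

# Toy usage on random non-negative data standing in for a spectrogram.
rng = np.random.default_rng(1)
V = rng.random((64, 100))
for lam in (0.0, 0.5):
    W, H = sparse_nmf(V, rank=8, lam=lam)
    err = np.linalg.norm(V - W @ H)
    print(f"lam={lam}: reconstruction error={err:.3f}, sum(H)={H.sum():.3f}")
```

In a variable-factor scheme such as the one the abstract describes, `lam` would be selected separately for each training and testing signal rather than fixed once; any optimizer wrapped around `sparse_nmf` for that purpose would be a further assumption beyond this sketch.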
