Sparsity level in a non-negative matrix factorization based speech strategy in cochlear implants

Non-negative matrix factorization (NMF) has increasingly been used as a signal processing tool in recent years, but it has not yet been applied in cochlear implants (CIs). To improve the performance of CIs in noisy environments, a novel sparse strategy is proposed that applies NMF to the envelopes of 22 channels. The new algorithm has three steps. First, the noisy speech is transformed to the time-frequency domain via a 22-channel filter bank, and the envelope in each frequency channel is extracted. Second, NMF is applied to the resulting envelope matrix (the envelopegram). Finally, a sparsity condition is applied to the coefficient matrix to obtain a sparser representation. A subjective speech reception threshold (SRT) experiment was performed in combination with five objective measurements in order to choose suitable parameters for the sparse NMF model.
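The factorization step can be illustrated with a minimal sketch. The abstract does not specify how the sparsity condition is imposed, so the code below assumes an L1 penalty on the coefficient matrix combined with standard multiplicative updates for the Euclidean cost. The function name `sparse_nmf`, the rank of 8, the penalty weight `sparsity`, and the random stand-in envelopegram are all hypothetical choices made for illustration, not the parameters of the proposed strategy.

```python
import numpy as np

def sparse_nmf(V, rank, sparsity=0.1, n_iter=200, eps=1e-9, seed=0):
    """Approximate a non-negative matrix V (channels x frames) as W @ H.

    Assumption: sparsity is encouraged by an L1 penalty on H, weighted by
    `sparsity`. The source does not state which sparsity condition it uses.
    """
    rng = np.random.default_rng(seed)
    n_channels, n_frames = V.shape
    W = rng.random((n_channels, rank)) + eps  # basis matrix
    H = rng.random((rank, n_frames)) + eps    # coefficient matrix
    for _ in range(n_iter):
        # Multiplicative update for H; the penalty in the denominator
        # shrinks the coefficients toward zero.
        H *= (W.T @ V) / (W.T @ W @ H + sparsity + eps)
        # Multiplicative update for W (no penalty on the basis).
        W *= (V @ H.T) / (W @ (H @ H.T) + eps)
        # Rescale basis columns to unit norm and compensate in H, so the
        # penalty on H cannot be evaded by inflating W. W @ H is unchanged.
        norms = np.linalg.norm(W, axis=0) + eps
        W /= norms
        H *= norms[:, None]
    return W, H

# Hypothetical stand-in for an envelopegram: 22 channels x 100 frames.
rng = np.random.default_rng(1)
V = rng.random((22, 100))
W, H = sparse_nmf(V, rank=8, sparsity=0.1)
V_hat = W @ H  # reconstructed envelopes
rel_err = np.linalg.norm(V - V_hat) / np.linalg.norm(V)
print(f"relative reconstruction error: {rel_err:.3f}")
```

In this sketch a larger `sparsity` value trades reconstruction accuracy for fewer active coefficients, which is the kind of trade-off the SRT experiment and objective measurements are used to tune.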
