Single-channel source separation using simplified-training complex matrix factorization

Although source separation seems trivial for human listeners, automated source separation still lags far behind human performance, and the task is especially difficult for single-channel signals. One of the latest and most promising methods of single-channel source separation is non-negative matrix factorization, which works by synthesizing signals from a learned set of bases for each source. In this paper, we present a new method of creating the learned sets of bases used in matrix factorization for single-channel source separation. Unlike previous methods, the new method does not require choosing an optimal number of bases. In addition, this paper further explores the recently proposed technique of complex matrix factorization and compares its performance to that of real-valued non-negative matrix factorization for automatic speech recognition of two-talker mixtures.
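As a point of reference for the conventional approach described above, the following is a minimal sketch of supervised non-negative matrix factorization for separation. It is not the simplified-training or complex-valued method this paper proposes. It assumes magnitude spectrograms as input and uses the standard multiplicative updates for the Euclidean cost; the function names `learn_bases` and `separate`, the iteration count, and the `n_bases` argument are illustrative choices rather than anything taken from the paper. The `n_bases` argument is the hand-chosen parameter that the abstract says the proposed method avoids.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero in the updates


def learn_bases(V, n_bases, n_iter=200, seed=0):
    """Factor a non-negative spectrogram V (freq x time) as W @ H; return W.

    Uses multiplicative updates for the Euclidean cost ||V - W H||^2.
    n_bases must be chosen by hand in this conventional setup.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, n_bases)) + EPS
    H = rng.random((n_bases, n_frames)) + EPS
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + EPS)
        W *= (V @ H.T) / (W @ H @ H.T + EPS)
    return W


def separate(V_mix, bases, n_iter=200, seed=0):
    """Estimate one magnitude spectrogram per source from a mixture.

    bases is a list of per-source basis matrices (each freq x n_bases_k).
    The bases stay fixed; only the activations H are updated.
    """
    W = np.hstack(bases)
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V_mix.shape[1])) + EPS
    for _ in range(n_iter):
        H *= (W.T @ V_mix) / (W.T @ W @ H + EPS)

    # Resynthesize each source from its own bases and activations.
    estimates = []
    start = 0
    for Wk in bases:
        stop = start + Wk.shape[1]
        estimates.append(Wk @ H[start:stop])
        start = stop
    return estimates
```

In use, one would call `learn_bases` on clean training spectrograms of each talker and then pass the resulting list of basis matrices to `separate` together with the mixture spectrogram. Recovering time-domain signals would additionally require phase information, which this magnitude-only sketch leaves out.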
