New methods of complex matrix factorization for single-channel source separation and analysis

Throughout the day, people are constantly bombarded by a variety of sounds. Humans with normal hearing can easily and automatically cut through the noise to focus on the sources of interest, a phenomenon known as the "cocktail party effect." This ability, while easy for humans, is typically very challenging for computers. This dissertation focuses on single-channel source separation via matrix factorization, a state-of-the-art family of algorithms, and presents three primary contributions. First, we explore how the choice of cost function and parameters affects source separation performance, and we discuss the advantages and disadvantages of each matrix factorization model. Second, we propose a new model, complex matrix factorization with intra-source additivity, that has significant advantages over the current state-of-the-art matrix factorization models. Third, we propose the complex probabilistic latent component analysis algorithm, which transforms complex-valued data into nonnegative data in such a way that the underlying structure in the complex data is preserved. We also show how these new methods can be applied to single-channel source separation and compare them with the current state-of-the-art methods.
