Model-Based Audio Source Separation

Most audio signals are mixtures of several sources that are active simultaneously. Audio source separation is the problem of recovering each source signal from a given mixture signal. Historically, audio source separation systems relied on beamforming algorithms, which require no prior knowledge about the source signals and can be applied whenever the mixture is recorded by a set of microphones with known relative positions. Their performance is often very good when the number of microphones is large, but it degrades quickly as the number of microphones decreases. An alternative approach is to rely on models of the source signals to make better use of the available information. Existing models assume some form of independence between the sources, combined with other assumptions. This report provides a tutorial review of model-based audio source separation algorithms, focusing on situations where the number of mixture channels is limited and possibly smaller than the number of sources. We highlight the exact assumptions made by each algorithm and discuss their validity and limitations for real-world audio signals. To this end, approaches stemming from different historical viewpoints are interpreted within a general statistical framework. We do not discuss implementation issues, but provide bibliographical and software references for further details.
