Reverberant Audio Blind Source Separation via Local Convolutive Independent Vector Analysis

In this paper, we propose a new formulation of the blind source separation problem for convolutive mixtures of audio signals, with the aim of improving the separation performance of Independent Vector Analysis (IVA). The proposed method benefits both from the recently investigated convolutive narrowband approximation model and from IVA approaches, which take advantage of cross-band information to avoid permutation alignment. We first exploit the link between IVA and Sparse Component Analysis (SCA) methods through structured sparsity. We then propose a new framework that combines the convolutive narrowband approximation with the Windowed-Group-Lasso (WGL). The model is optimised with an alternating optimisation approach, in which the convolutive kernel and the source components are estimated jointly by alternating between them.
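The two building blocks named above, the convolutive narrowband approximation and the Windowed-Group-Lasso (WGL) shrinkage, can be illustrated with a short NumPy sketch. It is not the authors' implementation. The function names, the array layout (M microphones, N sources, F frequency bins, L kernel taps, T frames) and the WGL neighbourhood are hypothetical. The neighbourhood is assumed to cover all frequency bins of a source over a window of 2·half_width+1 frames, and the paper's exact choice may differ. The alternating optimisation loop is not shown.

```python
import numpy as np


def convolutive_narrowband_mix(A, S):
    """Convolutive narrowband approximation in the STFT domain (sketch).

    A: mixing kernels, shape (M, N, F, L): M mics, N sources, F bins, L taps.
    S: source STFT coefficients, shape (N, F, T): T frames.
    Returns X of shape (M, F, T) with
        X[m, f, t] = sum_n sum_l A[m, n, f, l] * S[n, f, t - l],
    i.e. a short filter along the frame axis, applied independently per bin f.
    """
    L = A.shape[3]
    T = S.shape[2]
    X = np.zeros((A.shape[0], S.shape[1], T), dtype=np.result_type(A, S))
    for l in range(min(L, T)):  # taps longer than the signal contribute nothing
        # Delay the sources by l frames, zero-padding the first l frames.
        S_shift = np.zeros_like(S)
        S_shift[:, :, l:] = S[:, :, :T - l]
        # Mix the delayed sources with tap l of every kernel (sum over n).
        X += np.einsum("mnf,nft->mft", A[:, :, :, l], S_shift)
    return X


def windowed_group_lasso(S, lam, half_width=1):
    """Windowed-Group-Lasso shrinkage of source coefficients (sketch).

    Each S[n, f, t] is multiplied by max(0, 1 - lam / ||neighbourhood||_2).
    ASSUMPTION: the neighbourhood of (n, f, t) is every frequency bin of
    source n over frames t - half_width .. t + half_width (zero-padded).
    """
    T = S.shape[2]
    # Energy of each source in each frame, summed over all frequency bins.
    frame_energy = np.sum(np.abs(S) ** 2, axis=1)  # shape (N, T)
    # Sliding sum over 2*half_width+1 frames; the slice keeps it centred.
    kernel = np.ones(2 * half_width + 1)
    neigh_energy = np.stack(
        [np.convolve(e, kernel, mode="full")[half_width:half_width + T]
         for e in frame_energy]
    )  # shape (N, T)
    norm = np.sqrt(neigh_energy)
    # Gain is 0 where the neighbourhood is empty (those coefficients are 0).
    with np.errstate(divide="ignore", invalid="ignore"):
        gain = np.where(norm > 0, np.maximum(0.0, 1.0 - lam / norm), 0.0)
    return S * gain[:, None, :]
```

With half_width=0 the neighbourhood reduces to the full spectrum of one source in one frame. The operator is then plain group-lasso shrinkage, i.e. the proximal operator of an ℓ2,1 penalty over frequency. This corresponds to the spherical (multivariate Laplacian) source prior used in IVA, which is one way to read the IVA/SCA link through structured sparsity. A larger window lets neighbouring frames share the decision to keep or discard a coefficient.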
