SepNet: A Deep Separation Matrix Prediction Network for Multichannel Audio Source Separation

In this paper, we propose SepNet, a deep neural network (DNN) designed to predict separation matrices from multichannel observations. One well-known approach to blind source separation (BSS) involves independent component analysis (ICA). A recently developed method called independent low-rank matrix analysis (ILRMA) is one of its powerful variants. These methods allow the estimation of separation matrices based on deterministic iterative algorithms. Specifically, ILRMA is designed to update the separation matrix according to an update rule derived based on the majorization-minimization principle. Although ILRMA performs reasonably well under some conditions, there is still room for improvement in terms of both separation accuracy and computation time, especially for large-scale microphone arrays. The existence of a deterministic iterative algorithm that can find one of the stationary points of the BSS problem implies that a DNN can also play that role if designed and trained properly. Motivated by this, we propose introducing a DNN that learns to convert a predefined input (e.g., an identity matrix) into a true separation matrix in accordance with a multichannel observation. To enable it to find one of the multiple solutions corresponding to different permutations of the source indices, we further propose adopting a permutation invariant training strategy to train the network. By using a fully convolutional architecture, we can design the network so that the forward propagation can be computed efficiently. The experimental results revealed that SepNet was able to find separation matrices faster and with better separation accuracy than ILRMA for mixtures of two sources.

[1]  Nobutaka Ono,et al.  Stable and fast update rules for independent vector analysis based on auxiliary function technique , 2011, 2011 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA).

[2]  Yann Dauphin,et al.  Language Modeling with Gated Convolutional Networks , 2016, ICML.

[3]  Hirokazu Kameoka,et al.  Multichannel Extensions of Non-Negative Matrix Factorization With Complex-Valued Data , 2013, IEEE Transactions on Audio, Speech, and Language Processing.

[4]  Shinnosuke Takamichi,et al.  Independent Deeply Learned Matrix Analysis for Multichannel Audio Source Separation , 2018, 2018 26th European Signal Processing Conference (EUSIPCO).

[5]  Rémi Gribonval,et al.  Performance measurement in blind audio source separation , 2006, IEEE Transactions on Audio, Speech, and Language Processing.

[6]  Hirokazu Kameoka,et al.  Joint Separation and Dereverberation of Reverberant Mixtures with Multichannel Variational Autoencoder , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[7]  Radu Horaud,et al.  Semi-supervised Multichannel Speech Enhancement with Variational Autoencoders and Non-negative Matrix Factorization , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[8]  Hirokazu Kameoka,et al.  Supervised Determined Source Separation with Multichannel Variational Autoencoder , 2019, Neural Computation.

[9]  Robin Scheibler,et al.  Fast and Stable Blind Source Separation with Rank-1 Updates , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[10]  Junichi Yamagishi,et al.  The Voice Conversion Challenge 2018: Promoting Development of Parallel and Nonparallel Methods , 2018, Odyssey.

[11]  H. Kameoka,et al.  Determined Blind Source Separation with Independent Low-Rank Matrix Analysis , 2018 .

[12]  Alan W. Black,et al.  The CMU Arctic speech databases , 2004, SSW.

[13]  Li Li,et al.  Fast MVAE: Joint Separation and Classification of Mixed Sources Based on Multichannel Variational Autoencoder with Auxiliary Classifier , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[14]  Tatsuya Kawahara,et al.  Bayesian Multichannel Speech Enhancement with a Deep Speech Prior , 2018, 2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC).

[15]  Jesper Jensen,et al.  Permutation invariant training of deep models for speaker-independent multi-talker speech separation , 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[16]  Emmanuel Vincent,et al.  Multichannel Audio Source Separation With Deep Neural Networks , 2016, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[17]  Erkki Oja,et al.  Independent component analysis: algorithms and applications , 2000, Neural Networks.

[18]  Hirokazu Kameoka,et al.  Statistical Model of Speech Signals Based on Composite Autoregressive System with Application to Blind Source Separation , 2010, LVA/ICA.

[19]  Rémi Gribonval,et al.  Underdetermined Instantaneous Audio Source Separation via Local Gaussian Modeling , 2009, ICA.

[20]  J. Cardoso,et al.  Maximum likelihood approach for blind audio source separation using time-frequency Gaussian source models , 2005, IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2005..

[21]  Alexey Ozerov,et al.  Multichannel Nonnegative Matrix Factorization in Convolutive Mixtures for Audio Source Separation , 2010, IEEE Transactions on Audio, Speech, and Language Processing.

[22]  Hirokazu Kameoka,et al.  Determined Blind Source Separation Unifying Independent Vector Analysis and Nonnegative Matrix Factorization , 2016, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[23]  Jonathan Le Roux,et al.  Deep Unfolding: Model-Based Inspiration of Novel Deep Architectures , 2014, ArXiv.

[24]  Li Li,et al.  Semi-blind source separation with multichannel variational autoencoder , 2018, ArXiv.

[25]  Atsuo Hiroe,et al.  Solution of Permutation Problem in Frequency Domain ICA, Using Multivariate Probability Density Functions , 2006, ICA.

[26]  Te-Won Lee,et al.  Independent Vector Analysis: An Extension of ICA to Multivariate Components , 2006, ICA.

[27]  Jont B. Allen,et al.  Image method for efficiently simulating small‐room acoustics , 1976 .