Efficient Independent Vector Extraction of Dominant Target Speech

The complete decomposition performed by blind source separation is computationally demanding and superfluous when only the speech of one specific target speaker is desired. In this paper, we propose a computationally efficient blind speech extraction method based on a suitable modification of the widely used independent vector analysis algorithm, under the mild assumption that the average power of the signal of interest exceeds that of each interfering speech source. Because the full demixing matrix is not available, the minimum distortion principle cannot be applied; we therefore design a one-unit scaling operation to resolve the scaling ambiguity. Simulations validate the efficacy of the proposed method in extracting the dominant speech.
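
The abstract does not spell out the one-unit scaling operation, so the following is only a minimal sketch of one common way to fix the scale of a single extracted signal: a least-squares projection of the output onto a reference microphone, computed per frequency bin. The function name `one_unit_scale`, the choice of a reference microphone, and the toy data are assumptions made for illustration and are not taken from the paper.

```python
import random


def one_unit_scale(y, x_ref):
    """Return the complex scalar a that minimises sum_t |x_ref[t] - a * y[t]|^2.

    y     : extracted signal in one frequency bin (list of complex), arbitrary scale
    x_ref : observation at a reference microphone in the same bin (assumed choice)
    """
    num = sum(xr * yt.conjugate() for xr, yt in zip(x_ref, y))
    den = sum(abs(yt) ** 2 for yt in y)
    if den == 0:
        return 0j  # silent output: nothing to rescale
    return num / den


# Toy check in a single frequency bin (hypothetical data, no interference).
rng = random.Random(0)
s = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
true_gain = 0.5 - 0.25j
x_ref = [true_gain * st for st in s]   # target image at the reference microphone
y = [(2.0 + 1.0j) * st for st in s]    # extractor output with an arbitrary scale

a = one_unit_scale(y, x_ref)
y_scaled = [a * yt for yt in y]        # rescaled output matches the reference image
```

Only the one extracted output and one observed channel are needed here, which is why a rescaling of this kind remains available when the other rows of the demixing matrix are never estimated.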
