Multichannel Blind Source Separation Based on Evanescent-Region-Aware Non-Negative Tensor Factorization in Spherical Harmonic Domain

There is growing interest in new audio formats for virtual reality (VR), and higher-order ambisonics (HOA) is preferred for transmitting recorded scenes in VR systems owing to its transmission efficiency and its flexibility across different loudspeaker setups. However, conversion between the HOA format and another well-known format, the object format, has not been fully addressed in the literature. To address this issue, blind source separation in the spherical harmonic (SH) domain can be considered the most efficient way to extract objects, because the HOA signals need not be decoded before separation. A few authors have attempted to extract objects directly from encoded HOA signals by using multichannel non-negative matrix factorization (MNMF), but these approaches either assume only far-field sources or do not take array characteristics into account, which makes them difficult to use for VR in practical situations where singers or speakers often perform close to the microphones. Furthermore, MNMF generally incurs a high computational cost, even though the dimensionality is reduced by working in the SH domain. In this work, we additionally model near-field sources by estimating the model parameters of non-negative tensor factorization (NTF) in the SH domain, assuming that the microphone signals are obtained with a rigid spherical array. We propose a masking scheme that excludes noisy evanescent regions in the SH domain from the NTF cost function. Evaluations show that our method outperforms existing methods devised for the HOA format and that our masking approach is effective in improving separation quality.
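
The idea of excluding evanescent regions from the cost function can be illustrated with a small sketch. The code below is not the method of this paper: it swaps in a plain weighted Euclidean NTF with multiplicative updates, and it assumes a simple hypothetical rule that an SH coefficient of order n at wavenumber k is reliable only when n <= kr, where r is the array radius. The function names (`sh_orders`, `evanescent_mask`, `masked_ntf`), the ACN channel ordering, the speed of sound, and the factor shapes are all assumptions made for illustration.

```python
import numpy as np

def sh_orders(max_order):
    # Assumes ACN channel ordering: order n occupies 2n + 1 consecutive channels.
    n = np.arange(max_order + 1)
    return np.repeat(n, 2 * n + 1)

def evanescent_mask(max_order, freqs_hz, radius_m, speed=343.0):
    # ASSUMED rule (not taken from the paper): keep a (channel, frequency) bin
    # only where the SH order n satisfies n <= kr; other bins are masked out.
    kr = 2.0 * np.pi * np.asarray(freqs_hz, dtype=float) / speed * radius_m
    return (sh_orders(max_order)[:, None] <= kr[None, :]).astype(float)

def masked_ntf(X, M, n_components, n_iter=50, seed=0, eps=1e-12):
    # X: non-negative tensor (channels, freqs, frames), e.g. SH-domain power.
    # M: binary mask (channels, freqs); masked bins do not enter the cost.
    rng = np.random.default_rng(seed)
    C, F, T = X.shape
    Q = rng.random((C, n_components)) + 0.1   # channel (spatial) factor
    W = rng.random((F, n_components)) + 0.1   # spectral factor
    H = rng.random((n_components, T)) + 0.1   # temporal activations
    Mx = M[:, :, None]                        # broadcast mask over frames
    MX = Mx * X

    def model():
        return np.einsum("ck,fk,kt->cft", Q, W, H)

    def cost():
        # Weighted squared error: only unmasked bins contribute.
        return float(np.sum(Mx * (X - model()) ** 2))

    costs = [cost()]
    for _ in range(n_iter):
        # Multiplicative updates for the weighted Euclidean cost, one factor at a time.
        Q *= np.einsum("cft,fk,kt->ck", MX, W, H) / (
            np.einsum("cft,fk,kt->ck", Mx * model(), W, H) + eps)
        W *= np.einsum("cft,ck,kt->fk", MX, Q, H) / (
            np.einsum("cft,ck,kt->fk", Mx * model(), Q, H) + eps)
        H *= np.einsum("cft,ck,fk->kt", MX, Q, W) / (
            np.einsum("cft,ck,fk->kt", Mx * model(), Q, W) + eps)
        costs.append(cost())
    return Q, W, H, costs
```

In this sketch the mask acts only as a binary weight on the squared error, so low-frequency bins of higher-order channels, where noise amplification would be expected under the assumed rule, do not influence the estimated factors.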
