Particle-filter tracking of sounds for frequency-independent 3D audio rendering from distributed B-format recordings

Six-Degree-of-Freedom (6DoF) audio rendering interactively synthesizes spatial audio signals for a variable listener perspective from surround recordings taken at multiple perspectives distributed across the listening area of an acoustic scene. Methods that rely on recording-implicit directional information and interpolate the listener perspective without attempting to localize and extract sounds often yield high audio quality, but they are limited in spatial definition. Methods that localize, extract, and render sounds typically operate in the time-frequency domain and risk introducing artifacts such as musical noise. We propose to exploit the rich spatial information contained in the broadband time-domain signals of the many distributed first-order (B-format) recording perspectives. Broadband, time-variant signal extraction, which retrieves the direct signals and leaves residuals that approximate diffuse and spacious sounds, poses less risk to audio quality, as does broadband re-encoding to enhance the spatial definition of both signal types. To detect and track direct sound objects in this process, we combine the directional data recorded at the individual perspectives into a volumetric multi-perspective activity map for particle-filter tracking. Our technical and perceptual evaluation confirms that this processing enhances the spatial definition of direct-sound objects, which otherwise remains limited in other broadband but signal-independent interpolation approaches such as virtual loudspeaker objects (VLO) or Vector-Based Intensity Panning (VBIP).
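
The tracking stage can be illustrated with a minimal Python sketch under stated assumptions; it is not the authors' implementation. It assumes that each perspective's broadband direction of arrival (DOA) is taken from the time-averaged pseudo-intensity vector of its B-format frame, that the volumetric activity map is a sum of von Mises-Fisher-like kernels (one per perspective) evaluated at 3D points, and that the particle filter uses a random-walk motion model with systematic resampling. The function names, the kernel concentration `kappa`, the step size `sigma`, and the room geometry in the demo are hypothetical choices for illustration, and the demo uses ideal anechoic plane-wave encoding, so its DOAs are exact.

```python
import numpy as np


def doa_from_bformat(w, x, y, z):
    """Broadband DOA unit vector from one frame of B-format signals.

    Assumes X, Y, Z are figure-of-eight signals aligned with the coordinate
    axes, so the time-averaged pseudo-intensity mean(w * [x, y, z]) points
    toward the dominant source. The W gain convention only scales the vector.
    """
    i = np.array([np.mean(w * x), np.mean(w * y), np.mean(w * z)])
    return i / (np.linalg.norm(i) + 1e-12)


def activity_map(points, positions, doas, kappa=20.0):
    """Hypothetical volumetric multi-perspective activity map at points (N, 3).

    Each perspective contributes a von Mises-Fisher-like kernel that equals 1
    where the direction from the perspective to the point matches its DOA and
    decays with the angular mismatch; the contributions are summed.
    """
    act = np.zeros(len(points))
    for p, d in zip(positions, doas):
        v = points - p
        v /= np.linalg.norm(v, axis=1, keepdims=True) + 1e-12
        act += np.exp(kappa * (v @ d - 1.0))
    return act


def systematic_resample(weights, rng):
    """Return particle indices drawn by systematic resampling."""
    n = len(weights)
    u = (rng.random() + np.arange(n)) / n
    c = np.cumsum(weights)
    c[-1] = 1.0  # guard against round-off in the cumulative sum
    return np.searchsorted(c, u, side="right")


def particle_filter_step(particles, positions, doas, rng, sigma=0.05, kappa=20.0):
    """One predict/update/resample cycle; returns (new particles, estimate)."""
    # Predict: random-walk motion model (an assumption of this sketch).
    particles = particles + rng.normal(scale=sigma, size=particles.shape)
    # Update: weight each particle by the activity map at its position.
    w = activity_map(particles, positions, doas, kappa)
    w /= w.sum()
    estimate = w @ particles  # weighted-mean position estimate
    # Resample to counter weight degeneracy.
    return particles[systematic_resample(w, rng)], estimate


def demo(n_particles=2000, n_frames=40, seed=0):
    """Track one static plane-wave source seen by four B-format perspectives."""
    rng = np.random.default_rng(seed)
    source = np.array([1.0, 2.0, 1.5])  # hypothetical source position [m]
    positions = np.array([[0.5, 0.5, 1.2], [3.5, 0.5, 1.6],
                          [0.5, 3.5, 1.8], [3.5, 3.5, 1.0]])
    particles = rng.uniform([0.0, 0.0, 0.0], [4.0, 4.0, 3.0], size=(n_particles, 3))
    estimate = particles.mean(axis=0)
    for _ in range(n_frames):
        sig = rng.standard_normal(1024)  # broadband source signal for this frame
        doas = []
        for p in positions:
            u = (source - p) / np.linalg.norm(source - p)
            # Ideal anechoic B-format encoding of a plane wave from direction u.
            doas.append(doa_from_bformat(sig, sig * u[0], sig * u[1], sig * u[2]))
        particles, estimate = particle_filter_step(particles, positions, np.array(doas), rng)
    return estimate, source


estimate, source = demo()
print("estimate:", estimate, "error [m]:", np.linalg.norm(estimate - source))
```

Summing the per-perspective kernels, rather than multiplying them, keeps a single erroneous DOA from suppressing the true source position; for a static source, repeated reweighting over successive frames still concentrates the particles where the DOA rays of all perspectives intersect.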
