A source separation evaluation method in object-based spatial audio

Representing a complex acoustic scene with audio objects is desirable but challenging in object-based spatial audio production and reproduction, especially when several sound sources are active concurrently in the scene. Source separation (SS) is a potentially useful enabling tool for audio object extraction. The extracted objects are often remixed to reconstruct a sound field at the reproduction stage, so a suitable SS method is expected to produce audio objects that ultimately deliver high-quality audio after remixing. The performance of SS algorithms therefore needs to be evaluated in this context. Existing metrics for SS performance evaluation, however, do not take into account the essential sound field reconstruction process. To address this problem, we propose a new SS evaluation method that employs a remixing strategy similar to the panning law and provides a framework for incorporating conventional SS metrics. We tested the proposed method on real-room recordings processed with four SS methods: two state-of-the-art blind source separation (BSS) methods and two classic beamforming algorithms. The evaluation results based on three conventional SS metrics are analysed.
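
The idea of scoring separated objects after remixing, rather than in isolation, can be illustrated with a minimal sketch. The abstract does not specify the remixing rule beyond "similar to the panning law", so the constant-power sine/cosine stereo law used here is an assumption. The function names (`constant_power_gains`, `remix`, `energy_ratio_db`) are hypothetical, and the plain signal-to-error energy ratio only stands in for the conventional SS metrics, which are not reproduced here.

```python
import numpy as np


def constant_power_gains(pan):
    """Left/right gains for pan in [0, 1] (0 = full left, 1 = full right).

    Assumed sine/cosine law, so gain_left**2 + gain_right**2 == 1.
    """
    theta = pan * np.pi / 2.0
    return np.cos(theta), np.sin(theta)


def remix(objects, pans):
    """Pan each object to stereo and sum.

    objects: array of shape (n_objects, n_samples)
    pans:    one pan position per object
    returns: array of shape (2, n_samples)
    """
    objects = np.asarray(objects, dtype=float)
    gains = np.array([constant_power_gains(p) for p in pans])  # (n_objects, 2)
    return gains.T @ objects


def energy_ratio_db(reference, estimate):
    """Signal-to-error energy ratio in dB (a simple stand-in metric)."""
    error = reference - estimate
    return 10.0 * np.log10(np.sum(reference ** 2) / (np.sum(error ** 2) + 1e-12))


# Synthetic stand-ins for true objects and imperfect separation outputs.
rng = np.random.default_rng(0)
true_objects = rng.standard_normal((2, 16000))
estimated_objects = true_objects + 0.1 * rng.standard_normal((2, 16000))

pans = [0.25, 0.75]  # hypothetical object positions
reference_mix = remix(true_objects, pans)
estimated_mix = remix(estimated_objects, pans)

# The score is computed on the remixed signals, not on the isolated objects.
score = energy_ratio_db(reference_mix, estimated_mix)
print(f"Remix-level score: {score:.1f} dB")
```

Because the metric is applied after remixing, separation errors that cancel or are masked in the reconstructed mix affect the score differently than they would in a per-object evaluation, which is the context the abstract argues for.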
