Multi-criteria subjective and objective evaluation of audio source separation

In this paper, we address the problem of assessing the perceived quality of estimated source signals in audio source separation. Depending on the separation algorithm, these signals exhibit different kinds of distortion, including distortion of the target source, interference from other sources, and musical noise artifacts. We propose a new MUSHRA-based subjective test protocol that assesses the perceived quality with respect to each kind of distortion, and we use it to collect the scores of 20 subjects over 80 sounds. We then analyze the contribution of each type of distortion to the overall quality. We also propose a family of objective measures that aim to predict the subjective scores from a decomposition of the estimation error into several distortion components. We conclude by discussing possible implications of this work for 3D audio quality assessment.
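To make the idea of an error decomposition concrete, the following is a minimal sketch that splits the estimation error into target distortion, interference, and artifact components. It is not the decomposition proposed in this paper, which the abstract does not specify; it assumes a plain time-domain least-squares projection onto the true source signals. The function name `decompose_error` and its arguments are hypothetical and chosen for illustration only.

```python
import numpy as np

def decompose_error(estimate, target, others):
    """Split (estimate - target) into three components.

    Assumption: a simple least-squares projection in the time domain,
    used here only as an illustration of the general idea.
    """
    # Matrix whose columns are the true target and the other true sources.
    sources = np.column_stack([target] + list(others))

    # Projection of the estimate onto the true target alone.
    proj_target = target * (estimate @ target) / (target @ target)

    # Projection of the estimate onto the span of all true sources.
    coeffs, *_ = np.linalg.lstsq(sources, estimate, rcond=None)
    proj_all = sources @ coeffs

    e_target = proj_target - target    # distortion of the target source
    e_interf = proj_all - proj_target  # interference from other sources
    e_artif = estimate - proj_all      # residual treated as artifacts
    return e_target, e_interf, e_artif
```

By construction the three components sum to the total estimation error, and the artifact component is orthogonal to every true source. An objective measure could then be built from the energy of each component, although how the components are weighted to predict subjective scores is left open here.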
