Assessment of human and machine performance in acoustic scene classification: DCASE 2016 case study

Human and machine performance in acoustic scene classification is examined through a parallel experiment using the TUT Acoustic Scenes 2016 dataset. The machine learning perspective is based on the systems submitted to the 2016 challenge on Detection and Classification of Acoustic Scenes and Events (DCASE 2016). Human performance, assessed through a listening experiment, was found to be significantly lower than machine performance. Test subjects exhibited different behavior throughout the experiment, leading to significant differences in performance between groups of subjects. An expert listener trained for the task achieved accuracy similar to the average of the submitted systems, comparable also to previous studies of human ability to recognize everyday acoustic scenes.
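The significance claims above rest on comparing paired classification outcomes, i.e. whether human and machine decisions on the same test clips differ more than chance would allow. The standard tool for this is McNemar's test, which considers only the discordant pairs. A minimal sketch (not the paper's actual code; the counts below are hypothetical):

```python
# Hedged sketch: exact two-sided McNemar's test for comparing two
# classifiers (e.g. human vs. machine) evaluated on the same items.
from math import comb

def mcnemar_exact(b: int, c: int) -> float:
    """Exact two-sided p-value from the discordant counts:
    b = items classifier A got right and B got wrong,
    c = items A got wrong and B got right.
    Under the null, discordant outcomes are Binomial(b + c, 0.5)."""
    n = b + c
    if n == 0:
        return 1.0  # no disagreements: no evidence of a difference
    k = min(b, c)
    # One-sided binomial tail P(X <= k) at p = 0.5, doubled for two sides
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical example: machine right / human wrong on 40 clips,
# human right / machine wrong on 15 clips
p_value = mcnemar_exact(40, 15)
```

Only the off-diagonal counts of the paired confusion matrix matter; items both classifiers got right or both got wrong carry no information about which one is better.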
