Feature Extraction of Binaural Recordings for Acoustic Scene Classification

Binaural technology is becoming increasingly popular in multimedia systems. This paper identifies a set of features of binaural recordings suitable for the automatic classification of four basic spatial audio scenes, which represent the most typical patterns of audio content distribution around a listener. Moreover, it compares five artificial-intelligence-based methods applied to the classification of binaural recordings. The results show that both spatial and spectro-temporal features are essential for the accurate classification of binaurally rendered acoustic scenes. The spectro-temporal features appear to have a stronger influence on the classification results than the spatial metrics. According to the obtained results, the method based on a support vector machine, exploiting the features identified in the study, yields a classification accuracy approaching 84%.
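To illustrate the distinction between spatial and spectro-temporal features drawn above, the following minimal sketch computes two common spatial cues (interaural level difference and the maximum of the normalized interaural cross-correlation) and one spectral descriptor (the spectral centroid) from a two-channel excerpt. It is not the feature set identified in the paper: the function name `binaural_features`, the choice of cues, and the ±1 ms lag range are assumptions made purely for illustration.

```python
import numpy as np


def binaural_features(left, right, sample_rate):
    """Return [ILD in dB, max interaural cross-correlation, spectral centroid in Hz].

    Hypothetical helper for illustration only; assumes equal-length channels.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    eps = 1e-12  # guards against division by zero and log of zero

    # Spatial cue 1: interaural level difference from the channel RMS values.
    rms_left = np.sqrt(np.mean(left ** 2))
    rms_right = np.sqrt(np.mean(right ** 2))
    ild_db = 20.0 * np.log10((rms_left + eps) / (rms_right + eps))

    # Spatial cue 2: largest normalized cross-correlation magnitude
    # over lags within +/- 1 ms (an assumed, commonly used range).
    max_lag = int(round(0.001 * sample_rate))
    norm = np.sqrt(np.sum(left ** 2) * np.sum(right ** 2)) + eps
    iacc = 0.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            corr = np.dot(left[lag:], right[:len(right) - lag])
        else:
            corr = np.dot(left[:lag], right[-lag:])
        iacc = max(iacc, abs(corr) / norm)

    # Spectral descriptor: magnitude-weighted mean frequency of the
    # mid (L+R)/2 signal, computed on a Hann-windowed frame.
    mid = 0.5 * (left + right)
    spectrum = np.abs(np.fft.rfft(mid * np.hanning(len(mid))))
    freqs = np.fft.rfftfreq(len(mid), d=1.0 / sample_rate)
    centroid_hz = np.sum(freqs * spectrum) / (np.sum(spectrum) + eps)

    return np.array([ild_db, iacc, centroid_hz])
```

A vector of this kind, computed per excerpt, is the sort of input that a classifier such as a support vector machine would take; the features and classifier settings actually used in the study are not reproduced here.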
