Deep neural network based environment sound classification and its implementation on hearing aid app

Abstract In general, a hearing aid app is very useful for persons with either partial or complete hearing loss. At present, hearing aid apps make no special provision for classifying different environmental sounds. This paper proposes an environmental sound classification algorithm based on Superimposed Audio Blocks and Deep Neural Networks (SAB-DNN) and implements it in a hearing aid app. The system automatically recognizes five different sound fields: bus, subway, street, indoor, and car. In this system, 512 sampling points form an audio frame, and several audio frames are stacked into an Audio Block (AB). When 7 audio frames are stacked into an AB, the classification accuracy of the AB-DNN is highest (96.18%). Building on this, the experiment integrates multiple ABs into a larger audio unit called a Superimposed Audio Block (SAB) and classifies it with the DNN. Optimally, 30 audio blocks are integrated into one SAB, raising the classification accuracy to 98.8%. To the best of our knowledge, this is the first implementation on a hearing aid app of an improved DNN-based classification system with superposition of multiple audio frames and blocks.
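
To make the frame-to-block pipeline concrete, the Python sketch below builds 512-sample frames, stacks 7 of them into an Audio Block (AB), integrates 30 ABs into a Superimposed Audio Block (SAB), and classifies SABs with a small fully connected DNN over the five sound fields. The superposition rule (simple averaging of ABs), the DNN layer sizes, and the 16 kHz synthetic signal are illustrative assumptions, not the authors' implementation.

import numpy as np
import tensorflow as tf

FRAME_LEN = 512          # sampling points per audio frame (from the abstract)
FRAMES_PER_AB = 7        # frames stacked into one Audio Block (best value reported)
ABS_PER_SAB = 30         # Audio Blocks integrated into one SAB (best value reported)
CLASSES = ["bus", "subway", "street", "indoor", "car"]

def make_audio_blocks(signal: np.ndarray) -> np.ndarray:
    """Split a 1-D signal into non-overlapping 512-sample frames and stack
    every 7 consecutive frames into one Audio Block (AB)."""
    n_frames = len(signal) // FRAME_LEN
    frames = signal[: n_frames * FRAME_LEN].reshape(n_frames, FRAME_LEN)
    n_blocks = n_frames // FRAMES_PER_AB
    return frames[: n_blocks * FRAMES_PER_AB].reshape(n_blocks, FRAMES_PER_AB * FRAME_LEN)

def make_sabs(blocks: np.ndarray) -> np.ndarray:
    """Integrate 30 consecutive ABs into one Superimposed Audio Block (SAB).
    Here the ABs are simply averaged; the paper's exact superposition rule may differ."""
    n_sabs = len(blocks) // ABS_PER_SAB
    grouped = blocks[: n_sabs * ABS_PER_SAB].reshape(n_sabs, ABS_PER_SAB, -1)
    return grouped.mean(axis=1)

def build_dnn(input_dim: int) -> tf.keras.Model:
    """A small fully connected DNN classifier over the five sound fields
    (layer sizes are illustrative, not taken from the paper)."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(input_dim,)),
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(len(CLASSES), activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

if __name__ == "__main__":
    # Synthetic example: 10 seconds of 16 kHz noise stands in for a real recording.
    signal = np.random.randn(16000 * 10).astype(np.float32)
    blocks = make_audio_blocks(signal)   # shape: (n_blocks, 7 * 512)
    sabs = make_sabs(blocks)             # shape: (n_sabs, 7 * 512)
    model = build_dnn(sabs.shape[1])
    print(blocks.shape, sabs.shape)

In a real deployment the DNN would be trained offline on labelled recordings of the five environments and the SAB-level predictions would drive the hearing aid app's sound-field switching.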
