Urban noise recognition with convolutional neural network

Urban noise recognition plays a vital role in city management and safe operation, especially in recent smart city engineering. Existing studies on urban noise recognition are mostly based on conventional acoustic features, such as Mel-Frequency Cepstral Coefficients (MFCC) and Linear Prediction Cepstral Coefficients (LPCC), combined with shallow-structure classifiers such as the support vector machine (SVM). However, the urban acoustic environment is complex and changeable, so conventional acoustic representation and recognition methods may be insufficient to characterize urban noises and generally suffer from degraded performance. In this paper, we study urban noise recognition based on deep neural networks. First, the log-Mel spectrogram, i.e., the filter bank (FBank) feature, is derived as the acoustic representation. Then, an FBank spectrum built from the FBank feature vectors of multiple acoustic signal frames is fed to a convolutional neural network (CNN) for urban noise recognition. The paper presents comprehensive studies of the dimension of the FBank spectrum and of the CNN parameters, including the size of the learnable kernels, the dropout rate, and the activation function. An acoustic database collected in real environments, covering the 11 most common types of urban noise with more than 56,000 samples, is constructed for model verification and performance evaluation. For comparison, the traditional LPCC and MFCC features combined with two popular machine learning algorithms, the extreme learning machine (ELM) and SVM, as well as the FBank image feature combined with ELM, the hierarchical extreme learning machine (H-ELM) and the multilayer extreme learning machine (ML-ELM), are also evaluated. Experimental results show that the proposed method generally outperforms conventional shallow-structure classifiers.
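The FBank front end described above (log-Mel energies per frame, then several frames grouped into one 2-D "spectrum") can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the 16 kHz sampling rate, 25 ms Hamming-windowed frames with a 10 ms hop, 512-point FFT, 40 Mel filters, 0.97 pre-emphasis, and the non-overlapping grouping of `n_context` frames per spectrum are all assumed values, and the helper names (`fbank_features`, `fbank_spectrum`, `mfcc_from_fbank`) are hypothetical. The MFCC helper is included only to show that MFCCs are a DCT-compressed version of the same log-Mel energies.

```python
import numpy as np


def hz_to_mel(hz):
    # HTK-style Mel scale
    return 2595.0 * np.log10(1.0 + np.asarray(hz) / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (np.asarray(mel) / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sr):
    """Triangular filters evenly spaced on the Mel scale, shape (n_mels, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    weights = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):  # rising slope
            weights[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):  # falling slope, peak of 1 at `center`
            weights[m - 1, k] = (right - k) / (right - center)
    return weights


def fbank_features(signal, sr=16000, frame_len=0.025, hop=0.010, n_fft=512, n_mels=40):
    """Log-Mel filterbank (FBank) features, shape (n_frames, n_mels). All defaults are assumptions."""
    signal = np.asarray(signal, dtype=float)
    signal = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])  # pre-emphasis
    flen, fhop = int(round(frame_len * sr)), int(round(hop * sr))
    n_frames = 1 + max(0, (len(signal) - flen) // fhop)
    pad = max(0, (n_frames - 1) * fhop + flen - len(signal))  # complete the last frame
    signal = np.append(signal, np.zeros(pad))
    idx = np.arange(flen)[None, :] + fhop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(flen)
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft  # periodogram per frame
    mel_energy = power @ mel_filterbank(n_mels, n_fft, sr).T
    return np.log(mel_energy + 1e-10)  # small floor avoids log(0)


def fbank_spectrum(features, n_context):
    """Group n_context consecutive FBank vectors into 2-D images, shape (n_images, n_context, n_mels)."""
    n_images = features.shape[0] // n_context
    return features[: n_images * n_context].reshape(n_images, n_context, features.shape[1])


def mfcc_from_fbank(features, n_ceps=13):
    """MFCCs as an orthonormal DCT-II of the log-Mel energies, truncated to n_ceps terms."""
    n = features.shape[1]
    k, j = np.arange(n_ceps)[:, None], np.arange(n)[None, :]
    dct = np.sqrt(2.0 / n) * np.cos(np.pi * k * (2 * j + 1) / (2 * n))
    dct[0] /= np.sqrt(2.0)
    return features @ dct.T


rng = np.random.default_rng(0)
audio = rng.standard_normal(16000)  # 1 s of synthetic noise standing in for a recording
feats = fbank_features(audio)
images = fbank_spectrum(feats, n_context=40)
mfcc = mfcc_from_fbank(feats)
print(feats.shape, images.shape, mfcc.shape)  # (98, 40) (2, 40, 40) (98, 13)
```

Non-overlapping grouping keeps the sketch short; a sliding window with overlap would yield more spectra per recording, and the number of stacked frames is exactly the "dimension of the FBank spectrum" the abstract says was studied.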
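For the LPCC baseline mentioned above, one common textbook route is Levinson-Durbin recursion on the frame autocorrelation followed by the standard LPC-to-cepstrum recursion. The sketch below assumes the predictor convention x_hat[n] = sum_k a_k x[n-k] and ignores the gain term; the prediction order of 12, the 16 cepstral coefficients, and the function names are illustrative choices, not values taken from the paper.

```python
import numpy as np


def autocorrelation(frame, order):
    """Autocorrelation r[0..order] of one (windowed) frame."""
    n = len(frame)
    return np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])


def levinson_durbin(r, order):
    """Predictor coefficients a_1..a_p such that x_hat[n] = sum_k a_k * x[n - k]."""
    a = np.zeros(order + 1)  # a[0] is unused so that a[k] matches a_k
    err = r[0]  # prediction error energy
    for i in range(1, order + 1):
        k = (r[i] - np.dot(a[1:i], r[i - 1:0:-1])) / err  # reflection coefficient
        prev = a.copy()
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:]


def lpc_to_cepstrum(a, n_ceps):
    """c_n = a_n + sum_{k=1}^{n-1} (k / n) c_k a_{n-k}, with a_n = 0 for n > p (gain term ignored)."""
    p = len(a)
    c = np.zeros(n_ceps + 1)  # c[0] unused
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]


def lpcc(frame, order=12, n_ceps=16):
    windowed = frame * np.hamming(len(frame))
    return lpc_to_cepstrum(levinson_durbin(autocorrelation(windowed, order), order), n_ceps)


# Sanity check on an AR(1) model with r[k] = 0.8**k: a = [0.8, 0, 0] and c_n = 0.8**n / n.
a_ar1 = levinson_durbin(0.8 ** np.arange(4), order=3)
c_ar1 = lpc_to_cepstrum(np.array([0.8]), n_ceps=5)
frame = np.random.default_rng(0).standard_normal(400)  # one 25 ms frame at an assumed 16 kHz
features = lpcc(frame)
```

The AR(1) check is useful because its cepstrum has the closed form c_n = rho^n / n, so both recursions can be verified without any recorded audio.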
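To make the CNN parameters named in the abstract concrete (kernel size, dropout rate, activation function), here is a toy forward pass of a one-convolution-layer CNN over a single FBank spectrum in plain NumPy. It is a sketch under loud assumptions: the architecture (one convolutional layer, 2x2 max pooling, inverted dropout, one softmax output layer), the weight initialisation, the square 40 x 40 input, and every function name are hypothetical and much smaller than a practical network. Training by backpropagation is omitted, so the outputs are untrained class probabilities; the 11 outputs only mirror the 11 noise categories in the database.

```python
import numpy as np

# Candidate activation functions of the kind compared when tuning a CNN
ACTIVATIONS = {
    "relu": lambda x: np.maximum(x, 0.0),
    "sigmoid": lambda x: 1.0 / (1.0 + np.exp(-x)),
    "tanh": np.tanh,
}


def conv2d(x, kernels, bias):
    """'Valid' 2-D cross-correlation. x: (H, W, C_in); kernels: (k, k, C_in, C_out)."""
    k = kernels.shape[0]
    out_h, out_w = x.shape[0] - k + 1, x.shape[1] - k + 1
    out = np.empty((out_h, out_w, kernels.shape[3]))
    for i in range(out_h):
        for j in range(out_w):
            patch = x[i:i + k, j:j + k, :]
            out[i, j] = np.tensordot(patch, kernels, axes=([0, 1, 2], [0, 1, 2])) + bias
    return out


def max_pool(x, size=2):
    """Non-overlapping size x size max pooling; trailing rows/columns are dropped."""
    h, w, c = x.shape[0] // size, x.shape[1] // size, x.shape[2]
    return x[: h * size, : w * size].reshape(h, size, w, size, c).max(axis=(1, 3))


def dropout(x, rate, rng, training):
    """Inverted dropout: zero a fraction `rate` of units and rescale the rest, only when training."""
    if not training or rate == 0.0:
        return x
    keep = rng.random(x.shape) >= rate
    return x * keep / (1.0 - rate)


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


def init_params(input_size, kernel_size, n_filters, n_classes, rng):
    """Random weights for a square input of side input_size (hypothetical initialisation)."""
    pooled = (input_size - kernel_size + 1) // 2
    return {
        "W1": rng.normal(0.0, 0.1, (kernel_size, kernel_size, 1, n_filters)),
        "b1": np.zeros(n_filters),
        "W2": rng.normal(0.0, 0.1, (pooled * pooled * n_filters, n_classes)),
        "b2": np.zeros(n_classes),
    }


def cnn_forward(image, params, activation="relu", dropout_rate=0.5, rng=None, training=False):
    """Conv -> activation -> 2x2 max pool -> dropout -> dense softmax over noise classes."""
    act = ACTIVATIONS[activation]
    h = max_pool(act(conv2d(image[:, :, None], params["W1"], params["b1"])))
    if rng is None:
        rng = np.random.default_rng()
    h = dropout(h.ravel(), dropout_rate, rng, training)
    return softmax(h @ params["W2"] + params["b2"])


rng = np.random.default_rng(0)
spectrum = rng.standard_normal((40, 40))  # stand-in for one 40 x 40 FBank spectrum
params = init_params(input_size=40, kernel_size=5, n_filters=8, n_classes=11, rng=rng)
probs_eval = cnn_forward(spectrum, params)
probs_train = cnn_forward(spectrum, params, activation="tanh", rng=rng, training=True)
```

Inverted dropout rescales the surviving units during training, so inference needs no extra scaling; this is why `training=False` simply passes activations through.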
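The ELM baselines differ from the CNN in that only the output layer is learned: input weights and biases are drawn at random and the output weights are solved in closed form with a Moore-Penrose pseudo-inverse. Below is a minimal sketch of the basic single-hidden-layer ELM (sigmoid hidden units, unregularised least squares, synthetic 2-D data); the class name, hidden-layer size, and data are illustrative. H-ELM and ML-ELM stack several ELM-based autoencoder layers before the final classifier and are not shown.

```python
import numpy as np


class ELM:
    """Basic single-hidden-layer extreme learning machine classifier."""

    def __init__(self, n_hidden, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Sigmoid activations of the fixed random hidden layer
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        T = np.eye(int(y.max()) + 1)[y]  # one-hot targets
        self.W = self.rng.standard_normal((X.shape[1], self.n_hidden))  # random, never updated
        self.b = self.rng.standard_normal(self.n_hidden)
        self.beta = np.linalg.pinv(self._hidden(X)) @ T  # least-squares output weights
        return self

    def predict(self, X):
        return np.argmax(self._hidden(X) @ self.beta, axis=1)


rng = np.random.default_rng(1)
centers = np.array([[-3.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
y = np.repeat(np.arange(3), 50)
X = centers[y] + 0.5 * rng.standard_normal((150, 2))  # three synthetic, well-separated classes
model = ELM(n_hidden=50).fit(X, y)
train_accuracy = np.mean(model.predict(X) == y)
```

Because training reduces to one pseudo-inverse, an ELM fits far faster than a CNN trained by gradient descent; regularised variants replace `pinv(H) @ T` with the ridge solution of (H^T H + I/C) beta = H^T T for better conditioning.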
