Adaptive Audio Classification for Smartphone in Noisy Car Environment

With the ever-increasing number of car-mounted electronic devices that are accessed, managed, and controlled through smartphones, car apps are becoming an important part of the automotive industry. Audio classification is a key front-end technology for car apps, enabling human-app interaction. Existing audio classification approaches, however, fall short because they do not appropriately account for the unique and time-varying audio characteristics of car environments. Leveraging recent advances in mobile sensing technology that allow effective and accurate detection of the driving environment, we develop in this paper an audio classification framework for mobile apps that categorizes an audio stream into music, speech, speech+music, and noise, adapting to the current driving environment. A case study is performed with four driving environments: highway, local road, crowded city, and stopped vehicle. More than 420 minutes of audio data are collected from these environments, covering various genres of music, speech, speech+music, and noise. The results demonstrate that, compared with a non-adaptive approach in our experimental settings, the proposed approach improves the average classification accuracy by up to 166% for speech and 64% for speech+music.
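
The idea of environment-adaptive classification can be illustrated with a minimal sketch. It is not the framework developed in the paper: the nearest-centroid models, the two-dimensional feature vectors, the centroid values, and all names (`AdaptiveAudioClassifier`, `make_centroid_classifier`, `fallback`) are hypothetical and chosen only to show how a detected driving environment might select the model applied to an audio frame.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence

Features = Sequence[float]
Classifier = Callable[[Features], str]


def make_centroid_classifier(centroids: Dict[str, Features]) -> Classifier:
    """Build a nearest-centroid classifier (a stand-in for any trained model)."""
    def classify(x: Features) -> str:
        def dist2(c: Features) -> float:
            return sum((a - b) ** 2 for a, b in zip(x, c))
        # Pick the class whose centroid is closest to the feature vector.
        return min(centroids, key=lambda label: dist2(centroids[label]))
    return classify


@dataclass
class AdaptiveAudioClassifier:
    models: Dict[str, Classifier]  # one model per driving environment
    fallback: Classifier           # non-adaptive model for unknown environments

    def classify(self, features: Features, environment: str) -> str:
        # Dispatch to the model associated with the detected environment.
        return self.models.get(environment, self.fallback)(features)


# Toy centroids (made-up values) for two of the four environments.
highway = make_centroid_classifier({
    "music": (0.8, 0.2), "speech": (0.6, 0.5),
    "speech+music": (0.7, 0.35), "noise": (0.9, 0.9),
})
stopped = make_centroid_classifier({
    "music": (0.5, 0.2), "speech": (0.3, 0.5),
    "speech+music": (0.4, 0.35), "noise": (0.1, 0.1),
})

clf = AdaptiveAudioClassifier(
    models={"highway": highway, "stopped": stopped},
    fallback=stopped,
)

frame = (0.55, 0.45)  # hypothetical feature vector for one audio frame
label_highway = clf.classify(frame, "highway")
label_stopped = clf.classify(frame, "stopped")
```

In this toy setup the same frame receives a different label depending on the environment, which is the behavior a non-adaptive classifier cannot reproduce. In practice the environment label would come from a driving environment detector, which is assumed here to be available.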
