Speech-overlapped acoustic event detection for automotive applications

We present two approaches to acoustic event detection for speech-enabled car applications: a generative GMM-UBM approach and a discriminative GMM-SVM supervector approach. The systems detect whether a certain acoustic event occurred while the car's built-in microphone was active to record a spoken command, whether before, while, or after the driver was speaking. The events are music playing, a phone ringing, and a passenger other than the driver talking, laughing, or coughing. The task is formally defined as a detection task along the lines of well-established detection tasks such as speaker recognition or language recognition. Similarly, the evaluation procedure was designed to resemble the respective official evaluation series performed by NIST (i.e., it was a blind "one-shot" evaluation on a separately provided dataset). System performance was calculated in terms of detection miss and false alarm probabilities (C_Miss = C_FA = 1 and P_Target = 0.5). The superior GMM-SVM system scored 0.0345 for known test speakers and 0.1955 for novel test speakers. Frequency-filtered band energy (FFBE) coefficients outperformed MFCCs on this task. The results are promising and suggest further experiments on more data.
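The abstract gives the cost parameters but not the formula that combines them. As a minimal sketch, assuming the paper follows the usual NIST-style detection cost function, C_Det = C_Miss * P_Miss * P_Target + C_FA * P_FA * (1 - P_Target), the score could be computed as below. The function names, the thresholding convention, and the toy scores are illustrative assumptions, not taken from the paper.

```python
def error_rates(scores, labels, threshold):
    """Return (P_miss, P_fa) for a fixed decision threshold.

    scores: detector outputs, higher means "event present" (assumed convention).
    labels: True where the event really occurred, False otherwise.
    """
    target_scores = [s for s, is_target in zip(scores, labels) if is_target]
    nontarget_scores = [s for s, is_target in zip(scores, labels) if not is_target]
    # A miss is a true event scored below the threshold.
    p_miss = sum(s < threshold for s in target_scores) / len(target_scores)
    # A false alarm is a non-event scored at or above the threshold.
    p_fa = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
    return p_miss, p_fa


def detection_cost(p_miss, p_fa, c_miss=1.0, c_fa=1.0, p_target=0.5):
    """NIST-style detection cost (assumed form); defaults match the abstract."""
    return c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)


if __name__ == "__main__":
    # Toy trial list, purely illustrative.
    scores = [0.9, 0.4, 0.7, 0.2]
    labels = [True, True, False, False]
    p_miss, p_fa = error_rates(scores, labels, threshold=0.5)
    print(detection_cost(p_miss, p_fa))
```

With C_Miss = C_FA = 1 and P_Target = 0.5, this cost reduces to the average of the miss and false alarm probabilities, so under this assumption a score of 0.0345 would correspond to a mean error rate of about 3.5%.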
