Discriminative training of GMM parameters for audio scene classification and audio tagging

This report describes the algorithm for audio scene classification and audio tagging and the result for DCASE 2016 challenge data. We propose a discriminative training algorithm to improve the baseline GMM performance. The algorithm updates the baseline GMM parameters by maximizing the margin between classes to improve discriminative performance. For Task1, we use a hierarchical classifier to maximize discriminative performance, and achieve 84% accuracy for given cross validation data. For Task4, we apply binary classifier for each label, and achieve 16.71% EER for given cross validation data.

[1]  Chang Dong Yoo,et al.  Loss-Scaled Large-Margin Gaussian Mixture Models for Speech Emotion Classification , 2012, IEEE Transactions on Audio, Speech, and Language Processing.

[2]  Avery Wang,et al.  An Industrial Strength Audio Search Algorithm , 2003, ISMIR.

[3]  Peter Kabal,et al.  Speech/music discrimination for multimedia applications , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).

[4]  Biing-Hwang Juang,et al.  Fundamentals of speech recognition , 1993, Prentice Hall signal processing series.

[5]  Vladimir N. Vapnik,et al.  The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.

[6]  Simon Dixon,et al.  Multiple-instrument polyphonic music transcription using a temporally constrained shift-invariant model. , 2013, The Journal of the Acoustical Society of America.

[7]  Lawrence R. Rabiner,et al.  A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.

[8]  Ben Taskar,et al.  Max-Margin Markov Networks , 2003, NIPS.

[9]  Johan Schalkwyk,et al.  Learning acoustic frame labeling for speech recognition with recurrent neural networks , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[10]  Koby Crammer,et al.  On the Algorithmic Implementation of Multiclass Kernel-based Vector Machines , 2002, J. Mach. Learn. Res..

[11]  Thomas Hofmann,et al.  Large Margin Methods for Structured and Interdependent Output Variables , 2005, J. Mach. Learn. Res..