Efficient implementation of an SVM-based speech/music classifier by enhancing temporal locality in support vector references

Speech/music classification is an integral part of various consumer electronics applications such as audio codecs, multimedia document indexing, and automatic speech recognition. To achieve high speech/music classification accuracy, the support vector machine (SVM) has been widely used as a classifier owing to its strong classification capability. However, before an SVM-based speech/music classifier can be used in embedded systems, which are increasingly replacing desktop computer systems, one significant implementation problem must be resolved: high implementation cost caused by time and energy inefficiency. The memory requirement, determined by the dimensionality and the number of support vectors, is generally too large for an embedded system's cache to accommodate, which results in expensive memory accesses. In this paper, two techniques are proposed that reduce expensive memory accesses by enhancing the temporal locality of support vector references, so that data fetched from memory is used more efficiently. To this end, the support vector reference patterns are first analyzed, and loop transformation techniques are then proposed to improve the temporal locality exploited by the register file and the cache hierarchy. The proposed techniques are evaluated by applying them to a speech codec, and the improvement is confirmed by measuring the number of memory accesses, the overall execution time, and the energy consumption.
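The abstract does not detail the two loop transformations, so the following Python sketch only illustrates the general idea of improving temporal locality in support vector references, under loudly stated assumptions. It evaluates the standard SVM decision function f(x) = sum_i alpha_i y_i K(s_i, x) + b with an RBF kernel (an assumed kernel choice), and it assumes several feature frames are available at once (a hypothetical buffering scheme). The functions `classify_naive` and `classify_interchanged`, the parameter `block`, and the `sv_loads` counter are all hypothetical; the counter is a simplified model of support vector fetches, not a cache simulation. The interchanged version applies loop interchange with tiling over frames, so each support vector is fetched once per block of frames instead of once per frame.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def rbf_kernel(s: Vector, x: Vector, gamma: float) -> float:
    """Gaussian (RBF) kernel; the kernel choice is an assumption for illustration."""
    return math.exp(-gamma * sum((si - xi) ** 2 for si, xi in zip(s, x)))


def classify_naive(frames: Sequence[Vector], svs: Sequence[Vector],
                   coefs: Sequence[float], bias: float,
                   gamma: float) -> Tuple[List[float], int]:
    """Frame-outer loop order: every frame streams through all support vectors,
    so each support vector is fetched again for every frame."""
    scores: List[float] = []
    sv_loads = 0
    for x in frames:
        acc = bias
        for s, c in zip(svs, coefs):  # c = alpha_i * y_i
            sv_loads += 1  # model: one fetch of support vector s
            acc += c * rbf_kernel(s, x, gamma)
        scores.append(acc)
    return scores, sv_loads


def classify_interchanged(frames: Sequence[Vector], svs: Sequence[Vector],
                          coefs: Sequence[float], bias: float, gamma: float,
                          block: int = 4) -> Tuple[List[float], int]:
    """Loop interchange plus tiling over frames (hypothetical sketch): each
    support vector is fetched once per block of frames and reused for every
    frame in that block while it is still close to the processor."""
    if block < 1:
        raise ValueError("block must be >= 1")
    scores = [bias] * len(frames)
    sv_loads = 0
    for start in range(0, len(frames), block):
        end = min(start + block, len(frames))
        for s, c in zip(svs, coefs):
            sv_loads += 1  # model: s is fetched once and reused for the block
            for j in range(start, end):
                scores[j] += c * rbf_kernel(s, frames[j], gamma)
    return scores, sv_loads


if __name__ == "__main__":
    # Hypothetical toy model: 3 support vectors in 2-D and 5 feature frames.
    svs = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]
    coefs = [0.8, -0.6, 0.3]  # alpha_i * y_i
    frames = [[0.1 * k, 1.0 - 0.1 * k] for k in range(5)]
    a, loads_a = classify_naive(frames, svs, coefs, bias=0.1, gamma=2.0)
    b, loads_b = classify_interchanged(frames, svs, coefs, bias=0.1, gamma=2.0)
    print(loads_a, loads_b)  # 15 6
    print(all(math.isclose(p, q) for p, q in zip(a, b)))  # True
```

Because each frame still accumulates the support vector terms in the same order, both versions return the same scores; only the modeled number of support vector fetches drops (from frames x support vectors to blocks x support vectors). The cost of this particular transformation is that frames must be buffered, which adds latency in a real-time codec.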
