Phone classification using HMM/SVM system and normalization technique

Support vector machines (SVMs) were originally developed for binary classification and were later extended to multi-class classification. Because they are powerful and well suited to hard classification problems, we have chosen them for automatic speech recognition (ASR). The aim of this paper is to investigate the use of multi-class SVM classification coupled with hidden Markov models (HMMs) for TIMIT phones. An SVM requires all training and test samples to have feature vectors of the same size. Because phone signals vary in length, even for the same phone, we apply a normalization technique, zero padding and resampling, to all data samples so that their feature vectors have the same size. After mapping the 61 TIMIT phones to 46 phones and conducting tests using LibSVM and HTK, we obtained a classification accuracy of 91.26% with the hybrid HMM/SVM system and 71.41% with the HMM-based system. These results show that the hybrid HMM/SVM system with the normalization technique outperforms the HMM-based system, improving classification accuracy by 19.85 percentage points. These experimental results encourage us to use this hybrid system and normalization technique in future work on spoken dialogue systems.
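The length normalization described above can be illustrated with a minimal sketch. The abstract does not give the exact procedure, so everything here is an assumption: a phone segment is taken to be a list of per-frame feature vectors, zero padding appends all-zero frames at the end, resampling is linear interpolation along the time axis, and the function names `zero_pad`, `resample`, and `flatten` are hypothetical.

```python
from typing import List, Sequence

Frames = Sequence[Sequence[float]]


def zero_pad(frames: Frames, target_frames: int) -> List[List[float]]:
    """Append all-zero frames (or truncate) to get exactly target_frames frames.

    Assumption: padding goes at the end of the segment.
    """
    dim = len(frames[0])
    padded = [list(f) for f in frames[:target_frames]]
    padded += [[0.0] * dim for _ in range(target_frames - len(padded))]
    return padded


def resample(frames: Frames, target_frames: int) -> List[List[float]]:
    """Linearly interpolate along time to get exactly target_frames frames.

    Assumption: linear interpolation stands in for whatever resampling
    method the authors actually used.
    """
    n = len(frames)
    if n == 1 or target_frames == 1:
        return [list(frames[0]) for _ in range(target_frames)]
    out = []
    for k in range(target_frames):
        pos = k * (n - 1) / (target_frames - 1)  # fractional source index
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        w = pos - lo
        out.append([(1 - w) * a + w * b for a, b in zip(frames[lo], frames[hi])])
    return out


def flatten(frames: Frames) -> List[float]:
    """Concatenate the frames into one fixed-size vector for the SVM."""
    return [x for frame in frames for x in frame]


# Two phone segments of different lengths, 2 features per frame.
short_phone = [[1.0, 2.0], [3.0, 4.0]]
long_phone = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]

vec_a = flatten(zero_pad(short_phone, 4))
vec_b = flatten(resample(long_phone, 4))
```

Both `vec_a` and `vec_b` have 8 elements (4 frames times 2 features), so either could be passed to a classifier that expects a fixed input size.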
