Speech event detection using SVM and NMD

In this paper we propose a speech event detector that segments speech signals in terms of four broad acoustic-phonetic classes of events. Frame-based detection is carried out using support vector machines (SVM). Non-negative matrix deconvolution (NMD) is then used to convert the frame-based detection into a segment-based detection. Results obtained on the TIMIT corpus are reported and compared with those of a broad-class detector based on hidden Markov models (HMM) with an MFCC front-end. The proposed SVM/NMD system outperforms the HMM system in both accuracy and the quality of the detected boundaries.
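
To make the distinction between frame-based and segment-based detection concrete, the following minimal sketch turns a sequence of per-frame class decisions into labelled segments. It is not the NMD step described above: it assumes a much simpler stand-in, merging runs of identical labels. The function name `frames_to_segments` and the four class names are hypothetical placeholders, since the abstract does not name the classes.

```python
from itertools import groupby


def frames_to_segments(frame_labels):
    """Merge runs of identical frame labels into segments.

    Returns a list of (label, start_frame, end_frame) tuples, where
    end_frame is exclusive. This is a simple run-length merge, used here
    only as a stand-in for the NMD-based step (an assumption).
    """
    segments = []
    start = 0
    for label, run in groupby(frame_labels):
        length = sum(1 for _ in run)  # number of consecutive frames with this label
        segments.append((label, start, start + length))
        start += length
    return segments


# Hypothetical per-frame decisions, e.g. the output of a frame-level classifier.
labels = ["silence", "silence", "fricative", "fricative", "fricative",
          "vowel", "vowel", "silence"]
segments = frames_to_segments(labels)
```

Each boundary in `segments` falls where the frame label changes, which is why a single misclassified frame would create a spurious segment in this naive scheme; a segment-level model is one way to avoid that.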
