Paper information - Speech event detection using SVM and NMD

Speech event detection using SVM and NMD

In this paper we propose a speech event detector that segments speech signals into four broad acoustic-phonetic classes of events. Frame-based detection was carried out using support vector machines (SVM). Non-negative matrix deconvolution (NMD) was used to move from frame-based detection to segment-based detection. Results obtained on the TIMIT corpus are reported and compared with those of a broad-class detector based on hidden Markov models (HMM) with an MFCC front-end. The proposed SVM/NMD system was found to outperform the HMM system in both accuracy and the quality of the detected boundaries.

Fernando Perdigão | Carla Lopes
