ASR Based on the Analysis of the Short Mel-Frequency Cepstra Time Transform

In this work, we propose using the Short Mel-Frequency Cepstra Time Transform (SMCTT), cτ(t), as a source of speech information. The SMCTT studies the temporal properties of the speech signal at a fixed quefrency τ. Since the SMCTT signal, cτ(t), results from a nonlinear transformation of the speech signal, s(t), it is a potential carrier of new properties in time, frequency, quefrency, etc. The goal of this work is to evaluate the performance of the SMCTT signal when applied to an Automatic Speech Recognition (ASR) task. Our experimental results show that this SMCTT waveform, cτ(t), carries important information.
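The abstract does not give an implementation of cτ(t). One plausible reading is that the SMCTT is the trajectory of the τ-th mel-cepstral coefficient across short-time analysis frames. The sketch below follows that reading; the function names, frame sizes, and filterbank parameters are illustrative assumptions, not the authors' settings:

```python
import numpy as np

def mel_filterbank(n_filters, n_fft, sr, fmin=0.0, fmax=None):
    """Triangular mel filterbank (assumed standard construction)."""
    fmax = fmax or sr / 2
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mels = np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def smctt(signal, sr, tau, frame_len=400, hop=160, n_filters=26):
    """Return c_tau(t): the tau-th mel-cepstral coefficient per frame.

    Frame length / hop (25 ms / 10 ms at 16 kHz) are common defaults,
    assumed here for illustration only.
    """
    fb = mel_filterbank(n_filters, frame_len, sr)
    window = np.hamming(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    c_tau = np.empty(n_frames)
    n = np.arange(n_filters)
    for m in range(n_frames):
        frame = signal[m * hop: m * hop + frame_len] * window
        power = np.abs(np.fft.rfft(frame, frame_len)) ** 2
        log_mel = np.log(fb @ power + 1e-10)
        # DCT-II of the log mel energies, keeping only coefficient tau
        c_tau[m] = np.sum(log_mel * np.cos(np.pi * tau * (n + 0.5) / n_filters))
    return c_tau

# Demo on a synthetic 440 Hz tone (illustrative, not the paper's data)
sr = 16000
t = np.arange(sr) / sr
c2 = smctt(np.sin(2 * np.pi * 440 * t), sr, tau=2)
```

The resulting waveform c2 has one sample per analysis frame and can itself be treated as a signal with its own time, frequency, and quefrency structure, which is the property the abstract exploits for ASR.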
