A probabilistic framework for landmark detection based on phonetic features for automatic speech recognition.

A probabilistic framework for a landmark-based approach to speech recognition is presented for obtaining multiple landmark sequences in continuous speech. The landmark detection module takes as input acoustic parameters (APs) that capture the acoustic correlates of some of the manner-based phonetic features. The landmarks include stop bursts, vowel onsets, syllabic peaks and dips, fricative onsets and offsets, and sonorant consonant onsets and offsets. Binary classifiers of the manner phonetic features (syllabic, sonorant, and continuant) are used for probabilistic detection of these landmarks. The probabilistic framework exploits two properties of the acoustic cues of phonetic features: (1) sufficiency of the acoustic cues of a phonetic feature for a probabilistic decision on that feature and (2) invariance of the acoustic cues of a phonetic feature with respect to other phonetic features. Probabilistic landmark sequences are constrained using manner class pronunciation models for isolated word recognition with a known vocabulary. The performance of the system is compared with (1) the same probabilistic system but using mel-frequency cepstral coefficients (MFCCs), (2) a hidden Markov model (HMM) based system using APs, and (3) an HMM-based system using MFCCs.
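As a rough illustration of how binary feature classifiers could feed landmark detection, the sketch below combines per-frame posteriors for sonorant, syllabic, and continuant into manner-class probabilities and marks frames where the most likely class changes. The hierarchy used here (sonorant first, then syllabic or continuant), the function names, and the omission of a silence class are assumptions made for illustration; the abstract does not specify them. Taking a single best label per frame is also a simplification, since the framework described above retains multiple landmark sequences.

```python
def manner_posteriors(p_sonorant, p_syllabic, p_continuant):
    """Combine binary feature posteriors into manner-class probabilities.

    Assumed hierarchy (not stated in the abstract): sonorant splits into
    vowel / sonorant consonant via 'syllabic'; non-sonorant splits into
    fricative / stop via 'continuant'. Treating each decision as depending
    only on its own cues lets the class probability factor as a product.
    """
    return {
        "vowel": p_sonorant * p_syllabic,
        "sonorant_consonant": p_sonorant * (1.0 - p_syllabic),
        "fricative": (1.0 - p_sonorant) * p_continuant,
        "stop": (1.0 - p_sonorant) * (1.0 - p_continuant),
    }


def landmark_candidates(frames):
    """Return (frame_index, previous_class, new_class) at each class change.

    `frames` is a sequence of (p_sonorant, p_syllabic, p_continuant) tuples,
    one per analysis frame. Only the single most likely class per frame is
    kept, which is a simplification for illustration.
    """
    labels = []
    for p_son, p_syl, p_cont in frames:
        probs = manner_posteriors(p_son, p_syl, p_cont)
        labels.append(max(probs, key=probs.get))
    return [
        (i, labels[i - 1], labels[i])
        for i in range(1, len(labels))
        if labels[i] != labels[i - 1]
    ]


if __name__ == "__main__":
    # Hypothetical posteriors: one stop-like frame followed by two vowel-like frames.
    example = [(0.1, 0.5, 0.1), (0.9, 0.9, 0.5), (0.9, 0.9, 0.5)]
    print(landmark_candidates(example))
```

In this toy input, the change from a stop-like frame to a vowel-like frame would correspond to a vowel-onset candidate. A fuller system would score alternative sequences against the manner class pronunciation models rather than committing to one label per frame.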
