Exploiting transitions and focussing on linguistic properties for ASR

This paper describes three cross-language ASR experiments which use hidden Markov modelling. The first shows that consonant identification improves when vowel transitions are used. In particular, the consonants’ place of articulation is identified better, because the vowel transitions contain formant trajectories which depend on the consonant’s place of articulation. The second experiment uses no vowel transitions. It compares two ways of identifying consonants from the acoustic parameters belonging to the consonant itself: using these parameters directly as input to hidden Markov modelling, and performing acoustic-phonetic mapping before applying hidden Markov modelling. Acoustic-phonetic mapping is shown to improve consonant identification rates greatly. In the third experiment, the acoustic parameters from the vowel transitions are mapped onto consonantal (not vocalic) features, as are the acoustic parameters belonging to the consonants. However, the additional use of vowel transitions does not lead to further improvements in consonant identification. This is probably due to undertraining of the vowel transitions in the Kohonen network.
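The abstract names a Kohonen network as the device that maps acoustic parameters onto phonetic features, but gives no details of its size, inputs or training schedule. As a rough illustration of the general idea only, the sketch below trains a generic Kohonen self-organising map, gives each map node the mean phonetic-feature vector of the training frames it wins, and maps a new frame onto features through its nearest labelled node. The map size, learning-rate schedule, the two-dimensional toy "acoustic" vectors and the hypothetical feature pair [labial, alveolar] are all assumptions for illustration, not the authors' configuration; the hidden Markov modelling stage is not shown.

```python
import math
import random


def nearest_node(weights, x, candidates=None):
    """Return the grid position of the node whose weight vector is closest to x."""
    nodes = candidates if candidates is not None else weights.keys()
    return min(nodes, key=lambda n: sum((w - v) ** 2 for w, v in zip(weights[n], x)))


def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a 2-D Kohonen map with a shrinking Gaussian neighbourhood (generic sketch)."""
    rng = random.Random(seed)
    sigma0 = max(rows, cols) / 2.0
    # Initialise every node with a copy of a randomly chosen training vector.
    weights = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            x = data[i]
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighbourhood radius shrinks
            bmu = nearest_node(weights, x)           # best-matching unit
            for node, w in weights.items():
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma * sigma))
                for k in range(len(w)):
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights


def label_nodes(weights, data, features):
    """Give each node the mean phonetic-feature vector of the training frames it wins."""
    sums, counts = {}, {}
    for x, f in zip(data, features):
        n = nearest_node(weights, x)
        acc = sums.setdefault(n, [0.0] * len(f))
        for k, v in enumerate(f):
            acc[k] += v
        counts[n] = counts.get(n, 0) + 1
    return {n: [v / counts[n] for v in acc] for n, acc in sums.items()}


def map_to_features(weights, labels, x):
    """Map an acoustic frame onto phonetic features via its nearest labelled node."""
    return labels[nearest_node(weights, x, candidates=labels.keys())]


# Hypothetical toy data: two clusters of 2-D acoustic vectors, labelled with
# the feature vector [labial, alveolar].
offsets = [-0.2, 0.0, 0.2]
cluster_a = [[0.0 + dx, 0.0 + dy] for dx in offsets for dy in offsets]
cluster_b = [[5.0 + dx, 5.0 + dy] for dx in offsets for dy in offsets]
data = cluster_a + cluster_b
features = [[1.0, 0.0]] * len(cluster_a) + [[0.0, 1.0]] * len(cluster_b)

som = train_som(data, rows=2, cols=2)
labels = label_nodes(som, data, features)
print(map_to_features(som, labels, [0.0, 0.0]))
print(map_to_features(som, labels, [5.0, 5.0]))
```

In a setup like the one the abstract describes, the per-frame feature vectors produced this way, rather than the raw acoustic parameters, would be passed on to the hidden Markov models. The sketch also hints at why undertraining matters: a node that wins few training frames gets a feature label estimated from very little data.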