RELATIONAL PHONETIC FEATURES FOR CONSONANT IDENTIFICATION IN A HYBRID ASR SYSTEM

In this article we discuss the implementation of some fundamental phonetic ideas related to what we shall call "relational processing" in a cross-language consonant identification system. The term relational processing refers to the role that vowel transitions play in the identification of neighbouring consonants. Two experiments are described. In the first, consonant identification results from a hidden Markov modelling experiment are presented for consonants together with the preceding and following vowel transitions, where present; the results are compared to a baseline experiment in which the vowel transitions are not used in the identification of the consonants. In the second experiment, the acoustic parameters are first mapped onto phonetic features; this mapping is performed by a Kohonen network. Since vowel transitions are considered particularly important for identifying the place of articulation of the neighbouring consonant, only place features are derived for the vowel transitions (and not the consonants' manner features or the phonetic features of the vowel to which the transitions belong). Separate hidden Markov models are trained for the consonants and for the vowel-offset and vowel-onset transitions, where transitions sharing all consonantal place-of-articulation features share a model. Concatenations of these models form the phone-like recognition units (comparable to the concatenation of phone models for word recognition in a conventional ASR system) that are subsequently used for consonant identification; the results are compared with a baseline experiment in which no acoustic-phonetic mapping is performed. Both experiments show that relational processing improves consonant identification.
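
As a rough illustration of the acoustic-phonetic mapping stage in the second experiment, the sketch below implements a minimal Kohonen self-organising map in NumPy that maps acoustic frames onto place-of-articulation feature vectors. The grid size, frame dimensionality, feature inventory, and the class name KohonenFeatureMapper are illustrative assumptions, not details taken from the paper.

    import numpy as np

    PLACE_FEATURES = ["labial", "alveolar", "velar"]   # hypothetical inventory

    class KohonenFeatureMapper:
        def __init__(self, grid=(10, 10), dim=13, seed=0):
            rng = np.random.default_rng(seed)
            self.grid = grid
            self.weights = rng.normal(size=(grid[0] * grid[1], dim))
            self.node_features = np.zeros((grid[0] * grid[1], len(PLACE_FEATURES)))

        def _bmu(self, x):
            # Best-matching unit: the node whose weight vector is closest to x.
            return int(np.argmin(np.linalg.norm(self.weights - x, axis=1)))

        def train(self, frames, epochs=20, lr0=0.5, radius0=5.0):
            rows, cols = self.grid
            coords = np.array([(i // cols, i % cols) for i in range(rows * cols)],
                              dtype=float)
            steps = epochs * len(frames)
            t = 0
            for _ in range(epochs):
                for x in frames:
                    frac = 1.0 - t / steps
                    b = self._bmu(x)
                    # Gaussian neighbourhood on the 2-D grid, shrinking over time.
                    d2 = np.sum((coords - coords[b]) ** 2, axis=1)
                    h = np.exp(-d2 / (2.0 * (radius0 * frac + 1e-3) ** 2))
                    self.weights += (lr0 * frac) * h[:, None] * (x - self.weights)
                    t += 1

        def calibrate(self, frames, feature_labels):
            # Label each node with the mean place-feature vector of the training
            # frames it wins, turning the unsupervised map into a feature estimator.
            counts = np.zeros(len(self.weights))
            for x, f in zip(frames, feature_labels):
                b = self._bmu(x)
                self.node_features[b] += f
                counts[b] += 1
            hit = counts > 0
            self.node_features[hit] /= counts[hit][:, None]

        def map_frame(self, x):
            # Map one acoustic frame to an estimated place-feature vector.
            return self.node_features[self._bmu(x)]

    # Usage with stand-in data (random frames and feature targets):
    rng = np.random.default_rng(1)
    frames = rng.normal(size=(500, 13))              # stand-in cepstral frames
    labels = rng.random((500, len(PLACE_FEATURES)))  # stand-in feature targets
    som = KohonenFeatureMapper()
    som.train(frames)
    som.calibrate(frames, labels)
    print(som.map_frame(frames[0]))

Labelling each map node with the mean feature vector of the frames it wins is one simple way to obtain a frame-to-feature mapping from an unsupervised map; the paper does not specify how its own mapping was calibrated.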
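
The concatenation of transition and consonant models into phone-like recognition units can be pictured as stitching left-to-right HMM transition matrices end to end, in the same way phone models are chained for word recognition. The sketch below assumes each unit is represented by a within-unit transition matrix plus per-state exit probabilities; this representation and the function name concat_left_to_right are hypothetical, not the paper's (or any toolkit's) actual API.

    import numpy as np

    def concat_left_to_right(units):
        # Stitch left-to-right HMM units into one composite transition matrix.
        # Each unit is (transmat, exit_prob): transmat holds within-unit
        # transitions, exit_prob[i] the probability of leaving from state i.
        n = sum(t.shape[0] for t, _ in units)
        T = np.zeros((n, n))
        offset = 0
        for k, (trans, exit_prob) in enumerate(units):
            m = trans.shape[0]
            T[offset:offset + m, offset:offset + m] = trans
            if k + 1 < len(units):
                # Mass leaving unit k enters the first state of unit k + 1.
                T[offset:offset + m, offset + m] = exit_prob
            # For the last unit, exit mass goes to a final (non-emitting)
            # state left implicit here, so its rows sum to less than one.
            offset += m
        return T

    # Example: vowel-offset transition + consonant + vowel-onset transition,
    # each modelled by the same illustrative 2-state unit.
    unit = (np.array([[0.6, 0.3],
                      [0.0, 0.7]]),
            np.array([0.1, 0.3]))      # per-state exit probabilities
    composite = concat_left_to_right([unit, unit, unit])
    print(np.round(composite, 2))

Since vowel-offset and vowel-onset transitions sharing the same place-of-articulation features share a model, the same transition unit can appear in the composite models of many different consonants.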
