Do phonetic features help to improve consonant identification in ASR?

The hidden Markov modelling experiments presented in this paper show that consonant identification results can be improved substantially if a neural network is used to extract linguistically relevant information from the acoustic signal before hidden Markov modelling is applied. The neural network (in this case a combination of two Kohonen networks) takes 12 mel-frequency cepstral coefficients, overall energy and the corresponding delta parameters as input and outputs distinctive phonetic features, such as [±uvular] and [±plosive]. Not only does this preprocessing of the data lead to better consonant identification rates, but the confusions that occur between the consonants are also shown to be less severe from a phonetic viewpoint. One reason for the improvement is that the neural network can map acoustically variable consonant realisations onto identical phonetic features, which makes the input to hidden Markov modelling more homogeneous. Furthermore, by using phonetic features the neural network helps the system to focus on linguistically relevant information in the acoustic signal.
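
The acoustic-to-phonetic mapping described above can be illustrated with a small sketch. It is not the system used in the experiments: it assumes a single one-dimensional Kohonen map instead of the combination of two networks, first-order deltas only (giving 26 input values per frame), a majority-vote labelling of map units, and synthetic data in place of real cepstral frames. All names (`KohonenMap`, `add_deltas`, `FEATURES`) and the feature labels are hypothetical.

```python
import math
import random

FEATURES = ["uvular", "plosive"]  # illustrative subset, not the full inventory

def add_deltas(frames):
    """Append first-order deltas (central differences) to each static frame."""
    n = len(frames)
    out = []
    for t, frame in enumerate(frames):
        prev, nxt = frames[max(t - 1, 0)], frames[min(t + 1, n - 1)]
        out.append(frame + [(b - a) / 2.0 for a, b in zip(prev, nxt)])
    return out

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

class KohonenMap:
    """One-dimensional self-organising map with a Gaussian neighbourhood."""

    def __init__(self, n_units, dim, rng):
        self.weights = [[rng.uniform(-1.0, 1.0) for _ in range(dim)]
                        for _ in range(n_units)]
        self.labels = [None] * n_units

    def winner(self, x, units=None):
        units = range(len(self.weights)) if units is None else units
        return min(units, key=lambda i: sq_dist(self.weights[i], x))

    def train(self, data, epochs=10, lr0=0.5, radius0=2.0):
        for epoch in range(epochs):
            decay = 1.0 - epoch / epochs  # shrink learning rate and radius
            lr, radius = lr0 * decay, max(radius0 * decay, 0.5)
            for x in data:
                w = self.winner(x)
                for i, unit in enumerate(self.weights):
                    # units close to the winner on the map move more
                    h = math.exp(-((i - w) ** 2) / (2.0 * radius ** 2))
                    for k in range(len(unit)):
                        unit[k] += lr * h * (x[k] - unit[k])

    def label_units(self, data, targets):
        """Assumed labelling: each unit takes the majority sign per feature."""
        sums = [[0] * len(FEATURES) for _ in self.weights]
        hits = [0] * len(self.weights)
        for x, target in zip(data, targets):
            w = self.winner(x)
            hits[w] += 1
            sums[w] = [s + t for s, t in zip(sums[w], target)]
        self.labels = [[1 if s >= 0 else -1 for s in sums[i]] if hits[i] else None
                       for i in range(len(self.weights))]

    def features(self, x):
        """Map a frame to the feature vector of its nearest labelled unit."""
        labelled = [i for i, lab in enumerate(self.labels) if lab is not None]
        return self.labels[self.winner(x, labelled)]

rng = random.Random(0)

def fake_frames(centre, n):
    # synthetic stand-in for 13 static values per frame (12 MFCCs + energy)
    return [[centre + rng.gauss(0.0, 0.1) for _ in range(13)] for _ in range(n)]

data = add_deltas(fake_frames(2.0, 30)) + add_deltas(fake_frames(-2.0, 30))
targets = [[+1, +1]] * 30 + [[-1, -1]] * 30  # invented [±uvular, ±plosive] labels

som = KohonenMap(n_units=6, dim=26, rng=rng)
som.train(data)
som.label_units(data, targets)
feature_stream = [som.features(x) for x in data]  # would replace MFCCs as HMM input
```

In this sketch, frames that differ acoustically but win units carrying the same label yield identical feature vectors, which is the sense in which the input to the hidden Markov models becomes more homogeneous.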