A study on the speaker‐independent feature extraction of Japanese vowels by neural networks

The feature extraction characteristics of three‐layer back propagation neural networks applied to speaker‐independent vowel/gender recognition tasks are investigated. The speech samples are monosyllabic vowels extracted from a syllabic database of 100 Japanese syllables, spoken by 100 different speakers and digitized at a sampling rate of 16 kHz. Several LPC‐based and physiology‐based parameters are tested as input representations. The results reveal that three hidden units are necessary to discriminate the five Japanese vowels. Close investigation of the hidden unit functions shows that the network develops distributed representations of the targets as hidden unit activation patterns. A frequency domain interpretation of the hidden units' input weights, using autocorrelation inputs, shows that the network extracts features consistent with conventional knowledge of vowel formant structure. The relations between hidden unit activation patterns and descriptive features, as well as their implications for speech perception research...
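One way to read input weights in the frequency domain when the inputs are autocorrelation coefficients is the following. A hidden unit's net input is h = Σ_k w_k r(k). Because r(k) is the inverse Fourier transform of the power spectrum P(ω), this equals an average of P(ω)·W(ω) over frequency, where W(ω) = Σ_k w_k cos(ωk). W(ω) can therefore be plotted as a spectral "template" that the unit matches against the input spectrum. The sketch below checks that identity numerically. It is an illustrative reconstruction under stated assumptions, not the paper's procedure. The function names, the 400‐sample Hamming‐windowed frame (25 ms at 16 kHz), the analysis order of 16, the unnormalized autocorrelation, and the FFT size of 1024 are all hypothetical choices.

```python
import numpy as np


def autocorrelation(frame: np.ndarray, order: int) -> np.ndarray:
    """Unnormalized short-time autocorrelation r(k) for lags k = 0..order (hypothetical helper)."""
    n = len(frame)
    return np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])


def weight_frequency_response(weights: np.ndarray, fft_size: int) -> np.ndarray:
    """W(omega_m) = sum_k weights[k] * cos(omega_m * k) at omega_m = 2*pi*m / fft_size."""
    lags = np.arange(len(weights))
    omega = 2 * np.pi * np.arange(fft_size) / fft_size
    return np.cos(np.outer(omega, lags)) @ weights


def net_input_two_ways(frame: np.ndarray, weights: np.ndarray, fft_size: int):
    """Compute a hidden unit's net input in the lag domain and in the frequency domain.

    Assumes fft_size >= len(frame) + order, so the zero-padded (circular)
    autocorrelation equals the linear one for every lag that is used.
    """
    order = len(weights) - 1
    # Lag domain: weighted sum of autocorrelation coefficients.
    lag_domain = float(weights @ autocorrelation(frame, order))
    # Frequency domain: power spectrum weighted by the cosine "template" W.
    power = np.abs(np.fft.fft(frame, n=fft_size)) ** 2
    freq_domain = float(power @ weight_frequency_response(weights, fft_size)) / fft_size
    return lag_domain, freq_domain


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.standard_normal(400) * np.hamming(400)  # stand-in for a vowel frame
    weights = rng.standard_normal(17)                   # stand-in for trained input weights
    lag_h, freq_h = net_input_two_ways(frame, weights, fft_size=1024)
    print(np.isclose(lag_h, freq_h))
```

With a 16 kHz sampling rate, only bins m = 0..fft_size/2 (0 to 8 kHz) of W need to be plotted. Under this reading, peaks in W would mark the frequency regions, such as formant regions, that most strongly excite the unit.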