Signal representation comparison for phonetic classification

Two issues related to phonetic classification are addressed: first, whether there are any advantages in extracting acoustic attributes over using the spectral information directly for classification; and second, whether it is advantageous to introduce an intermediate set of linguistic units, i.e., distinctive features, for phonetic classification. The authors focused on 13 monophthong vowels in American English and investigated classification performance using an artificial neural network classifier with nearly 20,000 vowel tokens from 550 speakers excised from the TIMIT corpus. The results indicate that acoustic attributes give performance similar to that of raw spectral information, but at potentially considerable computational savings. In addition, the distinctive feature representation gives performance similar to that of direct vowel classification, but potentially offers a more flexible mechanism for describing context dependency.