Statistical dialect classification based on mean phonetic features

We describe a text-dependent method for automatic utterance classification and dialect model selection that uses mean cepstral and duration features computed on a per-phoneme basis. From transcribed dialect data, we build a linear discriminant that separates the dialects in feature space. This method is potentially much faster than our previous selection algorithm. We achieve error rates of 8% when distinguishing Northern US speakers from Southern US speakers, and average error rates of 13% on a variety of finer pairwise dialect discriminations. We also describe the training and test corpora collected for this work.
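The abstract does not say how the linear discriminant is estimated. As one illustrative possibility, the sketch below fits a two-class Fisher discriminant to per-utterance feature vectors. It assumes each vector already holds per-phoneme means (here a made-up mean vowel duration in ms and a made-up mean cepstral coefficient). The function names, the ridge term, and the toy data are hypothetical and are not taken from this work.

```python
# Illustrative sketch only: a two-class Fisher linear discriminant.
# Names, the ridge term, and all data below are assumptions for the example.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mean_vector(rows):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def scatter(rows, mu):
    """Within-class scatter: sum of outer products of deviations from mu."""
    d = len(mu)
    s = [[0.0] * d for _ in range(d)]
    for r in rows:
        dev = [x - m for x, m in zip(r, mu)]
        for i in range(d):
            for j in range(d):
                s[i][j] += dev[i] * dev[j]
    return s

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def train(class0, class1, ridge=1e-6):
    """Return (w, t): projection direction and decision threshold."""
    mu0, mu1 = mean_vector(class0), mean_vector(class1)
    s0, s1 = scatter(class0, mu0), scatter(class1, mu1)
    d = len(mu0)
    # Pooled within-class scatter, with a small ridge for numerical stability.
    sw = [[s0[i][j] + s1[i][j] + (ridge if i == j else 0.0)
           for j in range(d)] for i in range(d)]
    w = solve(sw, [a - b for a, b in zip(mu1, mu0)])
    # Threshold at the midpoint between the projected class means.
    t = 0.5 * (dot(w, mu0) + dot(w, mu1))
    return w, t

def classify(w, t, x):
    """Return 1 if x projects above the threshold, else 0."""
    return 1 if dot(w, x) > t else 0

# Toy, made-up feature vectors: [mean vowel duration (ms), mean cepstral coeff].
north = [[80.0, 1.2], [85.0, 1.0], [78.0, 1.4], [90.0, 1.1]]   # class 0
south = [[120.0, 0.4], [115.0, 0.6], [125.0, 0.3], [110.0, 0.5]]  # class 1

w, t = train(north, south)
print(classify(w, t, [82.0, 1.1]), classify(w, t, [118.0, 0.5]))
```

Once the weights are fitted, classifying an utterance costs a single dot product and a comparison, which is consistent with the speed advantage claimed above, although the comparison with the previous selection algorithm is not reproduced here.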