Dialect analysis and modeling for automatic classification

In this paper, we present our recent work on the analysis and modeling of speech under dialect. Dialect and accent significantly influence automatic speech recognition performance, so it is critical to detect and classify non-native speech. In this study, we consider three areas: (i) prosodic structure (normalized f0, syllable rate, and sentence duration), (ii) phoneme acoustic space modeling and sub-word classification, and (iii) word-level modeling using large-vocabulary data. The corpora used in this study are the NATO N-4 corpus (2 accents, 2 dialects of English), TIMIT (7 dialect regions), and the American and British English versions of the WSJ corpus. These corpora were selected because they contained audio material from specific dialects and accents of English (N-4), were phonetically balanced and organized across U.S. dialect regions (TIMIT), or contained significant amounts of read speech from distinct dialects (WSJ). The results show that significant dialect-dependent differences occur at the prosodic, phoneme-space, and word levels, and that effective dialect classification can be achieved using processing strategies from each of these domains.
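The abstract names three prosodic cues: normalized f0, syllable rate, and sentence duration. It does not say how they are computed or which classifier uses them. The following is a minimal sketch under stated assumptions, not the authors' method:

- f0 is normalized by dividing each voiced frame by the speaker's mean f0.
- Syllable counts are supplied externally, for example from a transcript.
- A nearest-centroid classifier on standardized features stands in for whatever classifier the paper actually used.

The names `Utterance`, `prosodic_features`, `train_centroids`, and `classify`, and the toy training data, are all hypothetical.

```python
import math
import statistics
from dataclasses import dataclass


@dataclass
class Utterance:
    f0_hz: list[float]   # frame-level f0 track; 0.0 marks unvoiced frames
    n_syllables: int     # syllable count (assumed known, e.g. from a transcript)
    duration_s: float    # sentence duration in seconds


def prosodic_features(utt: Utterance, speaker_mean_f0: float) -> list[float]:
    """Return [normalized-f0 variability, syllable rate, sentence duration]."""
    voiced = [f for f in utt.f0_hz if f > 0.0]
    # Assumed normalization: divide by the speaker's mean f0 so that speakers
    # with different pitch ranges become comparable.
    norm_f0 = [f / speaker_mean_f0 for f in voiced]
    f0_var = statistics.pstdev(norm_f0) if len(norm_f0) > 1 else 0.0
    syllable_rate = utt.n_syllables / utt.duration_s  # syllables per second
    return [f0_var, syllable_rate, utt.duration_s]


def train_centroids(data: dict[str, list[list[float]]]):
    """Compute per-dialect mean feature vectors and global per-feature scales."""
    all_vecs = [v for vecs in data.values() for v in vecs]
    n_feats = len(all_vecs[0])
    # Population std of each feature over all training vectors; fall back
    # to 1.0 when a feature is constant, to avoid division by zero.
    scales = [statistics.pstdev([v[i] for v in all_vecs]) or 1.0
              for i in range(n_feats)]
    centroids = {
        label: [statistics.fmean([v[i] for v in vecs]) for i in range(n_feats)]
        for label, vecs in data.items()
    }
    return centroids, scales


def classify(vec: list[float], centroids: dict[str, list[float]],
             scales: list[float]) -> str:
    """Pick the dialect whose centroid is nearest in scaled Euclidean distance."""
    def dist(c: list[float]) -> float:
        return math.sqrt(sum(((x - m) / s) ** 2
                             for x, m, s in zip(vec, c, scales)))
    return min(centroids, key=lambda label: dist(centroids[label]))


if __name__ == "__main__":
    # Hypothetical toy data: [f0 variability, syllable rate, duration].
    train = {
        "dialect_A": [[0.10, 4.0, 2.5], [0.12, 4.2, 2.4]],
        "dialect_B": [[0.20, 5.5, 1.8], [0.22, 5.7, 1.9]],
    }
    centroids, scales = train_centroids(train)
    utt = Utterance(f0_hz=[0.0, 120.0, 130.0, 0.0, 110.0],
                    n_syllables=10, duration_s=2.5)
    feats = prosodic_features(utt, speaker_mean_f0=120.0)
    print(classify(feats, centroids, scales))  # prints dialect_A
```

Each feature is divided by its spread across the training set before distances are taken. Without that step, the feature with the largest numeric range (here syllable rate) would dominate. In practice, the f0 track would come from a pitch tracker and the classifier would be trained on far more data. The same feature vector could also feed a Gaussian or discriminative model in place of the nearest-centroid rule.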
