Advances in phone-based modeling for automatic accent classification

Algorithms capable of estimating and characterizing accent knowledge could provide valuable information for developing more effective speech systems, such as speech recognition, speaker identification, audio stream tagging in spoken document retrieval, channel monitoring, or voice conversion. Accent knowledge could be used to select alternative pronunciations in a lexicon, engage adaptation for acoustic modeling, or provide information for biasing a language model in large vocabulary speech recognition. In this paper, we propose a text-independent automatic accent classification system using phone-based models. Algorithm formulation begins with a series of experiments focused on capturing spectral evolution information as potential accent-sensitive cues. Alternative subspace representations using principal component analysis and linear discriminant analysis with projected trajectories are considered. Finally, an experimental study is performed to compare the spectral trajectory model framework to a traditional hidden Markov model recognition framework using an accent-sensitive word corpus. System evaluation is performed using a corpus representing five English speaker groups, covering native American English and English spoken with Mandarin Chinese, French, Thai, and Turkish accents, for both male and female speakers.
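The idea of a projected trajectory in a principal component subspace can be illustrated with a minimal sketch. It is not the system described in the paper. It assumes each phone segment is a sequence of 12-dimensional spectral feature frames, uses synthetic random data in place of real speech features, and introduces the hypothetical helper names `fit_pca` and `project_trajectory` for illustration only.

```python
import numpy as np

def fit_pca(frames, k):
    """Estimate a k-dimensional PCA subspace from feature frames (N x D).

    Returns the frame mean and a D x k matrix with orthonormal columns.
    """
    mean = frames.mean(axis=0)
    centered = frames - mean
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:k].T

def project_trajectory(segment, mean, basis):
    """Project each frame of one segment (T x D) into the subspace (T x k)."""
    return (segment - mean) @ basis

# Synthetic stand-in data (assumption): 500 training frames, 12 features each.
rng = np.random.default_rng(0)
train_frames = rng.normal(size=(500, 12))
mean, basis = fit_pca(train_frames, k=3)

# One hypothetical phone segment of 9 frames, projected frame by frame.
segment = rng.normal(size=(9, 12))
traj = project_trajectory(segment, mean, basis)
print(traj.shape)  # (9, 3)
```

Each row of `traj` is one point of the segment's trajectory in the reduced space. A linear discriminant analysis variant would replace the PCA basis with directions chosen using accent class labels.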
