Modeling and synthesis of English regional accents with pitch and duration correlates

This paper provides an introduction to the acoustic-phonetic structure of English regional accents and presents a signal processing method for modeling and transforming the acoustic correlates of English accents, for example from British English to American English. The focus is on modeling the intonation and duration correlates of accents, as the modeling of formants is described in previous papers (Yan et al., 2007; Vaseghi et al., 2009). The intonation correlates of accents are modeled with the statistics of a set of broad features of the pitch contour. The statistical models of phoneme durations and word speaking rates are obtained from automatic segmentation of word and phoneme boundaries in speech databases. A contribution of this paper is the use of accent synthesis for comparative evaluation of the causal effects of the acoustic correlates of accent. The differences between the acoustic-phonetic realizations of British Received Pronunciation (RP), Broad Australian (BAU), and General American (GenAm) English accents are modeled and used in an accent transformation and synthesis method to evaluate the influence of formant, pitch, and duration on conveying accents.
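As a rough illustration of the kind of statistics the abstract describes, the following minimal sketch computes per-phoneme duration statistics and a word speaking rate from segment boundaries, plus a few broad pitch-contour features. It is not the paper's method: the function names, the input layout (`(label, start_sec, end_sec)` tuples and a list of voiced-frame F0 values), and the particular feature set (mean, range, overall slope) are all assumptions chosen for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev


def phoneme_duration_stats(segments):
    """Mean and population std of duration (seconds) per phoneme label.

    segments: iterable of (label, start_sec, end_sec), e.g. as produced by
    an automatic segmenter (hypothetical input layout).
    """
    durations = defaultdict(list)
    for label, start, end in segments:
        durations[label].append(end - start)
    return {label: (mean(d), pstdev(d)) for label, d in durations.items()}


def speaking_rate(word_segments):
    """Words per second of spoken time, from (word, start_sec, end_sec)."""
    spoken = sum(end - start for _, start, end in word_segments)
    return len(word_segments) / spoken if spoken > 0 else 0.0


def broad_pitch_features(f0_hz):
    """Mean, range, and least-squares slope (Hz per frame) of an F0 track.

    f0_hz: F0 values of voiced frames only; needs at least two frames.
    """
    n = len(f0_hz)
    x_mean = (n - 1) / 2
    y_mean = mean(f0_hz)
    num = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(f0_hz))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return {
        "mean": y_mean,
        "range": max(f0_hz) - min(f0_hz),
        "slope": num / den,
    }
```

Under these assumptions, comparing the resulting statistics across speaker groups (for example RP versus GenAm recordings) would give a first-order view of duration and intonation differences; the ratio of mean durations for a phoneme could serve as a simple duration scaling factor.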

[1]  Roger K. Moore.  Computer Speech and Language, 1986.

[2]  Qin Yan, et al.  Speech Accent Profiles: Modeling and Synthesis [Applications Corner], 2009, IEEE Signal Processing Magazine.

[3]  Geoffrey Sampson, et al.  Corpus Linguistics: Readings in a Widening Discipline, 2004.

[4]  Mari Ostendorf, et al.  TOBI: a standard for labeling English prosody, 1992, ICSLP.

[5]  R. van Bezooijen, et al.  Sociocultural Aspects of Pitch Differences between Japanese and Dutch Women, 1995, Language and speech.

[6]  Eric Moulines, et al.  Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones, 1989, Speech Commun.

[7]  Felicity Cox, et al.  The Acoustic Characteristics of /hVd/ Vowels in the Speech of some Australian Teenagers, 2006.

[8]  Saeed Vaseghi, et al.  Speech Accent Profiles: Modeling and Synthesis, 2009.

[9]  David B. Pisoni, et al.  Some acoustic cues for the perceptual categorization of American English regional dialects, 2004, J. Phonetics.

[10]  Keikichi Hirose, et al.  A new system for reliable pitch extraction of speech, 1987, ICASSP '87. IEEE International Conference on Acoustics, Speech, and Signal Processing.

[11]  Paul Taylor, et al.  The tilt intonation model, 1998, ICSLP.

[12]  Qin Yan, et al.  A comparative analysis of UK and US English accents in recognition and synthesis, 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[13]  V. Wolfe, et al.  Diphthong changes in style shifting from Southern English to Standard American English, 2000, Journal of communication disorders.

[14]  E. Grabe, et al.  Intonational Variation in the British Isles, 2002.

[15]  Merle Horne, et al.  Prosody: Theory and Experiment, 2000.

[16]  Qin Yan, et al.  Analysis and Synthesis of Formant Spaces of British, Australian, and American Accents, 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[17]  A. Botinis, et al.  Intonation, 2001, Speech Commun.

[18]  Van Bezooijen, et al.  Sociocultural Aspects of Pitch Differences between Japanese and Dutch Women, 1995.

[19]  Keikichi Hirose, et al.  A scheme for pitch extraction of speech using autocorrelation function with frame length proportional to the time lag, 1992, [Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[20]  Jonathan Harrington, et al.  An acoustic comparison between New Zealand and Australian English vowels, 1998.

[21]  Daniel Hirst, et al.  Levels of Representation and Levels of Analysis for the Description of Intonation Systems, 2000.

[22]  J. Harrington, et al.  An acoustic phonetic study of broad, general, and cultivated Australian English vowels, 1997.

[23]  John C. Wells, et al.  Accents of English, 1982.

[24]  Shrikanth S. Narayanan, et al.  Refined speech segmentation for concatenative speech synthesis, 2002, INTERSPEECH.

[25]  Philip C. Woodland, et al.  Using accent-specific pronunciation modelling for robust speech recognition, 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.

[26]  Eric Moulines, et al.  Continuous probabilistic transform for voice conversion, 1998, IEEE Trans. Speech Audio Process.

[27]  Nam Soo Kim, et al.  Spectral enhancement based on global soft decision, 2000, IEEE Signal Processing Letters.

[28]  Paul Taylor, et al.  The rise/fall/connection model of intonation, 1994, Speech Communication.