A model of acoustic interspeaker variability based on the concept of formant-cavity affiliation.

A method is proposed to model the interspeaker variability of formant patterns for oral vowels. This variability is assumed to originate in differences among speakers in the respective lengths of their front and back vocal-tract cavities. To characterize these vocal-tract differences from the spectral description of the acoustic speech signal, each formant is interpreted, according to the concept of formant-cavity affiliation, as a resonance of a specific vocal-tract cavity. Its frequency can thus be directly related to the corresponding cavity length, and a transformation model from a speaker A to a speaker B can be proposed on the basis of the frequency ratios of the formants corresponding to the same resonances. To minimize the number of sounds that must be recorded for each speaker to carry out this transformation, the frequency ratios are computed exactly only for the three extreme cardinal vowels [i, a, u]; for the remaining vowels they are approximated through an interpolation function. The method is evaluated through its capacity to transform the (F1,F2) formant patterns of eight oral vowels pronounced by five male speakers into the (F1,F2) patterns of the corresponding vowels generated by an articulatory model of the vocal tract. The resulting formant patterns are compared to those provided by normalization techniques published in the literature. The proposed method is found to be efficient, but a number of limitations are also observed and discussed. These limitations can be associated with the formant-cavity affiliation model itself or with a possible influence of speaker-specific vocal-tract geometry in the cross-sectional direction, which the model might not have taken into account.
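
The ratio-based transformation described above can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: the formant values are hypothetical, formants are paired by index (F1 with F1, F2 with F2) rather than by cavity affiliation, and the unspecified interpolation function is replaced by inverse-distance weighting in log-frequency space. The names `cardinal_ratios`, `interpolate_ratios`, and `transform` are likewise illustrative.

```python
import math

# Hypothetical (F1, F2) values in Hz for the three cardinal vowels.
# These numbers are illustrative only; they are not data from the paper.
SPEAKER_A = {"i": (280.0, 2250.0), "a": (700.0, 1300.0), "u": (300.0, 750.0)}
SPEAKER_B = {"i": (250.0, 2100.0), "a": (650.0, 1200.0), "u": (280.0, 700.0)}


def cardinal_ratios(src, tgt):
    """Per-formant frequency ratios (target / source) at each cardinal vowel.

    Assumption: formants are paired by index, which stands in for the
    pairing by cavity affiliation used in the paper.
    """
    return {v: tuple(t / s for s, t in zip(src[v], tgt[v])) for v in src}


def interpolate_ratios(formants, src, ratios, eps=1e-9):
    """Approximate the ratios for an arbitrary vowel of the source speaker.

    Assumption: inverse-distance weighting in log-frequency space; the
    paper's actual interpolation function is not given in the abstract.
    """
    log_f = [math.log(f) for f in formants]
    weights = {}
    for vowel, ref in src.items():
        d = math.dist(log_f, [math.log(f) for f in ref])
        if d < eps:
            # The vowel coincides with a cardinal vowel: use its exact ratios.
            return ratios[vowel]
        weights[vowel] = 1.0 / d
    total = sum(weights.values())
    return tuple(
        sum(weights[v] * ratios[v][k] for v in ratios) / total
        for k in range(len(formants))
    )


def transform(formants, src, tgt):
    """Map a source-speaker formant pattern toward the target speaker."""
    ratios = cardinal_ratios(src, tgt)
    r = interpolate_ratios(formants, src, ratios)
    return tuple(f * rk for f, rk in zip(formants, r))


if __name__ == "__main__":
    # A hypothetical mid vowel of speaker A, mapped toward speaker B.
    print(transform((450.0, 1900.0), SPEAKER_A, SPEAKER_B))
```

In this sketch the three cardinal vowels map exactly onto the target speaker's values, and every other vowel receives ratios lying between the cardinal ones. Under a simple tube model, where a resonance frequency is inversely proportional to the length of the cavity it is affiliated with, each ratio can be read as an inverse ratio of cavity lengths.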
