MODELLING THE GROWTH OF THE VOCAL TRACT VOWEL SPACES OF NEWLY-BORN INFANTS AND ADULTS: CONSEQUENCES FOR ONTOGENESIS AND PHYLOGENESIS

If newly-born infants had the same sensorimotor control capabilities as adults, would their vocal tracts enable them to produce the same range of [i a u] vowel contrasts? Has increasing the volume of the pharynx by lowering the larynx been a necessary evolutionary phase for humans to produce speech? Using a new articulatory model for the simulation of vocal tract growth, we calculated the shift and size of the vowel space with age. Our results show that (1) the maximal vowel space of newborn infants is potentially at least as large as that of adults, and (2) there is no reason to think that larynx lowering and the increase in pharynx size were guided by evolution towards speech. Our simulations allow the main flaws of Lieberman’s computations to be pinpointed, and they invalidate his well-known thesis, presented in 1971 and still defended today, that a low larynx and a large pharynx constitute the anatomical basis for speech.

1. ANATOMICAL GROWTH DATA

Systematic measurements of the vocal tract from birth to adulthood do not exist at present. However, it is possible to take advantage of cranio-facial measurements established at different ages, which have been published in anatomy, radiology, and paediatrics. The evolution of the dimensions of the head (osteological structure) and of the hyoid bone position (associated, to a certain extent, with the position of the larynx) permits the inference of broad tendencies in the development of the vocal tract. The doctoral thesis of Ursula Goldstein, defended in 1980 [8], provides a veritable mine of information: an inventory of data corresponding to 14 distances and 3 angular measurements, established in relation to anatomical reference points and lines, for ages ranging from a few months to 20 years. All of these data can be closely fitted by (double) sigmoidal curves, which characterise general skeletal and muscular growth [2]. Here we summarise and draw attention to the points that are essential for understanding the phenomenon of vocal tract growth.
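The (double) sigmoidal growth curves mentioned above can be illustrated with a minimal sketch. It assumes that a double sigmoid is the sum of two logistic terms (an early and a late growth phase). The function names and every parameter value are hypothetical and chosen only for illustration: they are not Goldstein's fitted values.

```python
import math


def logistic(age_years, midpoint, rate):
    """Standard logistic curve rising from 0 to 1 around `midpoint`."""
    return 1.0 / (1.0 + math.exp(-rate * (age_years - midpoint)))


def double_sigmoid(age_years, baseline, early_gain, early_mid, early_rate,
                   late_gain, late_mid, late_rate):
    """Sum of two logistic growth phases on top of a baseline value.

    Assumption: this functional form is one common reading of
    "double sigmoid"; the source does not give its exact formula.
    """
    return (baseline
            + early_gain * logistic(age_years, early_mid, early_rate)
            + late_gain * logistic(age_years, late_mid, late_rate))


# Hypothetical parameters (NOT measured data): an early growth phase
# centred at 2 years and a pubertal phase centred at 13 years.
PARAMS = dict(baseline=4.0, early_gain=2.0, early_mid=2.0, early_rate=1.0,
              late_gain=1.5, late_mid=13.0, late_rate=0.8)

curve = [double_sigmoid(age, **PARAMS) for age in range(0, 21)]
```

In practice the seven parameters would be estimated separately for each distance or angle by least-squares fitting to the measured ages.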
At birth, the heads of infants are approximately hemispherical in shape. Changes in the volume and shape of the skull and in the size of the inferior maxilla modify the relative proportions of the horizontal and vertical dimensions. The process therefore does not involve a simple uniform scaling, but rather an anamorphosis in which the vertical dimension is emphasised (Figure 1). For the vocal tract, this phenomenon is further accentuated by the lowering of the larynx (inferred, in radiographs, from the position of the hyoid bone). The ratio between the horizontal dimension H (from the articulation of Bjork to the anterior nasal spine, including the thickness of the tissue covering the anterior surface of the superior maxilla beneath the nasal spine) and the vertical dimension V (between the line connecting the sella turcica to the nasion and the hyoid bone; Figure 2) varies from 1.6 at birth to 3.0 in adulthood (for male subjects) [10]. Within the vocal tract, the growth of the pharynx is therefore approximately twice as large as that of the front cavity.

Figure 1. Newborn and adult: anamorphosis of the skull.

Figure 2. Horizontal (H) and vertical (V) dimensions allowing characterization of vocal tract growth.

page 2501 ICPhS99 San Francisco

Regarding differences between the sexes, the data presented by Goldstein [8] demonstrate that there are few differences before the age of 12 years. The growth of the female vocal tract is practically complete at around 15 years, whereas that of the male vocal tract has not yet finished: the male hyoid bone continues to descend, further increasing the length of the pharynx. One can therefore put forward the hypothesis that women would have the same pharynx size as men if their period of growth were prolonged to span the same length of time. The differences observed between infants, women, and men can thus be considered, on the whole, to be the consequences of the same basic growth phenomenon.

2. MODELLING THE VOCAL TRACT GROWTH

For adult speakers (male and female), numerous articulatory models of the vocal tract have been available for the past twenty years, established from cineradiographic data and derived from a statistical analysis guided by knowledge of the physiology of the articulators (cf., for example, [7, 15, 23]). These models have the advantage that they intrinsically take into account certain articulatory production constraints: the control parameters are directly interpretable in terms of the degrees of freedom of the articulators (protrusion and labial aperture; movement of the tongue body, dorsum, and tip; larynx height). However, models that possess these characteristics while also simulating growth from birth to adulthood are rare. Here we have used the VLAM growth model (Variable Linear Articulatory Model), developed by Maeda in 1994 [4], which integrates knowledge acquired from previous models with the growth data that are currently available. The growth process is introduced by modifying the longitudinal dimension of the vocal tract according to two scaling factors, one for the anterior part of the vocal tract and the other for the pharynx, interpolating the zone in between:

pharynx_scale = k (1.1 - 0.30) + 0.30
mouth_scale = k (1.0 - 0.65) + 0.65

The factor k permits the evolution of the vocal tract shape to be simulated, month by month and year by year; it was calibrated using the data provided by Goldstein [8]. The VLAM model was implemented and tested at ICP in an environment originally developed for the SMIP [3]. It is thus suitable for use in systematic simulation studies as well as for use in phonetics.

3. THE CONCEPT OF A MAXIMAL VOWEL SPACE

The models generate a two-dimensional mid-sagittal section, as well as the corresponding area function (three-dimensional equivalent), from which it is possible to calculate the harmonic response (transfer function), the formant frequencies (resonance maxima) [1, 9], and the speech signal.
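The chain from growth factor to area function to formants can be sketched as follows. This is a toy illustration under stated assumptions, not the VLAM implementation. It assumes that the operators lost from the two scaling formulas are subtractions (linear interpolation in k), that the tract is a chain of lossless cylindrical sections with an ideal open end at the lips, and that the speed of sound is 350 m/s. The names `growth_scales`, `chain_d`, and `formants` are hypothetical.

```python
import numpy as np

SPEED_OF_SOUND = 35000.0  # cm/s (assumed value)


def growth_scales(k):
    """Length scaling factors (pharynx, mouth) for growth factor k.

    Assumption: the formulas are linear interpolations in k.
    """
    pharynx = 0.30 + k * (1.1 - 0.30)
    mouth = 0.65 + k * (1.0 - 0.65)
    return pharynx, mouth


def chain_d(freq_hz, lengths_cm, areas_cm2):
    """D element of the glottis-to-lips chain (ABCD) matrix.

    For lossless sections D is real and the lip/glottis volume-velocity
    ratio is 1/D, so formants are the zeros of D. The factor rho*c
    cancels in D and is set to 1.
    """
    wavenumber = 2.0 * np.pi * freq_hz / SPEED_OF_SOUND
    total = np.eye(2, dtype=complex)
    for length, area in zip(lengths_cm, areas_cm2):
        c = np.cos(wavenumber * length)
        s = np.sin(wavenumber * length)
        section = np.array([[c, 1j * s / area], [1j * s * area, c]])
        total = total @ section
    return total[1, 1].real


def formants(lengths_cm, areas_cm2, f_max=5000.0, step=5.0):
    """Locate zeros of D by a coarse scan followed by bisection."""
    freqs = np.arange(step, f_max + step, step)
    values = [chain_d(f, lengths_cm, areas_cm2) for f in freqs]
    found = []
    for i in range(len(freqs) - 1):
        if values[i] == 0.0:
            found.append(float(freqs[i]))
        elif values[i] * values[i + 1] < 0.0:
            lo, hi = freqs[i], freqs[i + 1]
            for _ in range(40):
                mid = 0.5 * (lo + hi)
                d_lo = chain_d(lo, lengths_cm, areas_cm2)
                d_mid = chain_d(mid, lengths_cm, areas_cm2)
                if d_lo * d_mid <= 0.0:
                    hi = mid
                else:
                    lo = mid
            found.append(float(0.5 * (lo + hi)))
    return found


# A uniform 17.5 cm tube (two equal sections) as a neutral-vowel check.
neutral = formants([8.75, 8.75], [5.0, 5.0])
```

For the uniform tube the first three values fall close to 500, 1500, and 2500 Hz, the quarter-wavelength resonances. A real area function derived from the mid-sagittal section would simply supply more sections with unequal areas.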
This procedure is well suited to modelling vowel production. If the entire input space of command parameters is explored, while satisfying the conditions necessary for vowel production, one can simulate the maximal F1-F2-F3 acoustic space appearing at the output. All possible oral vowels are thus situated within the limits of this region [5]. This kind of extended generation method allows the possibilities for maximal distinctiveness to be described precisely, and permits an optimal choice of prototypical realisations. Such an approach is more reliable than one that extrapolates the limits of the vowel space for a particular vocal tract (or model) from only three examples corresponding to [i a u], which are not guaranteed to be optimal (the flaw of Lieberman and Crelin [14]).

4. THE INFLUENCE OF VOCAL TRACT GROWTH ON THE MAXIMAL VOWEL SPACE

4.1. Detailed global predictions

By using a schematic representation of the vocal tract limited to 2 or 4 tube sections (twin tubes) in conjunction with a lossless acoustic model, Mol [17] established in 1970 the basic general tendencies for this phenomenon. In the case of a simple linear transformation applied to the length L (kL, with k = 1 for an adult and k < 1 for an infant), the maximal vowel space F1-F2 is scaled homogeneously by a factor of 1/k. This global result agrees quite well with the well-known data [20], established for adults (male and female) and adolescents, for the vowels [i ɑ u æ], for example, which define the limits of the American English vowel space (for more details, see [6, 19]). Using formant-cavity affiliation relationships, it is possible to predict the consequences of non-linear growth on the F1-F2 and F2-F3 spaces.
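The 1/k scaling under a linear length transformation can be checked on the simplest possible case. The worked equation below assumes a single uniform lossless tube of length L, closed at the glottis and open at the lips, with speed of sound c. This is a simplification of the twin-tube models used by Mol, and the numerical values are illustrative only.

```latex
F_n = \frac{(2n-1)\,c}{4L}, \qquad n = 1, 2, 3, \dots
\qquad\Longrightarrow\qquad
F_n' = \frac{(2n-1)\,c}{4\,(kL)} = \frac{F_n}{k}.
```

With c = 350 m/s and L = 17.5 cm, F_1 = 500 Hz and F_2 = 1500 Hz. Halving the length (k = 0.5) gives F_1' = 1000 Hz and F_2' = 3000 Hz, so every point of the F1-F2 plane moves away from the origin by the same factor 1/k = 2.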
By adjusting the prototypical area functions for [i a u] estimated from Maeda's articulatory model [15] using 2-tube and 4-tube models, it is possible to determine the following affiliations [16], and to predict the influence of the volume and length of any particular part of the vocal tract (here, the pharynx) on vowel contrasts (Table 1). A linear length transformation with scale factor k results in a scaling by 1/k of the formants associated with resonance nλ, as is also the case for the Helmholtz resonances, on condition that the ratio between the neck area (A_neck, l_neck) and the cavity area (A_vol) is held constant. For the maximal vowel space delimited by [i a u] it can be predicted [4] that:

• a relatively linear transformation of F1 and F2 will occur: F1 [i] and F1 [a] depend on the front cavity; by examining affiliations, F2 [i] and F2 [u] are also found to be affiliated, for infants, with the front cavity;
• a contrast between F2 and F3 will occur, which is less pronounced for infants; these two formants become associated with front and back cavities of approximately equal length (between 3 and 4 cm).

4.2. Simulations with the VLAM model: the maximal vowel space and vowel prototypes

Using VLAM, we generated a set of vowels for a grid of command parameters Pi (-2 ≤ Pi ≤ +2 in steps of 0.5), constraining the intra-oral constriction and lip area to be identical for adult and neonate (constriction area 0.3 cm² and lip area 0.15 cm²). Table 2 presents the limits of this space (in Hz, and in the perceptual units Bark and ERB [18]). It is evident that the perceptuo-acoustic space in the F1-F2 plane is at least as large, if not larger, for newly-born infants as it is for adults (Figure 4), and that in the F2-F3 plane it is clearly reduced (from 3.2 Bark or 4.3 ERB). From this maximal vowel space, we determined prototypes for the vowels [i a u], taking into account the positions proposed by Goldstein [8].
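The conversion of formant limits from Hz into the perceptual units used for the vowel space can be sketched as follows. The exact formulas behind [18] are not given in the text, so this sketch assumes two widely used approximations: Traunmüller's formula for the Bark scale and the Glasberg and Moore formula for the ERB-rate scale. The function names are hypothetical and the example frequencies are arbitrary, not values from Table 2.

```python
import math


def hz_to_bark(f_hz):
    """Bark value, Traunmueller (1990) approximation (assumed here)."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def hz_to_erb_rate(f_hz):
    """ERB-rate value, Glasberg and Moore (1990) formula (assumed here)."""
    return 21.4 * math.log10(4.37 * f_hz / 1000.0 + 1.0)


def perceptual_extent(f_low_hz, f_high_hz, convert):
    """Width of a formant range on the chosen perceptual scale."""
    return convert(f_high_hz) - convert(f_low_hz)


# Arbitrary illustration: the width of a 1000-2000 Hz range on each scale.
bark_width = perceptual_extent(1000.0, 2000.0, hz_to_bark)
erb_width = perceptual_extent(1000.0, 2000.0, hz_to_erb_rate)
```

Because both scales compress high frequencies, a vowel space that is shifted upwards in Hz for a short vocal tract is not enlarged by the same proportion in Bark or ERB, which is why the comparison between infant and adult spaces is made in perceptual units.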
The VLAM model thus permitted us to calculate (1) the articulatory configurations that a new-born child would have if it were to display the same control capacities as an adult, (2) as well as the configurations for an adult, and for an adult having undergone a linear growth of the vocal tract, (3) with and (4) without length normalisation. The tendencies noted in the previous paragraph can be verifi