Paper information - Aero-acoustic modelling of voiced and unvoiced fricatives based on MRI data

Aero-acoustic modelling of voiced and unvoiced fricatives based on MRI data

We would like to develop a more realistic production model of unvoiced speech sounds, namely fricatives, plosives and aspiration noise. All three involve turbulence noise generation, with place-dependent source characteristics that vary with time (rapidly, in plosives). In this study, we aimed to produce, using an aero-acoustic model of the vocal-tract filter and source, voiced as well as unvoiced fricatives that provide a good match to analyses of speech recordings. The vocal-tract transfer function (VTTF) was computed by the vocal-tract acoustics program, VOAC [Davies, McGowan and Shadle. Vocal Fold Physiology: Frontiers in Basic Science, ed. Titze, Singular Pub., CA, 93-142, 1993], using geometrical data, in the form of cross-sectional area and hydraulic radius functions, along the length of the tract. VOAC incorporates the effects of net flow into the transmission of plane waves through a tubular representation of the tract, and relaxes the assumptions of rigid walls and isentropic propagation. The geometry functions were derived from multiple-slice, dynamic, magnetic resonance images (MRI) [Mohammad. PhD thesis, Dept. ECS, U. Southampton, UK, 1999; Shadle, Mohammad, Carter, and Jackson. Proc. ICPhS, S.F. CA, 1:623-626, 1999], using a method of converting from the pixel outlines that was improved over earlier efforts on vowels. A coloured noise source signal was combined with the VTTF and radiation characteristic to synthesize the unvoiced fricative [s]. For its voiced counterpart [z], many researchers have noted that the noise source appears to be modulated by voicing. Furthermore, the phase of the modulation has been shown to be perceptually significant. Based on our analysis [Jackson and Shadle. Proc. IEEE-ICASSP, Istanbul, 2000] of recordings by the same subject, the frication source of [z] was varied periodically according to fluctuations in the flow velocity at the constriction exit, and the modulation phase was governed by the convection time for the flow perturbation to travel from the constriction to the obstacle. The synthesized fricatives were compared to the speech recordings in a simple listening test, and comparisons of the predicted and measured time series suggested that the model, which brings together physical, aerodynamic and acoustic information, can replicate characteristics of real speech, such as the modulation in voiced fricatives [http://www.isis.ecs.soton.ac.uk/research/projects/nephthys/].
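The idea of a frication source modulated by voicing, with a phase set by the convection time, can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `modulated_noise`, the cosine envelope, the use of white rather than coloured noise, and all numerical values (fundamental frequency, modulation depth, constriction-to-obstacle distance, convection speed) are assumptions chosen only for illustration.

```python
import math
import random


def modulated_noise(duration_s, fs, f0, depth, distance_m, convection_speed_ms, seed=0):
    """Return (samples, envelope, delay_s) for amplitude-modulated white noise.

    Hypothetical illustration: the envelope is a raised cosine at the voicing
    frequency f0, delayed by the convection time distance / speed.
    """
    rng = random.Random(seed)
    # Convection time: how long a flow perturbation takes to travel from the
    # constriction exit to the obstacle (assumed constant convection speed).
    delay_s = distance_m / convection_speed_ms
    n = int(round(duration_s * fs))
    samples, envelope = [], []
    for i in range(n):
        t = i / fs
        # Envelope oscillates between (1 - depth) and (1 + depth).
        env = 1.0 + depth * math.cos(2.0 * math.pi * f0 * (t - delay_s))
        envelope.append(env)
        # White Gaussian noise stands in for the coloured source in the text.
        samples.append(env * rng.gauss(0.0, 1.0))
    return samples, envelope, delay_s


# Assumed example values: 50 ms at 16 kHz, f0 = 125 Hz, 50 % modulation depth,
# 1 cm from constriction to obstacle, 10 m/s convection speed.
samples, envelope, delay_s = modulated_noise(0.05, 16000, 125.0, 0.5, 0.01, 10.0)

# Modulation phase lag implied by the convection delay, in degrees.
phase_deg = (360.0 * 125.0 * delay_s) % 360.0
```

In this sketch the phase lag follows directly from the delay and the voicing frequency, so a longer constriction-to-obstacle distance or a slower convection speed shifts the noise bursts later in the glottal cycle. A fuller model would filter the modulated source through the VTTF and radiation characteristic, which is omitted here.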

P.J.B. Jackson | C. H. Shadle